Episode 50: DataStax NoSQL solutions built on Apache Cassandra with Kathryn Erickson, Open Source and Ecosystem Strategy


Mike Schwartz: Hello and welcome to Open Source Underdogs. I’m your host, Mike Schwartz, and this is episode 50 with Kathryn Erickson, who helps lead open-source strategy at DataStax. Founded in 2010 and currently employing about 500 people, DataStax was one of the first and most successful companies in the Apache Cassandra big data ecosystem.

Kathryn has an engineering background. You can listen to some of her great deep dives into the tech on the DataStax website. In her role on the strategy team, she’s helping to lead the company into its next phase of growth and community engagement. I hope you’ll enjoy this episode. And if you do, don’t forget to share a link on social media. You can find all the episodes on, or you can retweet our announcement by following us on Twitter. Our handle is @fosspodcast. So, without further ado, let’s carry on with the interview.

DataStax Origin

Mike Schwartz: Kathryn, thank you for joining us today.

Kathryn Erickson: Sure, of course, thank you.

Mike Schwartz: Most of our listeners probably know about Apache Cassandra, one of the most popular databases for big data, but how did DataStax evolve in relation to the Cassandra project?

Kathryn Erickson: DataStax was founded by Jonathan Ellis and Matt Pfeil, both employees of Rackspace. Jonathan, being a contributor to Apache Cassandra and project chair as well, was considering leaving Rackspace, and Matt Pfeil went to talk to him and say, “Hey, there’s some really cool stuff going on here, you should really consider staying.” And by the end of the conversation, they were founding a company together.

And so DataStax was founded to support Apache Cassandra. Over time, we began adding Enterprise features and selling an Enterprise distribution of the database with these features added, and then, of course, more recently, the cloud platform as a service offering as well.

Evolution Of Support Offering

Mike Schwartz: Actually, I didn’t realize that you started out providing support. Because when I first ran into DataStax, I guess I had just known it as a distribution of Cassandra. And now, I see that you’re also providing support for the open-source distribution. Can you talk a little bit about how that’s evolved over time? Has it always been there, or has there been a focus for or against doing that?

Kathryn Erickson: It hasn’t always been there. When DataStax was founded 10 years ago, there wasn’t really a playbook for how to build and run a successful open-source company.
We were founded around the premise of providing support and consulting for Apache Cassandra. Over time, we did offer the Enterprise Edition, but what you see with most Enterprises is that they have a mix of the Enterprise version and open source. For some customers, that’s dependent on the criticality of the data, and for other customers, it’s dependent on the features or the distribution, being the as-a-service offering or self-installed on-prem.

And so, what we saw in the last year was that there were some obvious things that we weren’t doing, and our customers needed support and consulting around open-source Cassandra. We are beginning to open-source a lot more of the features that would build Cassandra abundance, and so, it made sense to bring those offerings back.

Astra – DataStax Cloud Offering

Mike Schwartz: Okay, and you mentioned that DataStax launched a new hosted service called Astra. Do you see that product as a driver for revenue, or is it just an easier path for customers to test drive the product?

Kathryn Erickson: I think that will evolve over time. I think at launch, it is the easiest way to learn Apache Cassandra. And I think as we launch the hybrid option, I believe that’s later this year, that will become a more significant line of revenue.


Mike Schwartz: Most of the revenue today, I guess, is from the licensed Enterprise product. So, focusing on that, a lot of open-source businesses are moving towards consumption-based pricing. And I’m wondering, what kind of metrics do you use to determine what is consumption?

Kathryn Erickson: You know, with a cloud-based offering, consumption is based on capacity. And with our licensed product and with Luna, the open-source support offering, our focus this year has been around simplification of the pricing model. And we revisit that each year.

With the Enterprise product, we previously charged for the Enterprise license, and then an optional additional fee for advanced workloads, like Spark analytics and graph. That’s confusing for the customer; they just want a simple pricing mechanism. So, we collapsed that pricing. And then, of course, for larger deals, we would have ELAs, or special terms to accommodate those customers.

Mike Schwartz: That consumption is based on, like, per CPU, per server, or how do you actually figure out what is the size?

Kathryn Erickson: It’s true capacity-based, the size of the data set being stored. And as we move to Astra hybrid, which will be that offering on-prem, I think we’ll consider that pricing option there as well.

Market Segmentation

Mike Schwartz: Data persistence is like the most horizontal market on the planet. Every company basically needs to store data. When you can sell to everyone, it’s sort of a blessing and a curse. Do you segment the market at all vertically or by use case, or do you just not segment the market?

Kathryn Erickson: It’s hard to segment when you’re serving a pretty broad market. What we try to do is have as easy of an on-ramp for the different verticals as possible. We see data models look similar between IoT use cases, inventory and messaging data models would be similar.
So, we don’t segment the market for go-to-market strategies, but we try to find places of repeatable consulting efforts to speed up the successes for those customers.


Mike Schwartz: When you took on the role of Director of Strategic Partnerships, you probably did a survey of the range of partnerships that exist. Can you talk about what the partner landscape looks like at DataStax?

Kathryn Erickson: I ran our technology partner program, and there are two other sides of that, SI partners and the cloud partners. On the technology side, you want to make it as easy as possible for customers to consume your product.

So, in a technology partner program, you want to understand the user journey to get to your product, and make sure that those adjacent technologies have the simplest, most repeatable, easy-to-build, easy-to-test integrations possible over time. If you want to think about specific companies and integrations, every database needs an ODBC and JDBC connector. Customers want those for BI, for reporting, for simple ways to move data in and out of the system. But in the last few years, most customers also want to see Kafka connectors and more high-speed ingest Pub/Sub integrations. So, we want to accommodate those as well.

Mike Schwartz: Coming on the System Integrator side, you know, at Gluu, we found that those have been essential for us, to be able to focus on innovating the product versus getting involved in specific projects. But there’s such a broad range when you’re serving a global market of the System Integrators. Do you consider them channel partners or integration partners?

Kathryn Erickson: We usually consider them strategic partners when we take those types of partnerships on. And the goal is usually to help us penetrate markets that we don’t currently have a field team in, or to provide packaged or cookie-cutter solutions. If you look at some of the stuff that we’ve done with VMware and with partnerships at Dell, we want to assert that the product stack works as recommended for customers that are used to seeing these reference architectures from these larger integrators and technology companies.

Most Important Partnerships For Driving Revenues

Mike Schwartz: Which partnerships do you think are the most important for actually driving growth?

Kathryn Erickson: Deloitte’s been integral to our federal business; they know that space better than any startup could hope to. VMware, for helping to modernize Enterprise platforms. Enterprises that are looking at Cassandra and looking at DataStax are usually going through some type of digital transformation. And the product that they already have in place is VMware. So, everything that we could do to make that migration to NoSQL smooth was helpful to those customers. VMware has been a pretty big partner in my journey.

Open Source Strategy

Mike Schwartz: Some of the companies we’ve interviewed are moving to a 100% open-source strategy, specifically Chef and Cloudera. In the past, the value proposition of DataStax was that it had an improved distribution of Cassandra. But do you see DataStax maybe moving more in the direction of open-sourcing its platforms and some of the technology it’s developed?

Kathryn Erickson: We are open-sourcing a lot more. We try to stick to simple rules for open sourcing. “Simple rules” comes from a Harvard Business Review article, “Simple Rules for a Complex World.”
And so, simple rules for open source: if it increases adoption of Cassandra, it should be open-sourced. And if it’s an Enterprise feature that’s more specific to Enterprise customers, like security features or advanced replication options, then that would be kept proprietary.

And then, where should something be open-sourced? Well, if it makes a change to the core of Cassandra, of course it should go to the Apache project. And if it increases abundance, but it’s not impactful to the core of the project, then it still should be open-sourced, but may be able to exist in a DataStax repo or a different foundation.

Does Open Source Help?

Mike Schwartz: Do you think the wider open-source community of Cassandra helps DataStax too?

Kathryn Erickson: Of course, open source is all about positive-sum games. I think it was Thomas Jefferson that said, “If I use my light to light your torch, then we both have light.” And that’s how open source works. The more communities and more companies that you can move from being other to being self, the larger the positive-sum game that you’re playing. So, open source, and open-source abundance, is absolutely essential to the success of any open-source company.

Thoughts About Open Source Foundations?

Mike Schwartz: Any thoughts about Cassandra being hosted at the Apache Foundation versus perhaps the Linux Foundation or the CNCF?

Kathryn Erickson: I don’t have any opinions on the other foundations, but I think that Apache Cassandra will always be at home with the ASF. They have their simple rules for what it means to protect the open-source nature of a project, and they don’t waver. And for a vendor backing an open-source project, that can be like a Northern Light: you can lose your way, and you can always look back up and reorient towards the community.

But you know, there’s nice things when you see CNCF, you know, the marketing wing, and the power of the CloudNative messaging that’s there. But there’s no reason that projects can’t have pieces that exist in different foundations either.

We see ourselves and others build Kubernetes operators or management APIs. Drivers, as an example, should live in the project, but there is management tooling that the maintainers of the project wouldn’t want in tree. So, something like that maybe should live in a CNCF type of foundation that’s focused on CloudNative. But no, Apache Cassandra will remain at Apache, and that’s its home.

Industry Changes In The Last 10 Years

Mike Schwartz: So, DataStax is one of the more mature, well-established companies in the open-source ecosystem today. What are some of the challenges you are looking at now that are different from when you got started?

Kathryn Erickson: When I started at DataStax, it didn’t always feel like we had a lot of competition. And I think as other good distributed databases emerged, we adjusted to having competition. I think the obvious answer that most people would expect is pressure from the public cloud vendors. But if you stay oriented on the positive-sum nature of open source, then that becomes easy to embrace as well.

So, there are changes in understanding the virtuous cycles of open source, and in understanding how to build software as a service more quickly. As Kubernetes has matured, that’s become a lot easier. So, I think the ecosystem around us has matured a lot, and the playbooks around how to build a company around open source have matured. And there are more senior projects that exist in our ecosystem that we can work with and learn from as well.

Is Open Source Table Stakes For Databases?

Mike Schwartz: You know, most of the databases that have been released in the last, let’s say five to eight years or so, have been open source. Is being open source basically like table stakes now? So, is it a non-differentiator in the database market?

Kathryn Erickson: I think that if you’re moving from a proprietary relational system towards NoSQL, then you’re obviously moving into an open-source world. And if you can choose something that has a security blanket that you know will outlive any vendor behind it, then you should consider those options first.

I think that it would be hard to start proprietary databases without the support of the community and of these foundations. I think Snowflake has done an exceptional job and is kind of the exception to the open-source game. But, you know, they were disruptive in a much different way. NoSQL in general is an open-source family.

Data Platform Trends

Mike Schwartz: Just a general question about the database market. We’ve interviewed probably more database companies on this podcast than any other type of company, but have you seen a real shift in the way that customers think about databases?

In the old days, I think you just used to get one database and hope it did everything. Have you seen, on the technology side, a shift in the way that companies are thinking about data and databases now, with more SaaS hosted offerings and more database offerings in general?

Kathryn Erickson: Yes. I think this is definitely the age of data platforms. With Cassandra, we see customers considering NoSQL when they’re using a relational system and it can’t support the throughput that they need anymore, or they need to replicate across more geographies, or exist in a multi-cloud or hybrid environment.

And so, that’s when you consider Cassandra. If you look at when you might consider Mongo, you want to get a quick start with a developer-friendly environment that’s great for mobile. What you start to see is that there’s a certain fit for purpose that the different NoSQL databases have. We’ve started to see an emergence of multi-model systems that consolidate those capabilities. We have that with our Enterprise products and their integrations for graph, analytics, and search. We want to help customers build high-growth applications; high-speed transactional applications are the sweet spot of any Cassandra deployment.

Advice For Startup

Mike Schwartz: This is a question, a sort of a generic question for entrepreneurs who want to launch a business around an open-source product. I’m wondering if you have any advice, for let’s say, startups? And it could be general and it could be about partnerships.

Kathryn Erickson: You don’t have to invent a path to success. You can listen to the a16z podcast, you can look at other companies that are out there. You can go through so many success stories on podcasts like this. You can listen to Cockroach on their Open Source Underdogs podcast talk about how they’re thinking about licensing, and to other companies having similar conversations. Really understand what has made other companies successful, and don’t try to invent that yourself.

How To Improve Tech Diversity?

Mike Schwartz: Last question. As you might have noticed, there aren’t enough women in the tech business, and there haven’t been enough women on my podcast, so thank you for joining. What can we do to reverse that trend?

Kathryn Erickson: I think there’s a lot that we can do. Err on the side of making mistakes. Just try things, and if it’s not the right thing or if it doesn’t work, try something else. We’re going to do a program at DataStax, you know, Jumpstart: if you’re a woman or a person of color, and you want to learn Cassandra, and you don’t know where to start, just hit the button, sign up. Somebody from the team will meet with you for 30 minutes and help you get started. That might work, that might fall flat, but we’re going to just start trying stuff. And I think everyone should just start trying the ideas that they have, and we should all tell each other what’s working.

How’d You Get Started?

Mike Schwartz: How did you get started in the tech industry?

Kathryn Erickson: Well, my dad taught computer science at a community college, and I was going to be a DNA researcher. And I just wasn’t very good at it, and I thought, “You know what, Dad’s over in computer science, we’ve been playing with computers all of our lives. That sounds more like playing than working.” It’s been that way ever since. It feels more like playing than working every day.

Mike Schwartz: That’s great. Thank you so much for joining us today, Kathryn, and sharing your insights. And best of luck at DataStax.

Kathryn Erickson: Sure. Thank you.


Mike Schwartz: Thanks to the DataStax PR team for helping us to schedule some time with Kathryn.

Editing by Ines Cetenji. Transcription by Marina Andjelkovic. Cool graphics by Kamal Bhattacharjee. Music from Broke For Free, Chris Zabriskie and Lee Rosevere.

Next episode we’re excited to have Cornelia Davis, author of Cloud Native Patterns, a Manning book that needs to be on every software architect’s bookshelf. She’s also the CTO of Weaveworks. She was fantastic, so don’t miss it. Until next time, thanks for listening, and stay safe.