
Episode 54: Justin Borgman, CEO of Starburst, the Company Behind the Presto Project

Intro

Mike: Hello, and welcome to Open Source Underdogs. I’m your host Mike Schwartz, and this is episode 54, with Justin Borgman, Chairman, CEO, and Co-Founder of Starburst, the company behind the Presto Data Access Project.

Before we get started, I have a quick request – we all want to help open-source founders and startups. I make the podcast, but I need your help to get the word out, so tell your friends, post on LinkedIn, tweet out a link, post on Hacker News, or follow me and share one of my posts on LinkedIn, whatever you think makes sense, go for it.

One of the themes of Machiavelli’s The Prince is virtu e fortuna, with virtu meaning excellence in your domain, and fortuna meaning luck, whether good or bad. I really like how the story of Starburst exemplifies this 500-year-old insight.


Justin has a ton of domain virtu. He has deep technical knowledge, but he’s also on the lookout to harness fortuna. He’s one of the few podcast guests to acknowledge it. And Starburst earns its name because it’s one of the most stellar open-source business success stories I’ve heard in the last few years.
There’s so many great insights in this episode, a lot to think about. So, without further ado, let’s get on with the interview.

What Is Presto?

Mike: Justin, thanks for joining the podcast today.

Justin: Hey, Mike, super glad to be with you.

Mike: Before we dive into the business stuff, I find it’s helpful to talk a little bit about the technology. Can you start by giving a brief history of the Presto project? What it’s good at, and how the community coalesced around it?

Justin: It was really back in 2012 that four developers at Facebook, Martin, Dain, David, and Eric, came together to create a new infrastructure project that would be a faster way of querying data at Facebook. Facebook, of course, collects massive amounts of data, hundreds of petabytes worth of data, and needed a faster alternative to a prior project that they had also developed, called Hive.

Hive was a SQL engine for Hadoop, and it just wasn’t fast enough. So, Presto was created to be a faster means of accessing that data. But it has one really important differentiation in addition to the speed, which is the ability to access data anywhere. So, it’s like a database without storage – that’s kind of one way to think about it.

So, it looks at storage in other systems, which could be Hadoop, it could be S3 and AWS, it could be a traditional database, like Oracle, or Teradata, or Snowflake. And regardless of where that data lives, Presto can reach it, query it, and deliver SQL-based analytics.

So, that’s kind of what makes it special, is the ability to access the data everywhere. And that’s gained particular momentum, I would say more recently, as many large enterprises have data silo problems, where they have data in a bunch of different databases, and are now perhaps moving to the Cloud in some fashion.

Mike: And if I’m not mistaken, high concurrency is one of the areas that makes this sort of data access plane different?

Justin: Yes, exactly, it’s very fast, and can support high concurrency. And in a lot of ways, this technology was sort of, I like to say built in reverse, in the sense that it was tested at ridiculous scale from day one. You know, very often, when you start something new, you don’t really know how it’ll work at scale until you get people using it. But because it was really born out of the internet companies, Facebook, and Uber, Airbnb, and Netflix were all early adopters to use the technology, it was really tested, and at scale, and as a result delivers great performance and concurrency.

Origin Story

Mike: Starburst is not your first company. You were part of the team at a company called Hadapt that sold to Teradata in about three and a half years, I think.

Justin: Yep.

Mike: How did that experience lead you to Presto?

Justin: In a lot of ways, this is really a continuation of that journey that began 10 years ago. So, that was 2010 that I started Hadapt. Hadapt was a spin-out actually from Yale University and the computer science department – there’s some research called HadoopDB, which was pretty pioneering research at the time, in terms of thinking about Hadoop as a data warehousing solution, and being able to deliver fast SQL analytics on top of Hadoop.

So, we spun that out, raised Venture Capital, built that business over nearly four years, as you mentioned, and then sold it to Teradata. We had ups and downs, definitely lessons learned through that experience. And I think, really, my discovery of Presto after arriving at Teradata in 2014 was kind of an exciting opportunity to reimagine the strategy that we had with Hadapt.

So, Hadapt was the SQL engine for Hadoop; Presto is a SQL engine for anything, essentially, and allows you to access data anywhere. It was an opportunity to basically take all the lessons learned from the first experience and start to apply them over again.

It was actually my team from Hadapt that ended up contributing a tremendous amount of software to Presto, and working with the guys at Facebook, who created it to really make it an enterprise-grade piece of technology. And I think, as we started to see Presto get more and more capable, and see more and more people use it, that was what created the idea in our head that maybe there was a business to be formed around this.

Community Engagement

Mike: It’s a really interesting opportunity, and I can’t actually think of another example like it, but when I’m talking about open source, I sometimes talk about three types of open-source companies. One would be volunteer, where a bunch of guys or girls get together and write some piece of software that they love, but not necessarily for a business.

And then, I talk about corporate open source, where there’s some piece of software that a company funds, but it’s not their core business, and then they realize that it makes sense for them to collaborate, like Kubernetes, let’s say, and Google. And then there are these pure-play open-source companies, where the company behind it is developing it, and they’re the main contributors.


And so, lots of great open-source projects come out of this corporate open-source area. The podcast is mostly focused on pure-play because we’re trying to help entrepreneurs and founders use open source as part of their business model. But you’ve sort of, like, created a very interesting situation, where you have a mix of corporate and pure-play because you’re benefiting from, not just the community, but, really, Facebook is a big contributor to the project too. I heard almost 50/50. So, how’s that really evolved, and how do you continue to encourage this very symbiotic relationship?


Justin: You’re right. Presto has a very interesting history to it, an interesting journey. It started as a small project at Facebook. When we got involved at Teradata, we were able to apply a few million dollars a year of R&D budget into advancing that as well. And then, of course, you’ve got a few other companies contributing also along the way.

And, as a result, all of that kind of accelerates the development of the project. And I think that maybe what’s most unique here is not only that Facebook created great infrastructure software as a byproduct of their business – they’ve certainly done that before – but rather that there was kind of a commercial partner very early on, and myself, and my team at Teradata thinking about the commercial applications of this.

So, you know, back in 2014, Presto was still in its early days, Facebook wasn’t trying to monetize it obviously, that’s not their business, but we were already thinking about how this could be used by Fortune 500 customers, and what difference this could make to their business. And I think that led to its very enterprise-applicable evolution, and set us up really well to eventually commercialize this in 2017, when we left Teradata, the creators of Presto joined us from Facebook. And we went off on our way to build this business.

Idea Incubation

Mike: So, you were working on Presto while you were at Teradata. Did Teradata ask for any equity, or how did that work when you told Teradata, “We want to start this company,” basically while working at Teradata? Like, what was that like?

Justin: Yeah, well, what was interesting about that – and I guess just to set the context, I think Teradata, from 2014, when they acquired my company through to probably today, has gone through various iterations of kind of rethinking their overall strategy, in terms of how they evolved into this next generation of sort of Big Data platforms. Because they had great success in the ‘80s, ‘90s, and early 2000s, as this kind of monolithic data warehouse, where you would ingest everything and store it in one place.

But obviously that became very expensive over time. And the appliance model, hardware and software combined, wasn’t necessarily set up for this future as people move to the cloud. So, they’ve gone through a lot of iterations. And it was really in that iterative process, where they weren’t really clear where they wanted to go, that they actually felt like Presto is maybe a distraction for them.

So, that actually created the opportunity, I think, for us, to say, well, we think it’s a little more than a distraction. And, you know, we’d be happy to sort of take that off your hands and work on this together.

So, it was a very amicable split – we remain partners, we’re still partners today, where we work together on some customer accounts, the technologies work together, we can access data in Teradata, for example, from Presto. So, that partnership remains. But it was one that I think for them, they viewed us as sort of taking Presto off their hands because there were maybe close to a dozen companies within their customer base that were using Presto. So, we were able to deliver really first-class support to those customers, you know, not provide any interruption there, even as we left and formed this new business. So, they don’t own equity, it’s purely a partnership.

Identifying Opportunity

Mike: It’s just amazing how you built your business: you got a huge company, Facebook, to help you create and test this infrastructure. You got to do R&D at Teradata, and then you started the business with customers – it doesn’t get any better than that really.

Justin: No, you’re absolutely right. And believe me, the good fortune is certainly not lost on me. You know, advice I give to entrepreneurs of any type, not just open-source entrepreneurs, is to just have your eyes open to opportunity. I think it passes us all by all the time, and very often we miss it. And I think seeing it, and then, you know, running and jumping on it, certainly has been beneficial in my career. Even going back to my first company and spinning out technology from Yale, which you could argue had the great benefit of various government research grants funding that research in the first place. So, keeping the eyes open and seeing an application for where it could become a business.

When To Raise Money

Mike: So, initially, you didn’t have to raise money because you had some customers that provided some runway, but you did raise a series A in, I guess, October 2018, so, pretty recently. So, what was in the decision process to say, “Okay, now capital is going to help us”? Like, what were some of the benchmarks that you reached, that helped you say, “Now is the time we should do that”?

Justin: So, that’s exactly right. We started without raising any capital. That allowed us to build a profitable cash-flow positive business over those first two years of operating, which I think, by the way, as an aside, gave us a lot of opportunity to be patient and sort of think through exactly what we wanted our go-to-market strategy to be, what kind of strategy we wanted to take around monetization.

And we didn’t have the pressure of investors necessarily breathing down our neck, which I think many, many entrepreneurs have in those early days. So, I think it was a great way to start a business, what forced us to change and actually consider taking capital was really a realization that the market opportunity was bigger than we felt like we could actually satisfy growing at purely an organic rate.

So, we took that series A really as a growth round. You know, even though it’s called the series A, I think that’s a little bit misleading, because it’s probably more like a series B for most companies, in that not only was it a large amount of money, 22 million in that first round, but it was really deployed towards expansion and rapidly growing the business. Less so about proving product/market fit, which is more typical in a series A.

As you said, we did a series B shortly thereafter, which was probably more like a series C, adding another 42 million. So, we’ve gone from raising nothing to now 64 million. And really I think that was all made possible by really building the fundamentals first. Making sure you have that product/market fit sorted out, and then, you know, applying fuel to the fire to expand.

Revenues Pre-Investment

Mike: What were the revenues when you raised the series A?

Justin: Yeah, well, if it was 12 months looking forward, I would say it was already looking north of $10M at that point. So, that allowed us to really take the funding and apply it to, again, expansion rather than kind of sorting out the basic product details.

Mike: And what year did you actually start the company?

Justin: 2017.

Mike: That’s pretty amazing – two years to go to $10M. It’s pretty stellar.

Justin: Thank you. I mean, again, I think a big advantage here was that, in some ways, this was like building the same company over again – I mean, there are a lot of differences between this and my first, but there are also enough similarities, just in terms of the types of customers that we sell to, the types of use cases, the types of problems that they’re trying to solve. So, I think that historical knowledge was advantageous for us to just move a little bit faster this time around than we did the first time.

Balancing R&D Investment

Mike: Okay, switching gears a little bit into more basic business stuff. You mentioned in one of your previous interviews that I listened to, that Starburst is basically pursuing an open-core strategy. So, performance, robustness, security patches go into open source; things like connectors, security, ease of use, I guess GUI deployment stuff, go into the core. One of the questions that I’ve sort of wondered about is, how do you decide how to prioritize R&D in open source versus the enterprise features when you go the open-core route?

Justin: Yeah, I mean, I think that’s the key question. So, it makes sense why you’re asking it, and I think it has to be on the mind of every open-source entrepreneur. And it’s a delicate balance because, on the one hand, you want to make the open-source project as useful as possible to get widespread adoption. Because really that’s your lead generation vehicle – I think that’s the way to think about it.

A lot of people say open source is really just another form of a freemium business model. There’s a free component, and that just happens to be open source in an open-source model. And then, how do you kind of upsell to the Enterprise version. So, for us, I think the logic was, what are the reasons why people use Presto anyways in the first place.

And we think performance is a core element to that. So, we wanted to make sure that performance is always great, right out of the box, with the first experience of it, including the open-source version. So, that’s why a ton of work goes into open-source around performance enhancement, scalability enhancements, those kinds of things.

And then, we think about, well, what do people in enterprises, who are willing to pay for this stuff, what do they want. And that’s where it is, things like security features, which are just essential for any large, mature enterprise: things like role-based access control, data masking, if you’ve got social security numbers or credit card numbers, being able to mask digits appropriately, having audit logs for querying.

And then, because Presto accesses all these different types of data sources, it also made logical sense that if you’re going to access a database like Oracle, or Teradata, or IBM, all of which are very expensive in their own right, well then, a customer, probably, is willing to pay for enhanced connectors to get faster throughput to those systems.

So, that was kind of the logic: trying to think through what are the enterprise features that someone is willing to pay a premium for, versus what just produces a great out-of-the-box experience. Because I think so much about open source is really people doing their own self-evaluations of the technology, self POCs, if you will. So you want to make sure that’s great, because you can’t control that. You may not even know who downloaded it in the first place. So, that’s where you really want to put, I think, a lot of energy into the open-source project. And then, it’s more of those production features that are important to the larger enterprises that I think you can hold back.
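The data masking Justin mentions, hiding most digits of a social security or credit card number, can be illustrated with a minimal Python sketch. This is not Starburst's or Presto's implementation; the function name `mask_digits` and its parameters are hypothetical, chosen only to show the idea of masking all but the last few digits while leaving separators in place.

```python
def mask_digits(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace every digit except the last `visible` ones with `mask_char`.

    Non-digit characters (dashes, spaces) are kept so the format stays readable.
    Hypothetical helper for illustration only.
    """
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - visible, 0)  # how many leading digits to hide
    masked = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


ssn_masked = mask_digits("123-45-6789")
card_masked = mask_digits("4111 1111 1111 1111")
print(ssn_masked)
print(card_masked)
```

In an enterprise setting this kind of rule would typically be applied by the query layer according to the user's role, so that analysts see masked values while authorized users see the originals.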

Why Not 100% Open Source?

Mike: I interviewed Mike Olson from Cloudera, you might know him.

Justin: I do, oh, yeah.

Mike: He was one of my first guests, and he gave a very similar comment to what you were just saying. And he was quite emphatic about it. And yet, Cloudera recently switched to a 100% open-source strategy. And other open-source companies have also, for example, Chef, and of course some of the older Linux distributions, Red Hat and SUSE, are all open source.

And so, one of the things I’ve been wondering myself is, you can use the open-core strategy. It makes perfect sense, I think, to business people, but I also wonder, this license is paying for the right to use the software. Do you think that customers are actually paying for the right to use, or are they paying for the engagement with your organization? And do you think, if you made it all open source, it would actually negatively affect your revenues, or would customers still want to engage with Starburst as a company?


Justin: I think I can speak from experience here, because part of what’s interesting about our history is that we’ve kind of evolved through the various open-source business models in our brief history. So, when we first started the company, we didn’t have any proprietary IP, so we naturally just sold support contracts. So, the early customers that we started with were just support contracts.

I think the challenge that we quickly identified is that support alone is not the most compelling value proposition. It is to some, I’m not saying it’s not, but it’s not sufficiently compelling, I think, to win over a broad set of customers.

I think that’s where the open-core model, at least for us, really created an inflection in the business, where, you know, now we had a real tangible reason. And, by the way, for what it’s worth, I think we learned this actually from our own prospects, those who were actually huge fans of Presto, who were huge fans of us even, who were champions of what we were doing, but couldn’t quite get the purchase across the line in those early days and that first year of our operation, because they couldn’t justify or explain to their boss why one would have to pay for something that was essentially free. And that was the tricky conversation: “Well, you get this for free, why would you pay for it?” Like, “We don’t need support, you guys are smart, you can support this, right?” And those are the kinds of conversations that can take place. So, I think that’s where the open-core model is really helpful to the business.

Monetization Strategy

Mike: You’re selling a product that’s almost like a data access product, what I’d call the Presto Interface, and it connects to back-end databases. How do you price an interface, like what are the buckets – I don’t need to know the price but I’m just wondering, how do you land and expand, and how do you set up the model, so that it’s easy enough for customers to understand, and you can charge enterprise software rates for it?

Justin: The way that we monetize this is based on CPU consumption. Technically, we actually anchor on virtual CPU consumption because so many of our customers deploy in cloud environments. So, that’s the underlying metric, and the reason that’s a good proxy for us is because basically Presto is a technology that scales out super effectively, and leverages compute intensively to execute the query.

So, it’s basically, like, the more queries you have, the more data you’re accessing, the more complexity of the workload, and the more users who are hitting the system (you talked about the strong concurrency that Presto provides): those are kind of the dimensions that drive CPU consumption up, and we just monetize with that. It’s a straightforward metric I think that customers easily understand, and seems to work for us.
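As a rough illustration of consumption-based pricing anchored on virtual CPUs, the arithmetic might look like the Python sketch below. Everything specific here is an assumption: the function names, the cluster sizes, and the rate of 0.05 per vCPU-hour are hypothetical and are not Starburst's actual pricing. The point is only that a heavier workload needs a bigger cluster, and the charge scales with it.

```python
def vcpu_hours(nodes: int, vcpus_per_node: int, hours: float) -> float:
    """Total vCPU-hours consumed by a cluster of identical nodes."""
    return nodes * vcpus_per_node * hours


def usage_charge(total_vcpu_hours: float, rate_per_vcpu_hour: float) -> float:
    """Charge for the period, rounded to cents. The rate is a made-up input."""
    return round(total_vcpu_hours * rate_per_vcpu_hour, 2)


HYPOTHETICAL_RATE = 0.05  # assumed currency units per vCPU-hour, not a real price

# Doubling the node count to serve more queries and users doubles consumption.
small_cluster = vcpu_hours(nodes=4, vcpus_per_node=16, hours=10)
large_cluster = vcpu_hours(nodes=8, vcpus_per_node=16, hours=10)

small_charge = usage_charge(small_cluster, HYPOTHETICAL_RATE)
large_charge = usage_charge(large_cluster, HYPOTHETICAL_RATE)
print(small_cluster, small_charge)
print(large_cluster, large_charge)
```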

Optionality

Mike: In one of your previous talks I listened to, you talked about optionality, and how you recommended basically that optionality essentially drives freedom – how does Presto help you get that optionality?

Justin: Presto creates optionality by virtue of being disconnected from storage, essentially by not having its own storage layer. I used the analogy in the beginning that we’re like a database without storage. The other way I put it for people who are familiar with data warehousing is, we provide data warehousing analytics without the data warehouse. That’s another way to think about it.

So, because of that, it basically allows you to think about Presto as an abstraction layer, above all the data sources that you already have. And you can kind of skip the complex and time-consuming task of having to move data around, create copies of data, ETL it, extract it, transform it, and load it into another system, instead you can just do that at query time, and access that data, and get your results.

So, that gives you a lot of flexibility, and I think one of the ways we’ve seen that play out is, we have a lot of customers that have a classic data warehouse, maybe it’s Teradata or Oracle. And then, they’ve got some kind of a data-lake strategy, and maybe that’s either Hadoop on-prem, or maybe it’s S3, or some Cloud-object storage.

And the first step might be to use Presto to just join tables between these two systems. You’ve got some kind of user behavior logs in your data lake, and you’ve got billing data in your classic data warehouse, and you want to be able to correlate the behavior with the billing, let’s say. That would be a very common use case for us. You can do that with a simple query in Presto.

Now, what that allows you to do then, as a second step, is, essentially, hide from your own end-users, be they internal analysts, data scientists, or even customers, where the data actually lives. They don’t need to know that they need to go to the data warehouse to get the billing data, and they need to go to the data lake to get the user behavior – they’re just submitting a query, and they don’t know where the data lives anymore.

And by doing that, you’re able to actually decouple your end-user from where the data is stored and give the architects in the organization the ability to now decide, based on cost or performance, where that data should actually live. So, you don’t need to pay Oracle or Teradata tremendous amounts of money to store your data anymore. That is, of course, the most expensive storage you’re going to find.

You could instead choose object storage, like Ceph from Red Hat, or there’s a company on the West Coast called MinIO, which creates S3-compatible object storage. And that’s very inexpensive, relatively speaking. And you can deploy all of your data, or start to migrate your data into this lower-cost storage, and still be able to access it, while your end-users are none the wiser as to where the data lives – they’re just getting their results. So, I think that’s where you kind of get to create this optionality and be flexible about where you put your data over time.
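The cross-system join described above, behavior logs in a data lake correlated with billing data in a warehouse, can be approximated with a small runnable Python sketch. It does not use Presto. As a stand-in, it uses SQLite's `ATTACH DATABASE` so that one query spans two separate databases, which mirrors how a single SQL statement in Presto can reference tables from different sources. The table names, columns, and rows are all hypothetical.

```python
import sqlite3

# One connection, two separate databases:
# "main" plays the data lake, "warehouse" plays the classic data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.execute("CREATE TABLE main.user_events (user_id INTEGER, action TEXT)")
conn.execute("CREATE TABLE warehouse.billing (user_id INTEGER, amount_usd REAL)")

conn.executemany(
    "INSERT INTO main.user_events VALUES (?, ?)",
    [(1, "login"), (1, "export"), (2, "login")],
)
conn.executemany(
    "INSERT INTO warehouse.billing VALUES (?, ?)",
    [(1, 120.0), (2, 15.0)],
)


def events_vs_billing(connection: sqlite3.Connection) -> list:
    """Correlate event counts with billed amounts across both databases."""
    query = """
        SELECT e.user_id, COUNT(*) AS event_count, b.amount_usd
        FROM main.user_events AS e
        JOIN warehouse.billing AS b ON b.user_id = e.user_id
        GROUP BY e.user_id, b.amount_usd
        ORDER BY e.user_id
    """
    return connection.execute(query).fetchall()


rows = events_vs_billing(conn)
for row in rows:
    print(row)
```

The caller of `events_vs_billing` never needs to know which database holds which table. If the billing table were later moved to cheaper storage, only the query layer's mapping would need to change, which is the decoupling Justin describes.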

Mike: In addition to the technical level, I always think about optionality as, does the open-source license itself also lead, or open-source infrastructure in general, also lead to more optionality and freedom?

Justin: For sure. I mean, I think the notion of not having vendor lock-in is really important to customers. Increasingly so, I think they’ve been burned over decades of very expensive technology that becomes legacy technology, and then, they’re stuck and the pricing goes up. And they don’t feel like they have much ability to resolve that. And I think the open-source license in and of itself gives customers a lot of comfort, in knowing that, you know, in a worst-case scenario, they can always roll this themselves, with the open source. But also, Presto is able to read open data formats, which is also great. Because I think data lock-in is probably the worst kind of vendor lock-in.

And in a traditional database system, once the data is loaded into the database, it’s kind of not easy to get access to or get the data out, without continuing to pay for that database system. But if you’re using open data formats, which we’d really pioneered during the Hadoop era, these are like ORC or Parquet, if you’re familiar with those file formats, you can store them anywhere and query them with a multitude of tools. You could use Spark to train machine-learning models, working off the same Parquet files that you’re querying via SQL for Presto. And I think that gives customers a lot of flexibility as well.

Open Source V. Commercial Market Size

Mike: I read a lot of articles about how enterprises are really moving towards open source. Certainly when you look at the large consumer-facing services, like you mentioned, Netflix, Facebook, etc., they’re building a lot on open source. Then, you look at the size of the market, and you see that, actually, the market percentage of open-source software is still only a tiny amount – is the move to open source really real, or is it more hype than reality?

Justin: When you say the market is small, do you mean measured in dollars, or what’s the metric there?

Mike: Dollar, yes.

Justin: Yep, makes sense. And that’s the key piece. I think it’s probably super widely used, but the percentage of open source that actually gets monetized is relatively small. And I think that’s what’s translating to the overall dollar amount seeming small relative to the proprietary solutions. I think if you measured in terms of impact to businesses and organizations, it’s actually probably the reverse, where you might have more open-source software having bigger impact than the proprietary.

But, of course, the challenge – and I suppose this is the purpose of your podcast – is figuring out how to monetize that effectively, so that you can build a successful business, while having that broad impact that open source provides. And I do think that, as vendors, we’ve gotten smarter over the years about how to do that.

I mean, the way I think about open-source business models over history is that it started with the sort of pure-play support model, just offering support, nothing proprietary. I think kind of Generation 2 was the open-core model that we’ve spent time talking about. You know, Cloudera popularized that, as did many other companies. And I think Generation 3, which is actually where we’re moving as well as a company, is cloud-hosted, SaaS offerings.

And, basically, being able to make part of the value proposition the simplicity of the solution that you can deliver as a SaaS, and I think Databricks is a great example of that. So, I think that’s kind of the next frontier. And I think, as more and more open-source companies move in that direction, they’ll probably have better success in monetizing that background usage of the open source. Because there’s so much you can control now from a SaaS perspective to really enhance the experience, so that it’s just easier for customers to use your SaaS solution, rather than having to maintain it themselves.

Starburst Cloud Strategy

Mike: I normally ask companies if they’re developing a SaaS offering. And I think that there are some companies where it’s been really successful, like MongoDB. Eliot Horowitz from MongoDB is emphatic that cloud is the best business model and everyone should be doing cloud. In doing the 50+ podcasts, I found that the results have been mixed, where sometimes companies find that it’s a good way to reduce the try-and-buy time, where the cloud offering is a good introduction, but then the revenues are mostly derived from the enterprise, like self-hosted version.

And it takes a lot of effort, actually. It’s almost like a whole new product, like you’re building a software platform, a great software platform, and then, building the SaaS is almost like a totally new product and a different business endeavor. What’s Presto done in this area? Are you working on it? And do you have any thoughts about how that experience is going, sort of making a cloud offering out of the software?

Justin: We definitely are working on it, and we have been actually for quite some time. And it is hard work. I think there’s no doubt about it, but I do think that some recent innovations around Kubernetes actually make this easier than it maybe was a few years ago. Because Kubernetes can kind of create a uniform, almost like operating system, if you will, that you can deploy your software within, and therefore, sort of create the software once, rather than having to have all these different kind of custom versions for different types of deployments.

I think that’s a game-changer. It’s certainly something we’re betting heavily on, as we approached that by trying to create the same experience, regardless of where customers deploy.

Single-Tenant V. Multi-Tenant SaaS

Mike: Most of the old cloud services were multi-tenant, but, are you thinking, like with Kubernetes, we could maybe build a single-tenant and deliver sort of like, “We’ll host it for you, you’ll host it.”, but it’s going to be sort of the same thing?

Justin: That’s exactly right, yeah. You know, I don’t want to give away too much of our strategy just yet because we haven’t released the cloud product yet. But I think those are really important concepts that you highlighted there, that we’re very interested in.

Building A Sales Team

Mike: So, something you must have done a really good job at is building the sales organization, because $10M in sales hasn’t happened by accident. And I think sometimes founders underestimate how difficult it is to build a sales and marketing organization – did you have any thoughts or advice you could share on, like how that went for you, like, how you pulled it off, like how do you do it?

Justin: Yeah. I think the first step I would say is trying to understand yourself as the entrepreneur – what the sales process looks like, like, what are customers buying, how do they understand the value proposition. And I’m a big believer in entrepreneurs selling the first few customers themselves. I think you learn so much, even from a product management perspective of what you need. You get to experience what your sales reps will experience when you start to scale up. So, I’m a huge advocate of that.

The second thing I would say is find a great sales leader. Because you know there are folks out there who have done this many times before, and know what it takes to sort of scale up a sales organization. And, certainly, that was impactful for us in finding our VP of sales, who’s done a great job of really scaling up that organization quickly.

Team

Mike: One question I had was, the pandemic has changed things, and we’re much more remote. Were you remote before the pandemic, and what’s your plan for growing the team in the next couple of years?

Justin: We were not entirely remote, but we did have some level of distributed nature to our team. Before the pandemic, we had major teams in Boston, the Bay Area, and then, actually, Warsaw, Poland as well, as an important development center for us. So, we kind of had to work across these three geographies, which are obviously spread out across nine hours of time zones. And I think that gave us maybe a head start on the pandemic. But to be perfectly frank, I mean, I would much rather go back to actually having an office, and being able to interact on a one-on-one basis personally, with so many of these people.

Because I think what’s been weird for us is, we have scaled so quickly this year that I have not met probably half of our employees at this point, which is just a weird thing, to have grown the company so much. And the only interactions I’ve had have been over a Zoom call. So, that part I miss. I do think we’re all trying to make the best out of it, of course. And I think good best practices are sort of documenting everything, having frequent all-hands meetings, where you get everybody together, but there’s still no real substitute I think for one-on-one interaction.

Founder Advice

Mike: The last question, any advice for new entrepreneurs who are launching a business, and they want to use open-source software development as part of their business strategy?

Justin: My advice would be to think early about that key question that you asked earlier in the podcast about what your monetization strategy is going to be, and along what metrics, or what criteria I should say, you are going to be separating the enterprise value proposition from what you give away for free. And I think you should have a strategy early on and stick to it, because I think that will just make the decision-making process so much easier for you as you go along. You won’t have to debate each and every feature that you come up with – you’ll just sort of know because it will fall into a framework. That would be my piece of advice.

Close

Mike: Justin, thank you so much for sharing all this knowledge and experience with us.

Justin: Thank you, Mike. This was fun, and it was great meeting you.

Mike: Thanks to the Starburst team for reaching out and coordinating the podcast. Audio editing by Ines Cetenji, transcription and episode website by Marina Andjelkovic. Cool graphics by Kamal Bhattacharjee. Music from Broke For Free, Chris Zabriskie and Lee Rosevere.

Next time, we’re joined by Miguel Valdes Faura, CEO and Co-Founder of Bonitasoft, a global provider of BPM, low-code, and digital transformation solutions.

Until next time, stay safe, and thanks for listening.

Episode 50: DataStax NoSQL solutions built on Apache Cassandra with Kathryn Erickson, Open Source and Ecosystem Strategy

Intro


Mike Schwartz: Hello and welcome to Open Source Underdogs. I’m your host, Mike Schwartz, and this is episode 50 with Kathryn Erickson, who helps lead open-source strategy at DataStax. Founded in 2010 and currently employing about 500 people, DataStax was one of the first and most successful companies in the Apache Cassandra big data ecosystem.


Kathryn has an engineering background. You can listen to some of her great deep dives into the tech on the DataStax website. In her role on the strategy team, she’s helping to lead the company into its next phase of growth and community engagement. I hope you’ll enjoy this episode. And if you do, don’t forget to share a link on social media. You can find all the episodes on opensourceunderdogs.com, or you can retweet our announcement by following us on Twitter. Our handle is @fosspodcast. So, without further ado, let’s carry on with the interview.

DataStax Origin

Mike Schwartz: Kathryn, thank you for joining us today.

Kathryn Erickson: Sure, of course, thank you.

Mike Schwartz: Most of our listeners probably know about Apache Cassandra, one of the most popular databases for big data, but how did DataStax evolve in relation to the Cassandra project?

Kathryn Erickson: DataStax was founded by Jonathan Ellis and Matt Pfeil, both employees of Rackspace. Jonathan, being a contributor to Apache Cassandra and project chair as well, was considering leaving Rackspace, and Matt Pfeil went to talk to him and say, “Hey, there’s some really cool stuff going on here, you should really consider staying.” And by the end of the conversation, they were founding a company together.

And so DataStax was founded to support Apache Cassandra. Over time, we began adding Enterprise features and selling an Enterprise distribution of the database with these features added, and then, of course, more recently, the cloud platform as a service offering as well.

Evolution Of Support Offering

Mike Schwartz: Actually, I didn’t realize that you started out providing support. Because when I first ran into DataStax, I guess I had just known it as a distribution of Cassandra. And now, I see that you’re also providing support for the open-source distribution. Can you talk a little bit about how that’s evolved over time? Has it always been there, or has there been a focus for or against doing that?

Kathryn Erickson: It hasn’t always been there. When DataStax was founded 10 years ago, there wasn’t really a playbook for how to build and run a successful open-source company.
We were founded around the premise of providing support and consulting for Apache Cassandra. Over time, we did offer the Enterprise Edition, but what you see with most Enterprises is that they have a mix of the Enterprise version and open source. For some customers, that’s dependent on the criticality of the data, and for other customers, it’s dependent on the features or the distribution, being the as-a-service offering or self-installed on-prem.

And so, what we saw in the last year was that there were some obvious things that we weren’t doing, and our customers needed support and consulting around open-source Cassandra. We are beginning to open-source a lot more of the features that would build Cassandra abundance, and so, it made sense to bring those offerings back.

Astra – DataStax Cloud Offering

Mike Schwartz: Okay, and you mentioned that DataStax launched a new hosted service called Astra. Do you see that product as a driver for revenue, or is it just an easier path for customers to test drive the product?

Kathryn Erickson: I think that will evolve over time. I think at launch, it is the easiest way to learn Apache Cassandra. And I think as we launch the hybrid option, which I believe is later this year, that will become a more significant line of revenue.

Pricing

Mike Schwartz: Most of the revenue today, I guess, is from the licensed Enterprise product, so focusing on that, a lot of open-source businesses are moving towards consumption-based pricing. And I’m wondering, what kind of metrics do you use to determine what is consumption?

Kathryn Erickson: You know, for a cloud-based offering, consumption is based on capacity. And with our licensed product and with Luna, the open-source support offering, our focus this year has been around simplification of the pricing model. And we revisit that each year.

With the Enterprise product, we previously charged for the Enterprise license, and then, an optional additional fee for advanced workloads, like Spark analytics and graph. That’s confusing for the customer, they just want a simple pricing mechanism. So, we collapsed that pricing. And then, of course, for larger deals, we would have ELAs, or special terms to accommodate those customers.


Mike Schwartz: Is that consumption based on, like, per CPU, per server, or how do you actually figure out what the size is?

Kathryn Erickson: It’s true capacity-based, the size of the data set being stored. And as we move to Astra hybrid, which will be that offering on-prem, I think we’ll consider that pricing option there as well.

Market Segmentation

Mike Schwartz: Data persistence is like the most horizontal market on the planet. Every company basically needs to store data. When you can sell to everyone, it’s sort of a blessing and a curse. Do you segment the market at all vertically or by use case, or do you just not segment the market?


Kathryn Erickson: It’s hard to segment when you’re serving a pretty broad market. What we try to do is have as easy an on-ramp for the different verticals as possible. We see data models that look similar across IoT use cases; inventory and messaging data models would be similar.
So, we don’t segment the market for go-to-market strategies, but we try to find places of repeatable consulting efforts to speed up the successes for those customers.

Partnerships

Mike Schwartz: When you took on the role of Director of Strategic Partnerships, you probably did a survey of the range of partnerships that exist. Can you talk about what the partner landscape looks like at DataStax?

Kathryn Erickson: I ran our technology partner program, and there’s two other sides of that, SI partners and the cloud partners. On the technology side, you want to make it as easy as possible for customers to consume your product.

So, in a technology partner program, you want to understand the user journey to get to your product, and make sure that those adjacent technologies have the simplest, most repeatable, easy-to-build, easy-to-test integrations possible over time. If you want to think about specific companies and integrations, every database needs an ODBC and JDBC connector. And customers want those for BI, for reporting, for simple ways to move data in and out of the system, but in the last few years, most customers also want to see Kafka connectors and more high-speed ingest Pub/Sub integrations. So, we want to accommodate those as well.

Mike Schwartz: Coming to the system integrator side, you know, at Gluu, we found that those have been essential for us, to be able to focus on innovating the product versus getting involved in specific projects. But there’s such a broad range of system integrators when you’re serving a global market. Do you consider them channel partners or integration partners?


Kathryn Erickson: We usually consider them strategic partners when we take those types of partnerships on. And the goal is usually to help us penetrate markets that we don’t currently have a field team in, or to deliver packaged, cookie-cutter solutions. If you look at some of the stuff that we’ve done with VMware and with partnerships at Dell, we want to assert that the product stack works as recommended for customers that are used to seeing these reference architectures from these larger integrators and technology companies.

Most Important Partnerships For Driving Revenues

Mike Schwartz: Which partnerships do you think are the most important for actually driving growth?

Kathryn Erickson: Deloitte’s been integral to our federal business; they know that space better than any startup could hope to. VMware, for helping to modernize Enterprise platforms. Enterprises that are looking at Cassandra and looking at DataStax are usually going through some type of digital transformation. And the product that they already have in place is VMware. So, everything that we could do to make that migration to NoSQL smooth was helpful to those customers. VMware has been a pretty big partner in my journey.

Open Source Strategy

Mike Schwartz: Some of the companies we’ve interviewed are moving to a 100% open-source strategy, specifically Chef and Cloudera. In the past, the value proposition of DataStax was that it had an improved distribution of Cassandra. But do you see DataStax maybe moving more in the direction of open-sourcing its platform and some of the technology it’s developed?

Kathryn Erickson: We are open-sourcing a lot more. We try to stick to simple rules for open-sourcing. “Simple rules” comes from a Harvard Business Review article, “Simple Rules for a Complex World.”
And so, simple rules for open source: if it increases adoption of Cassandra, it should be open-sourced. And if it’s an Enterprise feature that’s more specific to Enterprise customers, like security features or advanced replication options, then that would be kept proprietary.

And then, where should something be open-sourced? Well, if it makes a change to the core of Cassandra, of course it should go to the Apache project. And if it increases abundance, but it’s not impactful to the core of the project, then it still should be open-sourced, but it may be able to exist in a DataStax repo or a different foundation.

Does Open Source Help?

Mike Schwartz: Do you think the wider open-source community around Cassandra helps DataStax too?

Kathryn Erickson: Of course, open source is all about positive sum games. I think it was Thomas Jefferson that said, “If you use my light to light your torch, then we both have light.” And that’s how open source works. The more communities and more companies that you can move from being other to being self, the larger the positive sum game that you’re playing. So, open source, and open-source abundance, is absolutely essential to the success of any open-source company.

Thoughts About Open Source Foundations?



Mike Schwartz: Any thoughts about Cassandra being hosted at the Apache Foundation versus perhaps the Linux Foundation or the CNCF?

Kathryn Erickson: I don’t have any opinions on the other foundations, but I think that Apache Cassandra will always be at home with the ASF. They have their simple rules for what it means to protect the open-source nature of a project, and they don’t waver. And for a vendor backing an open-source project, that can be like a Northern Light: you can lose your way, and you can always look back up and reorient towards the community.

But you know, there’s nice things when you see CNCF, you know, the marketing wing, and the power of the cloud-native messaging that’s there. But there’s no reason that projects can’t have pieces that exist in different foundations either.

We see ourselves and others that build community operators, or management APIs, or drivers, as an example. They should live in a project, but there is management tooling that the maintainers of the project wouldn’t want in-tree. So, something like that maybe should live in a CNCF type of foundation that’s focused on cloud native. But no, Apache Cassandra will remain Apache, and that’s its home.

Industry Changes In The Last 10 Years

Mike Schwartz: So, DataStax is one of the more mature, well-established companies in the open-source ecosystem today. What are some of the challenges you are looking at now that are different from when you got started?

Kathryn Erickson: When I started at DataStax, it didn’t always feel like we had a lot of competition. And I think as other good distributed databases emerged, we adjusted to having competition. I think the obvious answer that most people would expect is pressure from the public cloud vendors. But if you stay oriented on the positive sum nature of open source, then that becomes easy to embrace as well.

So, there’s changes in understanding the virtuous cycles of open source, and in understanding how to build software as a service more quickly; as Kubernetes has matured, that’s become a lot easier. So, I think the ecosystem around us has matured a lot, and the playbooks around how to build a company around open source have matured. And there are more senior projects that kind of exist in our ecosystem that we can work with and learn from as well.

Is Open Source Table Stakes For Databases?

Mike Schwartz: You know, most of the databases that have been released in the last, let’s say five to eight years or so, have been open source. Is being open source basically like table stakes now? So, is it a non-differentiator in the database market?

Kathryn Erickson: I think that if you’re moving from a proprietary relational system, and moving towards NoSQL, then you’re obviously moving into an open-source world. And if you can choose something that has a security blanket that you know will outlive any vendor behind it, then you should consider those options first.

I think that it would be hard to start proprietary databases without the support of the community and of these foundations. I think Snowflake has done an exceptional job and is kind of the exception to the open-source game. But, you know, they were disruptive in a much different way. NoSQL in general is an open-source family.

Data Platform Trends

Mike Schwartz: Just a general question about the database market. So, we’ve interviewed probably more database companies on this podcast than any other type of company, but have you ever seen a real shift in the way that customers think about databases?

In the old days, I think you just used to get one database and hope it did everything, but have you seen, on the technology side, a shift in the way that companies are thinking about data and databases now, with more SaaS hosted offerings and more database offerings in general?

Kathryn Erickson: Yes. I think this is definitely the age of data platforms. With Cassandra, we see customers considering NoSQL when they’re using a relational system and it can’t support the throughput that they need anymore, or they need to replicate to more geographies, or exist in a multi-cloud or hybrid environment.

And so, that’s when you consider Cassandra. If you look at when you might consider Mongo, you want to get a quick start with a developer-friendly environment that’s great for mobile. What you start to see is that there’s a certain fit for purpose that the different NoSQL databases have. We’ve started to see an emergence of multi-model systems that move forward by consolidating those capabilities. We have that with our Enterprise products and their integrations for graph, analytics, and search. We want to help customers build high-growth applications; high-speed transactional applications are the sweet spot of any Cassandra deployment.

Advice For Startup

Mike Schwartz: This is a question, a sort of a generic question for entrepreneurs who want to launch a business around an open-source product. I’m wondering if you have any advice, for let’s say, startups? And it could be general and it could be about partnerships.

Kathryn Erickson: You don’t have to invent a path to success. You can listen to the a16z podcast, you can look at other companies that are out there. You can go through so many success stories on podcasts like this. You can listen to Cockroach on their Open Source Underdogs podcast talk about how they’re thinking about licensing, and to other companies having similar conversations. Really understand what has made other companies successful, and don’t try to invent that yourself.

How To Improve Tech Diversity?


Mike Schwartz: Last question. As you might have noticed, there aren’t enough women in the tech business, and there haven’t been enough women on my podcast, so thank you for joining. What can we do to reverse that trend?

Kathryn Erickson: I think there’s a lot that we can do. Err on the side of making mistakes, just try things, and if it’s not the right thing or if it doesn’t work, try something else. We’re going to do a program at DataStax, you know, Jumpstart: if you’re a woman or a person of color, and you want to learn Cassandra, and you don’t know where to start, just hit the button, sign up. Somebody from the team will meet with you for 30 minutes and help you get started. That might work, that might fall flat, but we’re going to just start trying stuff. And I think everyone should just start trying the ideas that they have, and we should all tell each other what’s working.

How’d You Get Started?

Mike Schwartz: How did you get started in the tech industry?

Kathryn Erickson: Well, my dad taught computer science at a community college, and I was going to be a DNA researcher. And I just wasn’t very good at it, and I thought, “You know what, dad’s over in computer science, we’ve been playing with computers all of our lives. That sounds more like playing than working.” And it’s been that way ever since. It feels more like playing than working every day.

Mike Schwartz: That’s great. Thank you so much for joining us today, Kathryn, and sharing your insights. And best of luck at DataStax.

Kathryn Erickson: Sure. Thank you.

Closing

Mike Schwartz: Thanks to the DataStax PR team for helping us to schedule some time with Kathryn.

Editing by Ines Cetenji. Transcription by Marina Andjelkovic. Cool graphics by Kamal Bhattacharjee. Music from Broke For Free, Chris Zabriskie and Lee Rosevere.

Next episode we’re excited to have Cornelia Davis, author of Cloud Native Patterns, a Manning book that needs to be on every software architect’s bookshelf. She’s also the CTO of Weaveworks. She was fantastic, so don’t miss it. Until next time, thanks for listening, and stay safe.