Episode 54: Justin Borgman, CEO of Starburst, the Company Behind the Presto Project

Intro

Mike: Hello, and welcome to Open Source Underdogs. I’m your host Mike Schwartz, and this is episode 54, with Justin Borgman, Chairman, CEO, and Co-Founder of Starburst, the company behind the Presto Data Access Project.

Before we get started, I have a quick request – we all want to help open-source founders and startups. I make the podcast, but I need your help to get the word out, so tell your friends, post on LinkedIn, tweet out a link, post on Hacker News, or follow me and share one of my posts on LinkedIn, whatever you think makes sense, go for it.

One of the themes of Machiavelli’s The Prince is virtu e fortuna: virtu meaning excellence in your domain, and fortuna meaning luck, whether good or bad. I really like how the story of Starburst exemplifies this 500-year-old insight.

Justin has a ton of domain virtu. He has deep technical knowledge, but he’s also on the lookout to harness fortuna. He’s one of the few podcast guests to acknowledge it. And Starburst earns its name because it’s one of the most stellar open-source business success stories I’ve heard in the last few years.

There are so many great insights in this episode, a lot to think about. So, without further ado, let’s get on with the interview.

What Is Presto?

Mike: Justin, thanks for joining the podcast today.

Justin: Hey, Mike, super glad to be with you.

Mike: Before we dive into the business stuff, I find it’s helpful to talk a little bit about the technology. Can you start by giving a brief history of the Presto project? What it’s good at, and how the community coalesced around it?

Justin: It was really back in 2012 that four developers at Facebook, Martin, Dain, David, and Eric, came together to create a new infrastructure project that would be a faster way of querying data at Facebook. Facebook, of course, collects massive amounts of data, hundreds of petabytes worth of data, and needed a faster alternative to a prior project that they had also developed, called Hive.

Hive was a SQL engine for Hadoop, and it just wasn’t fast enough. So, Presto was created to be a faster means of accessing that data. But it has one really important differentiation in addition to the speed, which is the ability to access data anywhere. So, it’s like a database without storage – that’s kind of one way to think about it.

So, it looks at storage in other systems, which could be Hadoop, it could be S3 and AWS, it could be a traditional database, like Oracle, or Teradata, or Snowflake. And regardless of where that data lives, Presto can reach it, query it, and deliver SQL-based analytics.

So, that’s kind of what makes it special, is the ability to access the data everywhere. And that’s gained particular momentum, I would say more recently, as many large enterprises have data silo problems, where they have data in a bunch of different databases, and are now perhaps moving to the Cloud in some fashion.

Mike: And if I’m not mistaken, high concurrency is one of the areas that makes this sort of data access plane different?

Justin: Yes, exactly, it’s very fast, and can support high concurrency. And in a lot of ways, this technology was sort of, I like to say, built in reverse, in the sense that it was tested at ridiculous scale from day one. You know, very often, when you start something new, you don’t really know how it’ll work at scale until you get people using it. But because it was really born out of the internet companies (Facebook, Uber, Airbnb, and Netflix were all early adopters of the technology), it was really tested at scale, and as a result delivers great performance and concurrency.

Origin Story

Mike: Starburst is not your first company. You were part of a team at a company called Hadapt that sold to Teradata in about three and a half years, I think.

Justin: Yep.

Mike: How did that experience lead you to Presto?

Justin: In a lot of ways, this is really a continuation of that journey that began 10 years ago. So, that was 2010 that I started Hadapt. Hadapt was a spin-out actually from Yale University and the computer science department – there was some research called HadoopDB, which was pretty pioneering research at the time, in terms of thinking about Hadoop as a data warehousing solution, and being able to deliver fast SQL analytics on top of Hadoop.

So, we spun that out, raised Venture Capital, built that business over nearly four years, as you mentioned, and then sold it to Teradata. We had ups and downs, definitely lessons learned through that experience. And I think, really, my discovery of Presto after arriving at Teradata in 2014 was kind of an exciting opportunity to reimagine the strategy that we had with Hadapt.

So, Hadapt was the SQL engine for Hadoop; Presto is a SQL engine for anything, essentially, and allows you to access data anywhere. It was an opportunity to basically take all the lessons learned from the first experience and start to apply them over again.

It was actually my team from Hadapt that ended up contributing a tremendous amount of software to Presto, and working with the guys at Facebook, who created it to really make it an enterprise-grade piece of technology. And I think, as we started to see Presto get more and more capable, and see more and more people use it, that was what created the idea in our head that maybe there was a business to be formed around this.

Community Engagement

Mike: It’s a really interesting opportunity, and I can’t actually think of another example like it, but when I’m talking about open source, I sometimes talk about three types of open-source companies. One would be volunteer, where a bunch of guys or girls get together and write some piece of software that they love, but not necessarily for a business.

And then, I talk about corporate open source, where there’s some piece of software that a company funds, but it’s not their core business, and then they realize that it makes sense for them to collaborate, like Kubernetes, let’s say, and Google. And then there are these pure-play, open-source companies, where the company behind it is developing it, and they’re the main contributors.

And so, lots of great open-source projects come out of this corporate open-source area. The podcast is mostly focused on pure-play because we’re trying to help entrepreneurs and founders use open source as part of their business model. But you’ve sort of, like, created a very interesting situation, where you have a mix of corporate and pure-play because you’re benefiting from, not just the community, but, really, Facebook is a big contributor to the project too; I heard almost 50/50. So, how’s that really evolved, and how do you continue to encourage this very symbiotic relationship?

Justin: You’re right. Presto has a very interesting history to it, an interesting journey. It started as a small project at Facebook. When we got involved at Teradata, we were able to apply a few million dollars a year of R&D budget into advancing that as well. And then, of course, you’ve got a few other companies contributing also along the way.

And, as a result, all of that kind of accelerates the development of the project. And I think that maybe what’s most unique here is not only that Facebook created great infrastructure software as a byproduct of their business – they’ve certainly done that before – but rather that there was kind of a commercial partner very early on, in myself and my team at Teradata, thinking about the commercial applications of this.

So, you know, back in 2014, Presto was still in its early days. Facebook wasn’t trying to monetize it, obviously, that’s not their business, but we were already thinking about how this could be used by Fortune 500 customers, and what difference this could make to their business. And I think that led to its very enterprise-applicable evolution, and set us up really well to eventually commercialize this in 2017, when we left Teradata. The creators of Presto joined us from Facebook, and we went off on our way to build this business.

Idea Incubation

Mike: So, you were working on Presto while you were at Teradata. And did Teradata ask for any equity, or how did that work when you told Teradata, “We want to start this company,” basically while working at Teradata? Like, what was that like?

Justin: Yeah, well, what was interesting about that – and I guess just to set the context, I think Teradata, from 2014, when they acquired my company, through to probably today, has gone through various iterations of kind of rethinking their overall strategy, in terms of how they evolved into this next generation of sort of Big Data platforms. Because they had great success in the ’80s, ’90s, and early 2000s, as this kind of monolithic data warehouse, where you would ingest everything and store it in one place.

But obviously that became very expensive over time. And the appliance model, hardware and software combined, wasn’t necessarily set up for this future as people move to the cloud. So, they’ve gone through a lot of iterations. And it was really in that iterative process, where they weren’t really clear where they wanted to go, that they actually felt like Presto is maybe a distraction for them.

So, that actually created the opportunity, I think, for us, to say, well, we think it’s a little more than a distraction. And, you know, we’d be happy to sort of take that off your hands and work on this together.

So, it was a very amicable split – we remain partners, we’re still partners today, where we work together on some customer accounts, the technologies work together, we can access data in Teradata, for example, from Presto. So, that partnership remains. But it was one that I think for them, they viewed us as sort of taking Presto off their hands because there were maybe close to a dozen companies within their customer base that were using Presto. So, we were able to deliver really first-class support to those customers, you know, not provide any interruption there, even as we left and formed this new business. So, they don’t own equity, it’s purely a partnership.

Identifying Opportunity

Mike: It’s just amazing how you built your business: you got a huge company, Facebook, to help you create and test this infrastructure. You got to do R&D at Teradata, and then you started the business with customers – it doesn’t get any better than that really.

Justin: No, you’re absolutely right. And believe me, the good fortune is certainly not lost on me. You know, advice I give to entrepreneurs of any type, not just open-source entrepreneurs, is to just have your eyes open to opportunity. I think it passes us all by all the time, and very often we miss it. And I think seeing it, and then, you know, running and jumping on it, it certainly has been beneficial in my career. Even going back to my first company and spinning out technology from Yale, which you could argue was the great beneficiary of various government research grants funding that research in the first place. So, keeping the eyes open and seeing an application for where it could become a business.

When To Raise Money

Mike: So, initially, you didn’t have to raise money because you had some customers that provided some runway, but you did raise a series A in, I guess, October 2018, so, pretty recently. So, what was in the decision process to say, “Okay, now capital is going to help us”? Like, what were some of the benchmarks that you reached that helped you say, “Now is the time we should do that”?

Justin: So, that’s exactly right. We started without raising any capital. That allowed us to build a profitable cash-flow positive business over those first two years of operating, which I think, by the way, as an aside, gave us a lot of opportunity to be patient and sort of think through exactly what we wanted our go-to-market strategy to be, what kind of strategy we wanted to take around monetization.

And we didn’t have the pressure of investors necessarily breathing down our neck, which I think many, many entrepreneurs have in those early days. So, I think it was a great way to start a business. What forced us to change and actually consider taking capital was really a realization that the market opportunity was bigger than we felt like we could actually satisfy growing at purely an organic rate.

So, we took that series A really as a growth round. You know, even though it’s called the series A, I think it’s a little bit misleading, because it’s probably more like a series B for most companies, in that not only was it a large amount of money, 22 million in that first round, but it was really deployed towards expansion and rapidly growing the business, less so about proving product/market fit, which is more typical in a series A.

As you said, we did a series B shortly thereafter, which was probably more like a series C, adding another 42 million. So, we’ve gone from raising nothing to now 64 million. And really I think that was all made possible by really building the fundamentals first. Making sure you have that product/market fit sorted out, and then, you know, applying fuel to the fire to expand.

Revenues Pre-Investment

Mike: What were the revenues when you raised the series A?

Justin: Yeah, well, if it was 12 months looking forward, I would say it was already looking north of $10M at that point. So, that allowed us to really take the funding and apply it to, again, expansion rather than kind of sorting out the basic product details.

Mike: And what year did you actually start the company?

Justin: 2017.

Mike: That’s pretty amazing – two years to go to $10M. It’s pretty stellar.

Justin: Thank you. I mean, again, I think a big advantage here was that, in some ways, this was like building the same company over again – I mean, there are a lot of differences between this and my first, but there are also enough similarities, just in terms of the types of customers that we sell to, the types of use cases, the types of problems that they’re trying to solve. So, I think that historical knowledge was advantageous for us to just move a little bit faster this time around than we did the first time.

Balancing R&D Investment

Mike: Okay, switching gears a little bit into more basic business stuff. You mentioned in one of your previous interviews that I listened to, that Starburst is basically pursuing an open-core strategy. So, performance, robustness, and security patches go into open source, while things like connectors, security, ease of use, and, I guess, GUI deployment stuff go into the core. One of the questions that I’ve sort of wondered about is, how do you decide how to prioritize R&D in open source versus the enterprise features when you go the open-core route?

Justin: Yeah, I mean, I think that’s the key question. So, it makes sense why you’re asking it, and I think it has to be on the mind of every open-source entrepreneur. And it’s a delicate balance because, on the one hand, you want to make the open-source project as useful as possible to get widespread adoption. Because really that’s your lead generation vehicle – I think that’s the way to think about it.

A lot of people say open source is really just another form of a freemium business model. There’s a free component, and that just happens to be open source in an open-source model. And then, how do you kind of upsell to the Enterprise version. So, for us, I think the logic was, what are the reasons why people use Presto anyways in the first place.

And we think performance is a core element to that. So, we wanted to make sure that performance is always great, right out of the box, with the first experience of it, including the open-source version. So, that’s why a ton of work goes into open-source around performance enhancement, scalability enhancements, those kinds of things.

And then, we think about, well, what do people in enterprises, who are willing to pay for this stuff, what do they want. And that’s where it’s things like security features, which are just essential for any large, mature enterprise: things like role-based access control; data masking, so if you’ve got social security numbers or credit card numbers, being able to mask digits appropriately; and having audit logs for querying.

And then, because Presto accesses all these different types of data sources, it also made logical sense that if you’re going to access a database like Oracle, or Teradata, or IBM, all of which are very expensive in their own right, well then, a customer, probably, is willing to pay for enhanced connectors to get faster throughput to those systems.

So, that was kind of the logic: trying to think through what are the enterprise features that someone is willing to pay a premium for, versus what just produces an out-of-the-box great experience. Because I think so much about open source is really people doing their own self-evaluations of the technology. So, self POCs, if you will. So, you want to make sure that’s great, because you can’t control that. You may not even know who downloaded it in the first place. So, that’s where you really want to put, I think, a lot of energy into the open-source project. And then, it’s more of those production features that are important to the larger enterprises, where those, I think, you can hold back.
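
To make the data-masking idea mentioned above concrete, here is a minimal Python sketch. It is not Starburst's or Presto's implementation; the function name `mask_digits` and the choice to leave the last four digits visible are assumptions made purely for illustration.

```python
def mask_digits(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask every digit except the last `visible` ones, keeping separators.

    Hypothetical helper for illustration only; not a Presto or Starburst API.
    """
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - visible, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)  # hide this digit
            to_mask -= 1
        else:
            out.append(ch)  # keep separators and the trailing visible digits
    return "".join(out)


print(mask_digits("123-45-6789"))          # ***-**-6789
print(mask_digits("4111 1111 1111 1111"))  # **** **** **** 1111
```

In a real deployment this kind of rule would typically be applied by the query layer based on who is asking, rather than by each application.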

Why Not 100% Open Source?

Mike: I interviewed Mike Olson from Cloudera, you might know him.

Justin: I do, oh, yeah.

Mike: He was one of my first guests, and he gave a very similar comment to what you were just saying. And he was quite emphatic about it. And yet, Cloudera recently switched to a 100% open-source strategy. And other open-source companies have also, for example, Chef, and of course some of the older Linux distributions, Red Hat and SUSE, are all open source.

And so, one of the things I’ve been wondering myself is, you can use the open-core strategy. It makes perfect sense I think to business people, but I also wonder, this license is paying for the right to use the software. Do you think that customers are actually paying for the right to use, or they’re paying for the engagement with your organization? And do you think, if you made it all open source, it would actually negatively affect your revenues, or customers would still want to engage with Starburst as a company?

Justin: I think I can speak from experience here, because part of what’s interesting about our history is that we’ve kind of evolved through the various open-source business models in our brief history. So, when we first started the company, we didn’t have any proprietary IP, so we naturally just sold support contracts. So, the early customers that we started with were just support contracts.

I think the challenge that we quickly identified is that support alone is not the most compelling value proposition. It is to some, I’m not saying it’s not, but it’s not sufficiently compelling, I think, to win over a broad set of customers.

I think that’s where the open-core model, at least for us, really created an inflection in the business, where, you know, now we had a real tangible reason. And, by the way, for what it’s worth, I think we learned this actually from our own prospects: those who were huge fans of Presto, who were huge fans of us even, who were champions of what we were doing, but couldn’t quite get the purchase across the line in those early days and that first year of our operation, because they couldn’t justify or explain to their boss why one would have to pay for something that was essentially free. And that was the tricky conversation: “Well, you get this for free, why would you pay for it?” Like, “We don’t need support, you guys are smart, you can support this, right?” And those are the kinds of conversations that can take place. So, I think that’s where the open-core model is really helpful to the business.

Monetization Strategy

Mike: You’re selling a product that’s almost like a data access product, like I call the Presto Interface, and it connects to back-end databases. How do you price an interface, like what are the buckets – I don’t need to know the price but I’m just wondering like, how do you land and expand, and how do you set up the model, so that it’s easy enough for customers to understand, and you can charge enterprise software rates for it?

Justin: The way that we monetize this is based on CPU consumption. Technically, we actually anchor on Virtual CPU consumption because so many of our customers deploy in Cloud environments. So, that’s the underlying metric, and the reason that’s a good proxy for us is because basically Presto is a technology that scales out super effectively, and leverages compute intensively to execute the query.

So, it’s basically, like, the more queries you have, the more data you’re accessing, the more complexity of the workload, and the more users who are hitting the system (you talked about the strong concurrency that Presto provides), those are kind of the dimensions that drive CPU consumption up, and we just monetize with that. It’s a straightforward metric I think that customers easily understand, and seems to work for us.
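
As a rough illustration of consumption-based pricing anchored on virtual CPUs, the Python sketch below totals vCPU-hours across clusters and multiplies by a rate. The unit (vCPU-hours), the `ClusterUsage` type, and the rate of 0.5 are all assumptions for the example; the interview does not state Starburst's actual billing unit or prices.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ClusterUsage:
    """Hypothetical usage record: how many vCPUs ran, and for how long."""
    vcpus: int
    hours: float


def total_vcpu_hours(usages: Iterable[ClusterUsage]) -> float:
    """Sum vCPU-hours across all usage records."""
    return sum(u.vcpus * u.hours for u in usages)


def bill(usages: Iterable[ClusterUsage], rate_per_vcpu_hour: float) -> float:
    """Charge a flat, made-up rate per vCPU-hour consumed."""
    return total_vcpu_hours(usages) * rate_per_vcpu_hour


usage = [
    ClusterUsage(vcpus=16, hours=10.0),  # small cluster running all day
    ClusterUsage(vcpus=64, hours=2.5),   # large cluster for a burst of queries
]
print(total_vcpu_hours(usage))             # 320.0
print(bill(usage, rate_per_vcpu_hour=0.5))  # 160.0
```

More queries, more data, and more concurrent users all show up here the same way: as more vCPUs running for more hours.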

Optionality

Mike: In one of your previous talks I listened to, you talked about optionality, and how you recommended basically that optionality essentially drives freedom – how does Presto help you get that optionality?

Justin: Presto creates optionality by virtue of being disconnected from storage, essentially by not having its own storage layer. I used the analogy in the beginning that we’re like a database without storage. The other way I put it for people who are familiar with data warehousing is, we provide data warehousing analytics without the data warehouse. That’s another way to think about it.

So, because of that, it basically allows you to think about Presto as an abstraction layer, above all the data sources that you already have. And you can kind of skip the complex and time-consuming task of having to move data around, create copies of data, ETL it, extract it, transform it, and load it into another system. Instead, you can just do that at query time, and access that data, and get your results.

So, that gives you a lot of flexibility, and I think one of the ways we’ve seen that play out is, we have a lot of customers that have a classic data warehouse, maybe it’s Teradata or Oracle. And then, they’ve got some kind of a data-lake strategy, and maybe that’s either Hadoop on-prem, or maybe it’s S3, or some Cloud-object storage.

And the first step might be to use Presto to just join tables between these two systems. You’ve got some kind of user behavior logs in your data lake, and you’ve got billing data in your classic data warehouse, and you want to be able to correlate the behavior with the billing, let’s say. That would be a very common use case for us. You can do that with a simple query in Presto.

Now, what that allows you to do then, as a second step, is, essentially, hide from your own end-users, be they internal analysts, data scientists, or even customers, where the data actually lives. They don’t need to know that they need to go to the data warehouse to get the billing data, and they need to go to the data lake to get the user behavior – they’re just submitting a query, and they don’t know where the data lives anymore.

And by doing that, you’re able to actually decouple your end-user from where the data is stored and give the architects in the organization the ability to now decide, based on cost or performance, where that data should actually live. So, you don’t need to pay Oracle or Teradata tremendous amounts of money to store your data anymore. That is, of course, the most expensive storage you’re going to find.

You could instead choose object storage, like Ceph from Red Hat, or there’s a company on the West Coast called MinIO, which creates S3-compatible object storage. And that’s very inexpensive, relatively speaking. And you can deploy all of your data, or start to migrate your data into this lower-cost storage, and still be able to access it, while your end-users are none the wiser to where the data lives – they’re just getting their results. So, I think that’s where you kind of get to create this optionality and be flexible about where you put your data over time.
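
The cross-system join described above, user behavior in a data lake correlated with billing in a warehouse, can be mimicked in a small, self-contained Python sketch. This is not Presto itself: it uses Python's built-in `sqlite3` module with two attached in-memory databases standing in for the two systems. The names `lake`, `warehouse`, `user_events`, and `billing` are invented for the example. In Presto, tables are addressed in a similar spirit as `catalog.schema.table`, with each catalog pointing at a different data source.

```python
import sqlite3


def correlate_behavior_with_billing():
    """Join 'data lake' events with 'warehouse' billing in a single query."""
    conn = sqlite3.connect(":memory:")
    # Two separate in-memory databases play the role of two storage systems.
    conn.execute("ATTACH DATABASE ':memory:' AS lake")
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

    conn.execute("CREATE TABLE lake.user_events (user_id INTEGER, event TEXT)")
    conn.execute("CREATE TABLE warehouse.billing (user_id INTEGER, amount_usd REAL)")
    conn.executemany(
        "INSERT INTO lake.user_events VALUES (?, ?)",
        [(1, "login"), (1, "export"), (2, "login")],
    )
    conn.executemany(
        "INSERT INTO warehouse.billing VALUES (?, ?)",
        [(1, 120.0), (2, 40.0)],
    )

    # One SQL statement spans both sources; the caller never moves any data.
    rows = conn.execute(
        """
        SELECT b.user_id, b.amount_usd, COUNT(e.event) AS events
        FROM warehouse.billing AS b
        JOIN lake.user_events AS e ON e.user_id = b.user_id
        GROUP BY b.user_id, b.amount_usd
        ORDER BY b.user_id
        """
    ).fetchall()
    conn.close()
    return rows


print(correlate_behavior_with_billing())  # [(1, 120.0, 2), (2, 40.0, 1)]
```

The person writing the query only names the tables. Which system each one lives in is a detail that could change later without the query logic changing, which is the decoupling Justin describes.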

Mike: In addition to the technical level, I always think about optionality as, does the open-source license itself also lead, or open-source infrastructure in general, also lead to more optionality and freedom?

Justin: For sure. I mean, I think the notion of not having vendor lock-in is really important to customers. Increasingly so, I think they’ve been burned over decades of very expensive technology that becomes legacy technology, and then, they’re stuck and the pricing goes up. And they don’t feel like they have much ability to resolve that. And I think the open-source license in and of itself gives customers a lot of comfort, in knowing that, you know, in a worst-case scenario, they can always roll this themselves, with the open source. But also, Presto is able to read open data formats, which is also great. Because I think data lock-in is probably the worst kind of vendor lock-in.

And in a traditional database system, once the data is loaded into the database, it’s kind of not easy to get access to or get the data out, without continuing to pay for that database system. But if you’re using open data formats, which we’d really pioneered during the Hadoop era (these are like ORC or Parquet, if you’re familiar with those file formats), you can store them anywhere and query them with a multitude of tools. You could use Spark to train machine-learning models, working off the same Parquet files that you’re querying via SQL in Presto. And I think that gives customers a lot of flexibility as well.

Open Source V. Commercial Market Size

Mike: I read a lot of articles about how enterprises are really moving towards open source, certainly when you look at the large consumer-facing services, like you mentioned, Netflix, Facebook, etc., they’re building a lot on open source. Then, you look at the size of the market, and you see that, actually, from a market percentage of open-source software is still only a tiny amount – is the move to open source really real, or is it more hype than reality?

Justin: When you say the market is small, do you mean measured in dollars, or what’s the metric there?

Mike: Dollar, yes.

Justin: Yep, makes sense. And that’s the key piece. I think it’s probably super widely used, but the percentage of open source that actually gets monetized is relatively small. And I think that’s what’s translating to the overall dollar amount seeming small relative to the proprietary solutions. I think if you measured in terms of impact to businesses and organizations, I think it’s actually probably the reverse, where you might have more open-source software having bigger impact than the proprietary.

But, of course, the challenge – and I suppose this is the purpose of your podcast – is figuring out how to monetize that effectively, so that you can build a successful business, while having that broad impact that open source provides. And I do think that, as vendors, we’ve gotten smarter over the years about how to do that.

I mean, the way I think about open-source business models over history is that it started with the sort of pure-play support model, just offering support, nothing proprietary. I think kind of Generation 2 was the open-core model that we’ve spent time talking about. You know, Cloudera popularized that, as did many other companies. And I think Generation 3, which is actually where we’re moving as well as a company, is cloud-hosted, SaaS offerings.

And, basically, being able to make the simplicity of the solution that you can deliver as a SaaS part of the value proposition, and I think Databricks is a great example of that. So, I think that’s kind of the next frontier. And I think, as more and more open-source companies move in that direction, I think they’ll probably have better success in monetizing that broad usage of the open source. Because there’s so much you can control now from a SaaS perspective to really enhance the experience, it’s just easier for customers to use your SaaS solution, rather than having to maintain it themselves.

Starburst Cloud Strategy

Mike: I normally ask companies if they’re developing a SaaS offering. And I think that there are some companies where it’s been really successful, like MongoDB. Eliot Horowitz from MongoDB is emphatic that cloud is the best business model and everyone should be doing cloud. In doing the 50+ podcasts, I found that the results have been mixed, where sometimes companies find that it’s a good way to reduce the try-before-you-buy time, where the cloud offering is a good introduction, but then the revenues are mostly derived from the enterprise, like self-hosted, version.

And it takes a lot of effort, actually. It’s almost like a whole new product: you’re building a software platform, a great software platform, and then, building the SaaS is almost like a totally new product and a different business endeavor. What’s Presto done in this area? Are you working on it? And do you have any thoughts about how that experience is going, sort of making a cloud offering out of the software?

Justin: We definitely are working on it, and we have been actually for quite some time. And it is hard work. I think there’s no doubt about it, but I do think that some recent innovations around Kubernetes actually make this easier than it maybe was a few years ago. Because Kubernetes can kind of create a uniform, almost like operating system, if you will, that you can deploy your software within, and therefore, sort of create the software once, rather than having to have all these different kind of custom versions for different types of deployments.

I think that’s a game-changer. It’s certainly something we’re betting heavily on, as we approach that by trying to create the same experience, regardless of where customers deploy.

Single-Tenant V. Multi-Tenant SaaS

Mike: Most of the old cloud services were multi-tenant, but are you thinking, like with Kubernetes, we could maybe build a single-tenant and deliver sort of like, “We’ll host it for you, or you’ll host it,” but it’s going to be sort of the same thing?

Justin: That’s exactly right, yeah. You know, I don’t want to give away too much of our strategy just yet because we haven’t released the cloud product yet. But I think those are really important concepts that you highlighted there, that we’re very interested in.

Building A Sales Team

Mike: So, something you must have done a really good job at is building the sales organization, because $10M in sales hasn’t happened by accident. And I think sometimes founders underestimate how difficult it is to build a sales and marketing organization – did you have any thoughts or advice you could share on, like how that went for you, like, how you pulled it off, like how do you do it?

Justin: Yeah. I think the first step I would say is trying to understand yourself as the entrepreneur – what the sales process looks like, like, what are customers buying, how do they understand the value proposition. And I’m a big believer in entrepreneurs selling the first few customers themselves. I think you learn so much, even from a product management perspective of what you need. You get to experience what your sales reps will experience when you start to scale up. So, I’m a huge advocate of that.

The second thing I would say is find a great sales leader. Because you know there are folks out there who have done this many times before, and know what it takes to sort of scale up a sales organization. And, certainly, that was impactful for us in finding our VP of sales, who’s done a great job of really scaling up that organization quickly.

Team

Mike: One question I had was, the pandemic has changed things, and we’re all much more remote. Were you remote before the pandemic, and what’s your plan for growing the team in the next couple of years?

Justin: We were not entirely remote, but we did have some level of distributed nature to our team. Before the pandemic, we had major teams in Boston, the Bay Area, and then, actually Warsaw Poland as well, as an important development center for us. So, we kind of had to work across these three geographies, which are obviously spread out by 9 hours of time zones. And I think that gave us maybe a head start on the pandemic. But to be perfectly frank, I mean, I would much rather go back to actually having an office, and being able to interact on a one-on-one basis personally, with so many of these people.

Because I think what’s been weird for us is, we have scaled so quickly this year that I have not met probably half of our employees at this point, which is just a weird thing, to have grown the company so much. And the only interactions I’ve had have been over a Zoom call. So, that part I miss. I do think we’re all trying to make the best out of it, of course. And I think good best practices are sort of documenting everything, having frequent all-hands meetings, where you get everybody together, but there’s still no real substitute I think for one-on-one interaction.

Founder Advice

Mike: The last question, any advice for new entrepreneurs who are launching a business, and they want to use open-source software development as part of their business strategy?

Justin: My advice would be to think early about that key question that you asked earlier in the podcast about what your monetization strategy is going to be, and along what metrics, or what criteria I should say, you are going to be separating the enterprise value proposition from what you give away for free. And I think you should have a strategy early on and stick to it. Because I think that will just make the decision-making process so much easier for you as you go along. You won’t have to debate each and every feature that you come up with – you’ll just sort of know because it will fall into a framework. That would be my piece of advice.

Close

Mike: Justin, thank you so much for sharing all this knowledge and experience with us.

Justin: Thank you, Mike. This was fun, and it was great meeting you.

Mike: Thanks to the Starburst team for reaching out and coordinating the podcast. Audio editing by Ines Cetenji, transcription and episode website by Marina Andjelkovic. Cool graphics by Kemal Bhattacharjee. Music from Broke For Free, Chris Zabriskie and Lee Rosevere.

Next time, we’re joined by Miguel Valdes Faura, CEO and Co-Founder of Bonitasoft, a global provider of BPM, low-code, and digital transformation solutions.

Until next time, stay safe, and thanks for listening.

Episode 53: Rajoshi Ghosh, Co-Founder of Hasura, Controlling Access to Data with GraphQL

Intro

Mike: Hello and welcome to Open Source Underdogs. I’m your host, Mike Schwartz, and this is episode 53, with Rajoshi Ghosh, co-founder of Hasura, a relatively young startup, using GraphQL to connect Enterprise data. I’m happy to report that Rajoshi is the first Indian national we’ve had as a guest on the podcast, a trend which I hope continues. Although she’s normally based in the Bay Area, Rajoshi was in Bangalore in late July when we recorded this.

Before we get started, I have a quick request – we all want to help open-source founders and startups. I make the podcast, but I need your help to get the word out, so tell your friends, post on LinkedIn, tweet out a link, post on Hacker News, or follow me and share one of my posts on LinkedIn, whatever you think makes sense, go for it.

As I mentioned, Hasura is a young startup, but given their success and momentum, I thought it’d be interesting to hear their story sooner than normal. If you’re interested in GraphQL you might also want to listen to episode 41, with Geoff Schmidt, the founder and CEO of Apollo.

So, without further ado, here is Rajoshi Ghosh, co-founder of Hasura. Welcome to the podcast. Thank you for joining us today.

Rajoshi: Thank you so much for having me, Mike. I’m really happy to be here.

Origin

Mike: What’s the origin story of Hasura?

Rajoshi: At that time, Michael, we started working together a really long time back, and I think at that point, we really wanted to work on something that would make application development easier, but we did not know what shape or form this would take. And so, what we started doing was building products. We started off with friends, then moved on to local startups, and then started working with one of the largest banks out there, consulting with them.

We realized that if we were building enough products across different types of companies and different industries, then we would learn a lot about the different businesses, and also, it gives us an opportunity to try and build tooling that can work across companies of different sizes and different industries and verticals.

So, we got into this consulting business, but the entire time we had a small team that was continuously building different products that could be used across the different clients that we were working with. So, we did that for about three and a half years, and then it came to a point where we had built a bunch of these tools, and you know, we were faced with the decision of signing a multi-year contract for a consulting gig with a really large bank or actually saying, “Okay, let’s take these products to market and build a business out of it.”

So, we decided to start Hasura, wind down the consulting practice. So, that’s kind of how we set up the tools, that, like, parts of Hasura were built. So, what we did after that was, we spent a few months taking these different pieces that we built to market, trying to open-source a bunch of them, saw how folks were reacting, trying to understand the business implications of these products being used.

And that’s kind of how the Hasura GraphQL engine, which is our open-source product, sort of came about. You know, we spoke to a bunch of people and realized that data access was a big problem that people seemed to be struggling with across all kinds of companies and, again, different sizes of businesses. So, that was the piece that we open-sourced, and we wrote about it, and we put it out, and I think the first blog post that we had about our product resulted in a Fortune 500 health care company reaching out to us and saying, “Hey, we really want this.” So, we knew we were onto something. So, it started out with this consulting practice, building pieces of this data access solution from there, and then kind of polishing it up and launching it as the Hasura GraphQL Engine in 2018.

How Is Hasura Different From Apollo?

Mike: A few episodes back, we had Geoff Schmidt, one of the founders of Apollo GraphQL, and although this is a business podcast, not a tech podcast, we have a lot of tech listeners out there. So, maybe you could just give a quick overview. I know that Apollo is part of the community, you’re part of the community, and the products are different, but maybe you could just give a quick overview of like how the products fit in the market?

Rajoshi: Absolutely. At Hasura, we kind of came at this problem from the data access angle. We were trying to solve the data access problem, and back in our consulting days, we had actually built our own version of GraphQL called Urql, and we had that whole thing going. And every time we would talk to people about what we were building – and this was around the time GraphQL was getting popular – people would tell us, “Hey, this sounds an awful lot like GraphQL, why don’t you add support for GraphQL?”

So, that’s how we sort of waded into the GraphQL space, but we came at it from the data access piece. So, what Hasura as a product is today is a service that you connect to your database and other data sources, and it kind of instantly gives you GraphQL APIs. So, it sort of short-circuits the path: you don’t really need to build a GraphQL server, Hasura kind of becomes that piece in the middle that connects to your database and other services, and gives you these APIs. And Hasura gives you a metadata engine, where you can specify the relationships between your different pieces of data, and you can add authorization logic. We have a very granular authorization system built in. And then, you can start accessing these APIs directly from your front-end clients.
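As an illustration of what accessing these APIs from a client can look like, here is a minimal sketch that builds a GraphQL request body in the common JSON shape with `query` and `variables` keys. The `orders` and `customer` names are hypothetical and do not come from the interview; the nested `customer` field simply stands in for a relationship of the kind Rajoshi mentions. The sketch only builds the payload and does not call any real service.

```python
import json

# Hypothetical table and field names, used purely for illustration.
# The nested "customer" selection stands in for a relationship
# between two pieces of data.
QUERY = """
query RecentOrders($limit: Int!) {
  orders(limit: $limit) {
    id
    total
    customer {
      name
    }
  }
}
"""


def build_graphql_request(query: str, variables: dict) -> str:
    """Serialize a GraphQL request as a JSON body with 'query' and 'variables'."""
    return json.dumps({"query": query, "variables": variables})


# A front-end or script would send this string as the body of an HTTP POST.
body = build_graphql_request(QUERY, {"limit": 5})
```

A client would post `body` to whatever GraphQL endpoint its deployment exposes; the endpoint URL and any auth headers are deployment-specific and are deliberately left out here.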

That’s how we fit into the GraphQL ecosystem. And now, Hasura is available as a cloud product, and what you also have is a lot of features that help you kind of run Hasura in production. And there’s a little bit of overlap with some of the things that Apollo engine does, which is basically monitoring and analytics, sort of at that API layer. So, these are the features that we have in common, but the problems that we’re solving are very different.

I think Apollo is coming at it from the side of being a GraphQL Gateway, where every different service speaks GraphQL, and they’re kind of the GraphQL Gateway. And they are building tooling at that GraphQL Gateway sort of layer, whereas we are more on the infrastructure layer, where we are solving the data access problem. And we give you GraphQL APIs.

Lessons Learned

Mike: If you could go back to that day when you said, “Okay, we don’t want to take this contract, we want to move forward with a software company.” That must have been a little while back, but if you could go back to that day, would you do anything differently in terms of how you executed after that?

Rajoshi: That’s a very interesting question. I would say not. Because of what happened, like, the steps that sort of followed from there: we took the product, we had built some great tech, so we raised some seed money based on some of the tech we had built, and we took that to market.

And once we put out the GraphQL engine, we did a Show HN launch, like a Hacker News launch. And that was a pretty good launch, in that a lot of people found out about the product. Usually what happens with Hacker News launches is that they are very transient. Somebody finds out about it on one day, and sometimes it goes great, but there are new products being launched there every single day.

But I think what helped us sort of stick around after that initial launch was the fact that when people started trying it out and looking at our documentation, there were two things that I think really helped us over there. One was that our documentation was very complete. This was also because this was a problem that we’d been at for a while.

And the second thing was that the getting-started experience was magical, like it was 30 seconds to your first GraphQL API. You would connect it to an existing database, an existing Postgres database, and you would instantly get APIs. So, that sort of helped us get the word around, and got people excited. And they spoke about it to each other, and that started off our sort of developer adoption journey.

So, we were still tagged Alpha back then, and we already saw all kinds of companies starting to use Hasura, and putting it in a very critical part of their stack. So, what happened is, we hadn’t actually started building a commercial product just then, we had just put this out and we were still trying it out. But because of the kinds of companies who had started using us, they started calling us and saying, “Hey, you know what, we’re using this. If this goes down, who are we going to talk to? Can we sign a contract with you?” And then, we started getting these calls from companies, and that also helped inform our roadmap of how we were going to build our Enterprise product. And then, we launched that earlier this year.

I think the journey has been one of learning, on all sides of things, like growing an open-source community, growing the usage of open-source software, and then building a commercial product around it – that’s been a really good journey. And I think the fact that people really like the commercial versions of the product as well is something that comes from having been through the journey with our users, listening to our customers and working alongside them over the last two years.

Monetization Strategy

Mike: So, that pivot that you’re describing is really difficult to make from open-source project to commercial product, and you’re probably still making that pivot as we talk, but can you talk about what are your thoughts about the strategy for whether – I heard you mention that you launched a cloud service – that’s certainly a fantastic business model. There’s also a lot of innovation around open core, making certain parts of your product, commercial vs. open source. So, how have you figured out how to monetize some of the open source to fund the company?

Rajoshi: I think the way we’ve been thinking about what becomes a part of our commercial offerings is, things that companies using the GraphQL engine in production will start needing, you know, when they are in production. Things like monitoring and analytics, stuff like query capture, so that you’re able to create allow lists when you’re in production, preventing breaking changes with regression testing, rate limiting of your queries – these are the kinds of features that people need when they’re in production. So, these are the kinds of things that we put in our commercial versions.
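To make two of those production features concrete, here is a small sketch of the general ideas behind an allow list built from captured queries and a simple rate limiter. This is not Hasura's implementation: the class names, the whitespace-only normalization, and the fixed-window policy are all assumptions chosen to keep the example short.

```python
import hashlib
import time


def fingerprint(query: str) -> str:
    """Hash a query after collapsing whitespace, so formatting differences don't matter."""
    normalized = " ".join(query.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class AllowList:
    """Hypothetical allow list: only queries captured ahead of time may run."""

    def __init__(self):
        self._allowed = set()

    def capture(self, query: str) -> None:
        self._allowed.add(fingerprint(query))

    def is_allowed(self, query: str) -> bool:
        return fingerprint(query) in self._allowed


class FixedWindowRateLimiter:
    """Hypothetical limiter: at most max_requests per client per time window."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self._max_requests = max_requests
        self._window_seconds = window_seconds
        self._clock = clock
        self._windows = {}  # client_id -> (window_start, request_count)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self._window_seconds:
            start, count = now, 0  # the old window expired, start a new one
        if count >= self._max_requests:
            self._windows[client_id] = (start, count)
            return False
        self._windows[client_id] = (start, count + 1)
        return True
```

In this sketch a server would call `is_allowed` and `allow` before executing a request and reject it if either returns `False`. A production system would likely normalize queries by parsing them rather than by collapsing whitespace.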

And the core GraphQL engine, which sort of gets you building, is open source; you need to self-manage it, and you need to build all of this tooling yourself, but the core that sort of gives you the APIs and helps you connect to data systems – that’s in the open-source part. Because that’s part of your critical infrastructure, and you know, being an open-source product, if you’re in infrastructure, is I think almost a given these days.

Cloud Positioning

Mike: What’s the positioning of the cloud product? I understand you just launched that, but what is your vision for it? Is that going to be a major part of the revenue streams, or is it just so people can get started and kick the tires quickly – where do you think that fits?

Rajoshi: It’s very new. We actually just announced general availability; we launched it about 4 weeks ago. So, it’s very, very new, but we do foresee it being a critical part of our revenue stream going forward. So, what we did is, we actually re-engineered our open-source engine for it to be a true cloud SaaS product. Both of the things you mentioned are important, in the sense that getting started with Hasura on cloud is something that we absolutely care about, that being the best experience for you to get started on Hasura. So, it’s extremely critical to us that for anybody who wants to try Hasura, the experience of getting started on cloud should be magical.

So, that’s very important, but it’s also something that we’re building for, you know, companies running Hasura at significant amounts of traffic and scale in production. We have interesting things that we’ve built into the cloud, and one of those is dynamic data caching. And that’s something that we’ve – I know that the podcast is going to be out about three weeks after we speak, but actually in just about an hour, my co-founder’s going to be speaking about server-side caching and dynamic data caching as part of the GraphQL Summit, where he’s going to talk about how we’ve built it, and what our vision of the cloud is, which is something like a CDN for data. So, that’s kind of where we see it going, where you build, you connect your data sources.

Data is being fragmented, and data is everywhere – it’s in your databases, it’s in multiple data sources, and you have this managed infrastructure piece in the middle that just has to connect to all of these data sources, and you magically get this API. And it’s fully managed, it scales, it’s super fast. And so, yeah, that’s kind of the way we look at the cloud product.

Value Prop

Mike: You mentioned that Hasura is almost like a data connection layer. One of the main value propositions must be getting access to your data, but what are some of the other reasons that are driving Enterprise customers in particular?

Rajoshi: I think for Enterprises specifically, since that was your question, one of the things that we’re seeing and that our Enterprise customers are seeing is time to value and time to market. So, people like building with Hasura, and we recently had our user conference, where one of the Solutions architects at Philips Healthcare, who’s been a Solutions architect for 26 years, spoke about how something that they were building with Hasura would have typically taken them two to four years, but they built and shipped this product in under a year.

So, companies and Enterprises are saving like quarters and like years of work by using Hasura, because Hasura is just – you get these APIs with access control out of the box. This could be anywhere from like 40 to 90% of the backend logic that you need to write. So, that’s a huge benefit. And that’s kind of what is also helping Hasura spread by word of mouth within the Enterprise. Once one team starts using it, other teams sort of see the pace at which people are building, and that’s helping the word spread within Enterprise for us.

So, that’s one. There are two other things that, again, we’ve heard our users talk about. One is having better domain understanding. There is this layer that connects to all of your different data sources, so you sort of have a better understanding of your domains. And that helps architects and that helps team leads build faster, because where things are very fragmented, Hasura kind of brings that unified view of your different access control rules across different data sources. That also adds to the speed aspect that I spoke about, but that’s also in itself a huge benefit that enterprises are seeing today with Hasura.

And the third one is enhanced security, and again, each of these points is sort of building on the previous thing that I spoke about, which is, we have the Hasura console, which is this UI, where you can see all of these different access control rules. And so, you’re able to see how very granular access control rules are set up for, again, all of the kinds of data access that you need to do. So, having that visibility in one UI makes it very easy, again, for Enterprises to handle the security aspect of data access. These are things that we’ve heard time and again from our Enterprise users as benefits that they see building with Hasura.

And on top of this is the entire GraphQL piece, which is all of the benefits of GraphQL that are making people move towards GraphQL: the amazing developer ecosystem around the tooling, and the developer experience for front-end developers to get started and build products fast. So, the front-end experience with GraphQL, along with the seamless experience with Hasura to handle all of the different data sources and the access control, kind of makes that package really valuable for our users, especially in Enterprise.

Community

Mike: You mentioned the user conference, and I’m wondering what is the current community like for Hasura, and how do you see developing the community, and giving the community enough value going forward?

Rajoshi: We have a really, really engaged community around Hasura, and it’s something that we’re all very, very excited about. And it really makes us very happy, and we deeply care about the community around Hasura. So, more than half of our engineering team is working on our open-source product. There’s continuous development, and we’re listening to our users in the community very closely and sort of acting on things that they are talking about. So, there’s that aspect to it.

And there’s also the aspect of really helping the learning process. So, we have a very extensive set of tutorials on GraphQL, on Hasura, and on authorization that we’ve built on hasura.io/learn. The GraphQL tutorials over there are basically vendor-agnostic tutorials, which are just about getting started with Hasura, whatever sort of stack you come from, especially for front-end developers. So, I think we have more than 10 tutorials for different stacks for you to get started with GraphQL.

And overall, the site has I think almost 15 to 17 tutorials on full stack, front-end, back-end, authorization, and data modeling. So, these are things that we put out continuously for the open-source community. We have a very vibrant Discord community, and we have community champions, who are folks helping each other out and helping out new folks who are coming into the Hasura community, so that’s where the community hangs out.

And yeah, I think bringing all of these aspects together at the user conference was truly amazing for us. Two years into launching Hasura, we put on this user conference, and it had about 33 talks, of which three were from Hasura folks, and the other 30 were from the community, just talking about different ways in which they are using Hasura, and different pieces of tooling that they’ve built. So, it was really, really good to hear and good to see the community coming together and giving back by talking about how they’ve learned and built things with Hasura.

Community Code Contribution?

Mike: Does the community actually commit any code?

Rajoshi: We do have community contribution that’s happening on the console codebase, on documentation, on lots of tooling, so yes, there is community contribution going into the code repos.

Pricing

Mike: Pricing is one of the hardest exercises, not only for open-source startups, but I think for every company. What are some of the gates that you’re thinking about for pricing, and how do you get, for example, intrinsic value based pricing for the customer?

Rajoshi: Yeah, this is something that we’re thinking deeply about, and it’s also evolving. We are very, very early in this journey. I’m sure, down the line, there will be a lot more color that I can add to this topic, but right now, we’re thinking about it as basically pricing on the cloud product being consumption-based pricing, which is basically on data pass-through.

We have a starting tier with a certain amount of data pass-through, and then, we’re basically charging on data pass-through. And we have a few more things, like the number of collaborators and the limits we apply on some of the features that you get. But the primary way that we’re thinking about pricing is on the data pass-through.

So, that’s currently how we price the cloud product. For Enterprises, which are either on the cloud or on a self-hosted model, currently it’s feature bundles and pricing per project, which is unlimited usage but priced per project. That’s how we do pricing on the Enterprise side, but on Cloud, it’s with these usage limits. And that scales as your usage of the product scales, and each of these is per project.
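The two pricing shapes described here can be sketched as simple arithmetic. Every number and parameter name below is hypothetical and is not Hasura's actual pricing; the sketch only shows the structure of a starting tier with included data pass-through plus a charge on usage above it, next to a flat per-project price.

```python
def monthly_cloud_cost(gb_passed_through: float,
                       base_fee: float,
                       included_gb: float,
                       price_per_extra_gb: float) -> float:
    """Consumption-based: a starting tier plus a charge on data beyond the included amount.

    All parameters are hypothetical inputs chosen for illustration.
    """
    extra_gb = max(0.0, gb_passed_through - included_gb)
    return base_fee + extra_gb * price_per_extra_gb


def monthly_enterprise_cost(projects: int, price_per_project: float) -> float:
    """Per-project: unlimited usage, so cost depends only on the number of projects."""
    return projects * price_per_project
```

With made-up inputs, `monthly_cloud_cost(30, 100, 20, 2)` charges the base fee of 100 plus 10 extra GB at 2 each, while usage under the included amount costs only the base fee.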

Serving SMB?

Mike: So, a lot of software startups, some in your area, are making most of their money in the Enterprise space. But do you think that you’ll find a way to also offer products and services to smaller organizations?

Rajoshi: I think so. I mean, today our users do span the spectrum from hobby developers to folks in the enterprise. Our Cloud product is definitely meant to be something that caters to smaller products that are just kind of getting started in production. And there’s always open source, which you can sort of self-host and run across any kind of hosting service. But the cloud is something that we have envisioned so that anyone running GraphQL in production should be able to start, and the pricing would make sense for them economically. And that’s sort of how we’re thinking about it. And like I said, it’s very early days for us. And so, we’re going to sort of observe and see how it scales for us over the next few months and keep tweaking this.

Team

Mike: Right now, you’re sitting in Bangalore, one of the urban hubs of technology, what are your thoughts about growing the team, and how you’ll have to adjust the plan for the pandemic?

Rajoshi: We started becoming a distributed company early in 2019; I think 2019 was when we brought our first remote colleague on board. And since then, we’ve been hiring across the globe remotely. So, that was very helpful to us during Covid, like all of our communication and all of our working in a distributed manner across different time zones. So, that really helped us when we all went remote with Covid-19.

So, in terms of growing the team, I think we will continue to hire across the globe in different cities and different countries – that’s how we’re thinking of growing the team. We do have a base in Bangalore, and we have a base in San Francisco, so we will look at hiring across these two cities. But we also currently have people in, I think, over 20 cities outside of Bangalore and San Francisco, maybe even more – I haven’t actually done a proper count, but we have everybody from, like, LA to Melbourne, and everything in the middle.

So, we can’t actually do an all-hands call where we have everybody on the same call, because we have the West Coast, we have Europe, we have India, southeast Asia, and Australia. So, we kind of do this call in the morning, you know, morning time SF. And then, we record that, and we do one in the evening, like late evening, which works for Australia, Asia and Europe. And we kind of play the recording, and then do another second half, and then record that, and play that in the next one. So, we figured this out, and it’s working pretty well.

But we’re already across time zones where we can’t fit everybody into one call that is not at a crazy hour. So, I expect us to kind of keep trucking along that way. And it’s fun! I mean, it’s great to have colleagues from all over the globe, and we try to bring everybody together twice a year. Surprisingly enough, we actually had one of those this year, which sounds magical and almost unbelievable, that we actually had a team off-site in 2020. But we did that just before Covid struck, and now, we’re all waiting for that to happen again.

Advice For Open Source Software Startups

Mike: It’s really hard to start any kind of business, but open-sourcing your software adds a little extra challenge I think. Do you have any advice for founders on how to find the right balance to make open source an advantage in their business model?

Rajoshi: Yeah. I think open source is a very important question to ask, if you are just starting off and the decision is between, like, should I open source or not. I think why open source is important for that specific kind of business is an important question to ask. I mean, for every kind of product there are open-source and closed-source alternatives.

In the infrastructure space, open source has pretty much become table stakes for people, for how software is being adopted, but I think thinking through the business model from the beginning, and not making that an afterthought is going to be very important. That would be my only piece of advice to sort of think about it from day one, at least as much as you possibly can.

Because there are two ways – either you start a business intentionally, saying this is going to be an open-source business, and then, I’m going to start layering on commercial features. Or you build something open source as a side project, and that kind of takes off, and then, you try to figure out, “Oh, wow, everyone’s using this – how can I make a business out of it?” I guess if that’s the way you’re going about it, then you will have to figure it out as it happens. But if you’re intentionally doing it, I think really thinking through how you will start monetizing, right from day one, is going to be very important. Because, oftentimes, if the differentiation between what is open source and what you plan to make a part of the commercial offering is not very well thought out, then that can lead to all kinds of problems, both with the community and just generally in becoming a viable business.

Did Hasura Plan The Business Model Early On?

Mike: Did you follow your own advice there?

Rajoshi: Yes, we did, we did. I think we did. Partially we did. Definitely, we knew that we didn’t want our commercial offering to be a support-based model. That wasn’t what we were going for – that’s not the kind of open-source company we were setting out to be. And we also did not want our Cloud version to be just a hosted offering. Because the problem, the way we see it, is that if it’s a hosted offering of your open-source software, then everyone’s bound to compare it and say, “Hey, but I can host it on so-and-so provider for this much cheaper.”

We did not want to go down that path. We wanted to really offer features, as part of our commercial offering, that would, again, make sense for the stage of the company you were at. And it would economically and ergonomically make sense.

So, it was always about that – what are the things that you will need once you’re in production, you know, you’ve built the product, you trust the product, and now, you want to go ahead in production. And what are the things you don’t want to worry about when you are in production. And that’s kind of what we looked at. The jury is still out – I mean, we’re going to see how this works out, but so far, the signs are good. I think people are really liking our commercial features and the product, but it’s early days – we will see how things evolve.

Role Models For Rajoshi?

Mike: You’re the first Indian female founder we’ve had on the podcast, and we hope you won’t be the last. I’d like to end with a different question than I normally ask. Who are some of the leaders and role models that influenced your decision to co-found Hasura?

Rajoshi: Wow… that influenced my decision to co-found Hasura? That is a very difficult question to answer, because I used to be in genomics and research. I’m a bioinformatics person by education, and in the first few years of my career, if somebody had asked me, “Are you going to be a co-founder?” or, like, start a company, I think it would not have crossed my mind. I was deep in the academic world, and it was so far away from anything that I thought about that I don’t think I would have had an answer.

When I started working at this Incubator is when I saw how startups work. You know, I found out about startups at this incubator that I worked at. We used to have a lot of folks who worked at successful companies, who would come and speak to the students. Though I was a mentor there, I was also a student, because I was teaching them programming and learning about all of these different amazing business things from folks that had successful startups. And that was the first time that thought crossed my mind.

I think my journey – once I did that, that was an 11-month thing – I think for me after that, it just seemed like the next step. And that’s kind of how I got into it. It wasn’t, again, an extremely, like, “Okay. I am going to start a company.” sort of thing, it’s sort of just that my life experience has led me to this. So, I guess I don’t know how I can sort of talk about role models and who influenced my decision. I think of that experience that I had teaching at this Incubator, which is actually based in West Africa, in Ghana, and it’s run by a Silicon Valley company called Meltwater. It’s an amazing program, and they bring in students or fresh graduates from university who want to start companies, and train them. And just being part of that ecosystem, I think that was my inspiration. That entire experience was my inspiration to sort of start something and not get bogged down by thinking, you know, it’s just going to be really impossible to do.

Closing

Mike: Oh, congratulations. I think you’re going to be the role model for the next generation of entrepreneurs going forward. So, best of luck with Hasura and everything else you do. And thank you so much for being on the podcast.

Rajoshi: Thank you so much, Mike. Thank you so much for having me.

Mike: Thanks to the Hasura team for help coordinating the podcast. Audio editing by Ines Cetenji, transcription and episode website by Marina Andjelkovic, cool graphics from Kamal Bhattacharjee. Music from Broke for Free, Chris Zabriskie and Lee Rosevere.

Next time, we have Justin Borgman, CEO of Starburst. If you don’t know Starburst, I highly recommend listening because it’s an amazing story of a perfectly executed startup – Justin was great.

Until next time, stay safe, and thanks for listening.



Episode 50: DataStax NoSQL solutions built on Apache Cassandra with Kathryn Erickson, Open Source and Ecosystem Strategy

Intro


Mike Schwartz: Hello and welcome to Open Source Underdogs. I’m your host, Mike Schwartz, and this is episode 50 with Kathryn Erickson who helps lead open-source strategy at DataStax. Founded in 2010 and currently employing about 500 people, DataStax was one of the first and most successful companies in the Apache Cassandra big data Ecosystem.


Kathryn has an engineering background. You can listen to some of her great deep dives into the tech on the DataStax website. In her role on the strategy team, she’s helping to lead the company into its next phase of growth and community engagement. I hope you’ll enjoy this episode. And if you do, don’t forget to share a link on social media. You can find all the episodes on opensourceunderdogs.com, or you can retweet our announcement by following us on Twitter. Our handle is @fosspodcast. So, without further ado, let’s carry on with the interview.

DataStax Origin

Mike Schwartz: Kathryn, thank you for joining us today.

Kathryn Erickson: Sure, of course, thank you.

Mike Schwartz: Most of our listeners probably know about Apache Cassandra, one of the most popular databases for big data, but how did DataStax evolve in relation to the Cassandra project?

Kathryn Erickson: DataStax was founded by Jonathan Ellis and Matt Pfeil, both employees of Rackspace. Jonathan, being a contributor to Apache Cassandra and project chair as well, was considering leaving Rackspace, and Matt Pfeil went to talk to him to say, “Hey, there’s some really cool stuff going on here, you should really consider staying.” And by the end of the conversation, they were founding a company together.

And so DataStax was founded to support Apache Cassandra. Over time, we began adding Enterprise features and selling an Enterprise distribution of the database with these features added, and then, of course, more recently, the cloud platform as a service offering as well.

Evolution Of Support Offering

Mike Schwartz: Actually, I didn’t realize that you started out providing support. Because when I first ran into DataStax, I guess I had just known it as a distribution of Cassandra. And now, I see that you’re also providing support for the open-source distribution. Can you talk a little bit about how that’s evolved over time? Has it always been there, or has the focus shifted for or against doing that?

Kathryn Erickson: It hasn’t always been there. When DataStax was founded 10 years ago, there wasn’t really a playbook for how to build and run a successful open-source company. We were founded around the premise of providing support and consulting for Apache Cassandra. Over time, we did offer the Enterprise Edition, but what you see with most Enterprises is that they have a mix of the Enterprise version and open source. For some customers, that’s dependent on the criticality of the data, and for other customers, it’s dependent on the features or the distribution, being the as-a-service offering or self-installed on-prem.

And so, what we saw in the last year was that there were some obvious things that we weren’t doing, and our customers needed support and consulting around open-source Cassandra. We are beginning to open-source a lot more of the features that would build Cassandra abundance, and so, it made sense to bring those offerings back.

Astra – DataStax Cloud Offering

Mike Schwartz: Okay, and you mentioned that DataStax launched a new hosted service called Astra. Do you see that product as a driver for revenue, or is it just an easier path for customers to test drive the product?

Kathryn Erickson: I think that will evolve over time. I think at launch, it is the easiest way to learn Apache Cassandra. And I think as we launch the hybrid option, I believe that’s later this year, that will become a more significant line of revenue.

Pricing

Mike Schwartz: Most of the revenue today, I guess, is from the licensed Enterprise product, so focusing on that, a lot of open-source businesses are moving towards consumption-based pricing. And I’m wondering, what kind of metrics do you use to determine what is consumption?

Kathryn Erickson: You know, for a cloud-based offering, consumption is based on capacity. And with our licensed product and with Luna, the open-source support offering, our focus this year has been around simplification of the pricing model. And we revisit that each year.

With the Enterprise product, we previously charged for the Enterprise license, and then an optional additional fee for advanced workloads, like Spark analytics and graph. That’s confusing for the customer; they just want a simple pricing mechanism. So, we collapsed that pricing. And then, of course, for larger deals, we would have ELAs, or special terms to accommodate those customers.


Mike Schwartz: Is that consumption based on, like, per CPU, per server, or how do you actually figure out what the size is?

Kathryn Erickson: It’s true capacity-based, the size of the data set being stored. And as we move to Astra hybrid, which will be that offering on-prem, I think we’ll consider that pricing option there as well.

Market Segmentation

Mike Schwartz: Data persistence is like the most horizontal market on the planet. Every company basically needs to store data. When you can sell to everyone, it’s sort of a blessing and a curse. Do you segment the market at all vertically or by use case, or do you just not segment the market?


Kathryn Erickson: It’s hard to segment when you’re serving a pretty broad market. What we try to do is have as easy of an on-ramp for the different verticals as possible. We see data models that look similar across IoT use cases; inventory and messaging data models would be similar. So, we don’t segment the market for go-to-market strategies, but we try to find places of repeatable consulting efforts to speed up the successes for those customers.

Partnerships

Mike Schwartz: When you took on the role of Director of Strategic Partnerships, you probably did a survey of the range of partnerships that exist. Can you talk about what the partner landscape looks like at DataStax?

Kathryn Erickson: I ran our technology partner program, and there’s two other sides of that, SI partners and the cloud partners. On the technology side, you want to make it as easy as possible for customers to consume your product.

So, in a technology partner program, you want to understand the user journey to get to your product, and make sure that those adjacent technologies have the simplest, most repeatable, easy-to-build, easy-to-test integrations possible over time. If you want to think about specific companies and integrations, every database needs an ODBC and JDBC connector. And customers want those for BI, for reporting, for simple ways to move data in and out of the system, but in the last few years, most customers also want to see Kafka connectors and more high-speed ingest Pub/Sub integrations. So, we want to accommodate those as well.

Mike Schwartz: Coming to the System Integrator side, you know, at Gluu, we found that those have been essential for us, to be able to focus on innovating the product versus getting involved in specific projects. But there’s such a broad range of System Integrators when you’re serving a global market. Do you consider them channel partners or integration partners?


Kathryn Erickson: We usually consider them strategic partners when we take those types of partnerships on. And the goal is usually to help us penetrate markets that we don’t currently have a field team in, or to offer packaged, cookie-cutter solutions. If you look at some of the stuff that we’ve done with VMware and with partnerships at Dell, we want to assert that the product stack works as recommended for customers that are used to seeing these reference architectures from these larger integrators and technology companies.

Most Important Partnerships For Driving Revenues

Mike Schwartz: Which partnerships do you think are the most important for actually driving growth?

Kathryn Erickson: Deloitte’s been integral to our federal business; they know that space better than any startup could hope to. VMware, for helping to modernize Enterprise platforms. Enterprises that are looking at Cassandra and looking at DataStax are usually going through some type of digital transformation. And the product that they already have in place is VMware. So, everything that we could do to make that migration to NoSQL smooth was helpful to those customers. VMware has been a pretty big partner in my journey.

Open Source Strategy

Mike Schwartz: Some of the companies we’ve interviewed are moving to a 100% open-source strategy, specifically Chef and Cloudera. In the past, the value proposition of DataStax was an improved distribution of Cassandra. But do you see DataStax maybe moving more in the direction of open-sourcing its platforms and some of the technology it’s developed?

Kathryn Erickson: We are open-sourcing a lot more. We try to stick to simple rules for open sourcing. “Simple rules” comes from a Harvard Business Review article, “Simple Rules for a Complex World.” And so, the simple rules for open source: if it increases adoption of Cassandra, it should be open-sourced. And if it’s an Enterprise feature that’s more specific to Enterprise customers, like security features or advanced replication options, then that would be kept proprietary.

And then, where should something be open-sourced? Well, if it makes a change to the core of Cassandra, of course it should go to the Apache project. And if it increases abundance, but it’s not impactful to the core of the project, then it still should be open-sourced, but it may be able to exist in a DataStax repo or a different foundation.

Does Open Source Help?

Mike Schwartz: Do you think the wider open-source community around Cassandra helps DataStax too?

Kathryn Erickson: Of course, open source is all about positive sum games. I think it was Thomas Jefferson that said, “If I use my light to light your torch, then we both have light.” And that’s how open source works. The more communities and more companies that you can move from being other to being self, the larger the positive sum game that you’re playing. So, it’s open source, and open-source abundance is absolutely essential to the success of any open-source company.

Thoughts About Open Source Foundations?



Mike Schwartz: Any thoughts about Cassandra being hosted at the Apache Foundation versus perhaps the Linux Foundation or the CNCF?

Kathryn Erickson: I don’t have any opinions on the other foundations, but I think that Apache Cassandra will always be at home with the ASF. They have their simple rules for what it means to protect the open-source nature of a project, and they don’t waver. And for a vendor backing an open-source project, that can be like a Northern Light: you can lose your way, and you can always look back up and reorient towards the community.

But you know, there’s nice things when you see CNCF, you know, the marketing wing, and the power of the cloud-native messaging that’s there. But there’s no reason that projects can’t have pieces that exist in different foundations either.

We see ourselves and others that build Kubernetes operators or management APIs. Drivers are an example: they should live in the project. But there is management tooling that the maintainers of the project wouldn’t want in-tree. So, something like that maybe should live in a CNCF type of foundation that’s focused on cloud native. But no, Apache Cassandra will remain Apache, and that’s its home.

Industry Changes In The Last 10 Years

Mike Schwartz: So, DataStax is one of the more mature, well-established companies in the open-source ecosystem today. What are some of the challenges you are looking at now that are different than when you got started?

Kathryn Erickson: When I started at DataStax, it didn’t always feel like we had a lot of competition. And I think as other good distributed databases emerged, we adjusted to having competition. I think the obvious answer that most people would expect is pressure from the public cloud vendors. But if you stay oriented on the positive sum nature of open source, then that becomes easy to embrace as well.

So, there’s changes in understanding the virtuous cycles of open source, and in understanding how to build software as a service more quickly; as Kubernetes has matured, that’s become a lot easier. So, I think the ecosystem around us has matured a lot, and the playbooks around how to build a company around open source have matured. And there are more senior projects that kind of exist in our ecosystem that we can work with and learn from as well.

Is Open Source Table Stakes For Databases?

Mike Schwartz: You know, most of the databases that have been released in the last, let’s say five to eight years or so, have been open source. Is being open source basically like table stakes now? So, is it a non-differentiator in the database market?

Kathryn Erickson: I think that if you’re moving from a proprietary relational system towards NoSQL, then you’re obviously moving into an open-source world. And if you can choose something that has a security blanket that you know will outlive any vendor behind it, then you should consider those options first.

I think that it would be hard to start proprietary databases without the support of the community and of these foundations. I think Snowflake has done an exceptional job and is kind of the exception to the open-source game. But, you know, they were disruptive in a much different way. NoSQL in general is an open-source family.

Data Platform Trends

Mike Schwartz: Just a general question about the database market. So, we’ve interviewed probably more database companies on this podcast than any other type of company, but have you ever seen a real shift in the way that customers think about databases?

In the old days, I think you just used to get one database and hope it did everything, but have you seen, on the technology side, a shift in the way that companies are thinking about data and databases now, with more SaaS hosted offerings and more database offerings in general?

Kathryn Erickson: Yes. I think this is definitely the age of data platforms. With Cassandra, we see customers considering NoSQL when they’re using a relational system and it can’t support the throughput that they need anymore, or they need to replicate across more geographies, or exist in a multi-cloud or hybrid environment.

And so, that’s when you consider Cassandra. If you look at when you might consider Mongo, you want to get a quick start with a developer-friendly environment that’s great for mobile. What you start to see is that there’s a certain fit for purpose that the different NoSQL databases have. We’ve started to see an emergence of multi-model systems that move forward by consolidating those capabilities. We have that with our Enterprise products and their integrations for graph, analytics and search. We want to help customers build high-growth applications; high-speed transactional applications are the sweet spot of any Cassandra deployment.

Advice For Startup

Mike Schwartz: This is a question, a sort of a generic question for entrepreneurs who want to launch a business around an open-source product. I’m wondering if you have any advice, for let’s say, startups? And it could be general and it could be about partnerships.

Kathryn Erickson: You don’t have to invent a path to success. You can listen to the a16z podcast, you can look at other companies that are out there. You can go through so many success stories on podcasts like this. You can listen to Cockroach on their Open Source Underdogs podcast talk about how they’re thinking about licensing, and other companies, you know, having similar conversations. Really understand what has made other companies successful, and don’t try to invent that yourself.

How To Improve Tech Diversity?


Mike Schwartz: Last question. As you might have noticed, there aren’t enough women in the tech business, and there haven’t been enough women on my podcast, so thank you for joining. What can we do to reverse that trend?

Kathryn Erickson: I think there’s a lot that we can do. Err on the side of making mistakes: just try things, and if it’s not the right thing or if it doesn’t work, try something else. We’re going to do a program at DataStax, you know, Jumpstart: if you’re a woman or a person of color, and you want to learn Cassandra, and you don’t know where to start, just hit the button, sign up. Somebody from the team will meet with you for 30 minutes and help you get started. That might work, that might fall flat, but we’re going to just start trying stuff. And I think everyone should just start trying the ideas that they have, and we should all tell each other what’s working.

How’d You Get Started?

Mike Schwartz: How did you get started in the tech industry?

Kathryn Erickson: Well, my dad taught Computer Science at a community college, and I was going to be a DNA researcher. And I just wasn’t very good at it, and I thought, “You know what, Dad’s over in Computer Science, we’ve been playing with computers all of our lives. That sounds more like playing than working.” It’s been that way ever since. It feels more like playing than working every day.

Mike Schwartz: That’s great. Thank you so much for joining us today, Kathryn, and sharing your insights. And best of luck at DataStax.

Kathryn Erickson: Sure. Thank you.

Closing

Mike Schwartz: Thanks to the DataStax PR team for helping us to schedule some time with Kathryn.

Editing by Ines Cetenji. Transcription by Marina Andjelkovic. Cool graphics by Kamal Bhattacharjee. Music from Broke For Free, Chris Zabriskie and Lee Rosevere.

Next episode we’re excited to have Cornelia Davis, author of Cloud Native Patterns, a Manning book that needs to be on every software architect’s bookshelf. She’s also the CTO of Weaveworks. She was fantastic, so don’t miss it. Until next time, thanks for listening, and stay safe.