Episode 13: Confluent – Apache Kafka Streaming Platform with Jay Kreps

Jay Kreps is the Co-founder and CEO of Confluent, an open source real-time data streaming platform powered by Apache Kafka. Jay is the author of numerous open source projects, including Apache Kafka and Apache Samza. In this episode, Jay discusses creating a hybrid open source offering and usage-based subscription models.



Michael Schwartz: Welcome to episode 13 of Open Source Underdogs, the podcast where we help open source software entrepreneurs metamorphize their business models.

Jay Kreps is a Co-Founder and CEO of Confluent, the company behind Apache Kafka, a real-time stream processing platform, currently used by 35% of the Fortune 500.

This episode is being recorded remotely because Jay is based in Silicon Valley. Jay, thank you so much for joining us today.

Jay Kreps: Thanks for having me.

Michael Schwartz: So it’s not every day that the company you’re working for agrees to let you start a company to work full-time on a project you started while working for them – how’d this come about?

Origins Of Confluent

Jay Kreps: Yeah, that’s a great question. There was actually a whole kind of evolution of the infrastructure stack at LinkedIn, where I had worked.

We rebuilt a lot of the database layers that served the kind of live site, and we had built custom stores, search infrastructure for finding people, and the social graph that shows how you’re connected to people.

It was kind of a time in the world where all the infrastructure was moving from single-server databases and systems to these distributed systems that could scale horizontally, and that was incredibly important for a social network like LinkedIn, because it had to scale, and sometimes faster even than the number of users.

In some ways, it scaled almost with the square of the users because you have all their connections, interactions between them, and that kind of grows faster than just the people joining.
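As an aside to the conversation, the "square of the users" remark can be made concrete with a rough Python sketch. It assumes the simplest possible model, in which any two members could be connected; real networks are far sparser, and `possible_connections` is an illustrative name, not anything from LinkedIn.

```python
def possible_connections(users: int) -> int:
    """Count distinct pairs among `users` members: n * (n - 1) / 2.

    Assumption: every pair of members is a potential connection,
    which is an upper bound rather than a realistic estimate.
    """
    return users * (users - 1) // 2


if __name__ == "__main__":
    # Growing the member count tenfold grows the pair count roughly a hundredfold.
    for n in (1_000, 10_000, 100_000):
        print(f"{n} users -> {possible_connections(n)} possible connections")
```

The point of the sketch is only the growth rate: the number of pairs grows with the square of the member count, which is why the data can outpace sign-ups.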

We were really trying to figure out how can you scale the social network, and how can you do it in a way that’s cost-efficient, and how can you not just make it able to handle the data size but actually make it something where you can have hundreds or thousands of engineers that kind of work against these platforms productively, how can it all kind of interrelate?

And so the genesis of Kafka was really that environment: we had all these different stores, we had all these big aspirations for how we wanted to use what was happening kind of in real time and feeding back into the user experience, how we wanted to connect all these different systems, in some sense how it all played together.

And there was really no solution to that problem that we felt was well thought out, or could scale in that way.

So we really started to think about like, “hey, what was missing in this stack?” And we came to some interesting observations, like a lot of the traditional data processing technology is something that almost comes out of the mainframe.

Kind of at the end of the day, you load everything up into some big data warehouse, or your cluster, and then you run these big processing jobs that come and spit out results hours later, and we thought, you know, it doesn’t really make sense.

Like for a big, global digital business, all your data is continuous, and when you say, “oh, we run all our processing at the end of the day,” it’s not even clear, like, at the end of what day?

So we kind of thought, “hey, there could be a model for this,” where instead of taking all this data and bringing it all together once every 24 hours, you’d just do it all the time. Kind of model everything that was happening as a kind of continuous stream of data, and allow real-time applications that would continually process that as it occurs.

And if you did that, you could really feed that back, so that you can have news feeds that update right away. The interaction could be much richer and more real-time. And this could be kind of the basis for connecting all these different systems, for connecting all these different applications that we would connect around these streams of data.
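As an aside to the conversation, the contrast between end-of-day processing and continuous processing can be sketched in plain Python. This is only an illustration under stated assumptions: the `EVENTS` list, the event fields, and both function names are hypothetical, and nothing here uses Kafka or any of its APIs.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical page-view events; in a real system they would arrive continuously.
EVENTS = [
    {"user": "alice", "page": "/feed"},
    {"user": "bob", "page": "/jobs"},
    {"user": "alice", "page": "/jobs"},
]


def batch_counts(events: Iterable[Dict[str, str]]) -> Counter:
    """Batch style: wait until all events are collected, then compute one result."""
    return Counter(event["page"] for event in events)


def streaming_counts(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, int]]:
    """Streaming style: update a running result per event and emit it immediately."""
    counts: Counter = Counter()
    for event in events:
        counts[event["page"]] += 1
        yield dict(counts)  # a snapshot that downstream code can act on right away


if __name__ == "__main__":
    print("batch:", dict(batch_counts(EVENTS)))
    for snapshot in streaming_counts(EVENTS):
        print("stream:", snapshot)
```

Both functions end at the same totals. The difference is that the streaming version has a usable answer after every event instead of only after the whole batch has been loaded.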

So we looked at the academic literature, and there’d been some stuff in this area, but it kind of fizzled out, and then we looked at kind of the commercial space, and there really hadn’t been much. There were kind of the old messaging systems and other layers, but nothing really good in the area, not in the way that databases had been developed with a lot of theory and thought put into them.

That was the origin of it. We started building, we spent about 5 years building Kafka, and scaling it, so it would really handle everything at LinkedIn: all the activity, what are the actions people are doing, what’s happening right now, all the changes coming into databases – it could really be that kind of central nervous system that connects everything.

We open sourced it at that time, and it ended up getting really popular in Silicon Valley.

A lot of the big tech companies, the Pinterests, and Ubers, and Netflixes, really took this and built around it, as really a core part of their platform.

As that happened, we really started to come into contact with a broader world. I mean, I have only really worked in Silicon Valley tech companies, so I didn’t really know a lot about how a bank is built and operates, or a big retailer’s built and operates.

But as we started to talk to these companies, we got a sense that the applications for this could be much bigger, even than what had happened so far. That this could be the central thing that they kind of all built around.

When we realized that, we kind of thought, hey, what do you need to do to make that happen? Because a lot of these companies just couldn’t adopt what we had as kind of, “hey, download this from GitHub and go put it into practice.”

They needed a lot more.

So we realized, look, there would have to be some company that could go fund development on this, turn it into a product, turn it into a service, and take it out into the world.

When we realized that, we put some thought into how we do it. We told LinkedIn that we would be leaving, it was actually a really nice thing. Rather than being unhappy about it, they actually offered to kind of invest and help us on our way.

So, while the majority of our funding is from traditional venture capitalists, we also took a little bit of investment from LinkedIn, which kind of helped us get started.

Michael Schwartz: It’s amazing. You don’t hear very often that a company is so supportive, but I guess they knew firsthand how powerful it was.

Jay Kreps: Yeah, it was actually an interesting part of their culture. That they believed in kind of entrepreneurship, and they believed that in the modern world – you don’t come and work someplace forever. I’d been there for seven years. I was there really from when it was pretty small to when it was quite large.

And then they believed in kind of helping people, you know, when the time comes to go do the next thing. That was kind of a cultural value that I thought was pretty cool, and they actually acted on it; they were willing to kind of back it up when push came to shove.

Project To Company

Michael Schwartz: It’s one thing to have an open source project with lots of adoption and excitement, and another thing to start a business that generates millions in revenues.

So the company started in 2014, and I read a press release, it said that the first commercial product was launched in May of 2016 – can you talk a little bit about the experience of going from an open source project to a commercial offering?

Jay Kreps: It’s actually an interesting thing, there are actually two main ways of commercializing open source. I think for most open source projects, they don’t really need a company, and they actually won’t really support a company in an easy way.

If you go to GitHub, there’s probably some millions of repositories there, I think most of those would not succeed as a stand-alone company. You kind of need something, where it’s popular enough, and there’s enough depth to the problem that people kind of want ongoing work and effort and support and services.

It’s worked pretty well for these core data systems and platforms, it is not the case that it actually works for every kind of little library.

We felt strongly that there was an opportunity in the area we were in, even though there wasn’t really an existing example of any kind of open source company in that space. Because what we were doing is a data system that’s not a traditional database.

The first debate was really whether we wanted to start with a hosted service in the public cloud, or whether we wanted to start with a software offering that people could run in their own data centers, but that they would have to run. So would we do something that was fully managed, or would we do something that was more of a software product?

We actually thought a lot about this, we came out of the world of really running the software at LinkedIn. Like, we built it and we also operated it. It was pretty attractive to build an as-a-service offering.

Eventually, when thinking about this, we realized you have to do both. And it’s just a question of how you sequence.

We ended up starting with the software offering because we realized for a lot of companies, that’s kind of where they were, and that they were kind of already adopting the open source – that was a nice transition to be able to use the commercial features, if that was useful.

So we did that, and about a year ago, we released our as-a-service offering that’s in the public cloud. For the software offering, we kind of targeted a kind of an open core model, where there is a set of enterprise features beyond the open source, and you also get support across all of it.

For the cloud offering, it’s pretty much the same thing, except you actually get it run for you, so you don’t have to develop the expertise and staff to do that.

Open Core V. Tools

Michael Schwartz: When I was looking at some of the diagrams of the Confluent platform, I was wondering if it was more of a tools business model similar to Cloudera, or if it was open core. Is it sort of the same thing to you, or do you differentiate a little bit between those?

Jay Kreps: I think one question that, at least in my mind, has been answered is: if you’re building a business around open source, do you need to have any kind of commercial IP as part of the offer?

Do you have any software that’s not open source, or is it better to do a pure open source offering?

I think the finding has been that it’s definitely better to have some kind of commercial offering that isn’t purely open source, and you want to find a way to do that which doesn’t kill all the attractiveness of an open source platform.

The whole reason people want these open platforms, is there’s such an amazing ecosystem around them, there’s no lock-in; those are the huge advantages.

So you don’t want to ruin that as you think about how to commercialize it, and you don’t want to poison the community. You really want to support, enrich the open source community, rather than stifle it.

To me, that’s the finding. I don’t differentiate much between kind of open core or proprietary tools around the side of this. That’s all you’re looking for: a commercial offering that kind of supports the open source.

Michael Schwartz: Kafka is Apache-licensed because it’s an Apache project, but how did you license the other pieces of the Confluent platform?

Jay Kreps: Yes, we do both. We have an open source part of the offering, for example, KSQL is a layer where you can take the streams of data coming into Kafka, and you can do SQL queries on top of them. And that’s open source, so you can download it and use it.

For a lot of the management, or monitoring, or operational features that allow you to connect up between data centers or run the stuff at scale, we built those as commercial software, which has a proprietary license that’s part of our offering.

And the way that’s sold is, if you buy the service, you’re basically paying for your use of the service, and we kind of meter that use by the data that flows through it or that’s stored.

And if you buy the software, we do that as a subscription that is priced by the number of nodes that the software is on.

Both of those are roughly usage-based models that are subscriptions. And I think that’s actually a great model for companies like us, like it means our incentive is to make sure you’re getting value out of the software that you’re using, and that you’re getting value out of us. And that you are able to use it more as it grows as a platform within the company, and more applications are able to come on and take advantage of it.

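As an aside to the conversation, the two pricing shapes described above (a service metered by data flowing through or stored, and a software subscription priced per node) can be sketched in a few lines of Python. Every number, rate, and name below is hypothetical and chosen only for illustration; none of it reflects Confluent’s actual price list.

```python
def cloud_bill(gb_transferred: float, gb_stored: float,
               rate_per_gb_transferred: float, rate_per_gb_stored: float) -> float:
    """Metered service: pay for data that flows through plus data that is stored."""
    return gb_transferred * rate_per_gb_transferred + gb_stored * rate_per_gb_stored


def software_subscription(nodes: int, price_per_node: float) -> float:
    """Self-managed software: subscription priced by the number of nodes it runs on."""
    return nodes * price_per_node


if __name__ == "__main__":
    # Hypothetical usage and rates, purely to show how the bill scales with use.
    print(cloud_bill(gb_transferred=1_000, gb_stored=500,
                     rate_per_gb_transferred=0.10, rate_per_gb_stored=0.05))
    print(software_subscription(nodes=6, price_per_node=1_000.0))
```

In both functions the bill grows only as usage grows, which is the alignment of incentives the answer describes: the vendor earns more when the platform is used more.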
Michael Schwartz: Were there any challenges around figuring out what item you’re going to actually get on, what was the right pricing model for that? Did you get it right initially, or did you have to adjust it a couple times?

Jay Kreps: Some of the things we got right and some we didn’t.

It’s interesting that you said that, so yes, for these hybrid offerings, where it’s open source but there’s also commercial features, I think they have an element of a kind of a freemium offering. That’s actually really common outside of open source, even.

I come from LinkedIn previously, and parts of LinkedIn’s business had that model, where they had basically a free offering which is probably most people’s experience of linkedin.com, and then you can get a professional account.

And then they have really a full enterprise software product for sales and recruiting that’s totally different from the main site, with an experience geared towards those users and their demands. It’s very much kind of a freemium model, where you can come and get a fair amount of value, even if you don’t pay.

For many of those people, they can get even more value if they upgrade to one of those tiers, which can target different users.

So, yeah, you have tension in any of those models, because if you gave everything away for free, your users would be the happiest, but you also would have no business.

If you kind of kept everything back, you would have more to sell, but actually that whole base of users, and that whole set of people who are getting this and building around it, and kind of growing it into something where they would get value from the commercial features, wouldn’t exist.

I think it’s actually a bit tricky to draw that line, but it’s actually very powerful for companies that can because they are able to basically create a lot of value in the world. And even if they’re not capturing all of it, they can capture quite a lot.

In open source, some of these infrastructure layers are so powerful that even if you’re only capturing 10 or 20% of the value you’re creating, that is still quite substantial.

I think that’s actually in many ways the hallmark of a lot of these software-driven businesses, it is very much like that.

If you look at Facebook, or any of these kind of internet services, they have a similar dynamic where, in some sense if you think about the raw time people spend on them, they don’t make that much money in comparison, but the scale is so vast that they are still making a lot of money in absolute terms.

Range Of Customers

Michael Schwartz: What are the ranges of customers who convert into Confluent platform customers?

Jay Kreps: Yeah, we’ve got companies of all kinds.

Huge car companies that are doing these amazing connected car projects, where they are connecting all the cars that they sell to the internet, and they are adding all these software features around that, and internet services that kind of power stuff in the cars.

Big banks that are kind of rebuilding all their core infrastructure, and trading platforms, and risk systems, and security, and fraud analytics, big retailers that are doing kind of real-time inventory management, so really across the board. And we’ve seen that with kind of small companies to some of the largest companies in the world.

Definitely, the time to talk about a commercial offering is when people are getting real value. I think one of the things that was wrong with a lot of the commercial infrastructure platforms is that when a new platform comes around and you evaluate it, you end up quite tied to the platform over time.

So for something that’s new and kind of unproven and proprietary, you can be hugely tied to it, and it’s not really taking off in the world yet. So you saw so many of those platforms fail to ever achieve escape velocity.

And I think open source really solves that for these new platforms. People can download it, do development against it, really understand how it works, put it into place for less important applications – all without really spending any money, so the risk is pretty low.

And then, as it becomes something that’s generating a lot of value for them, there can be offerings that would make a lot of sense where they’re interested in paying more to get more.

A lot of this kind of failure to thrive that you saw in so many commercial infrastructure platforms, because it wasn’t worth the lock-in and betting on a small vendor, is really overcome by open source, and I think that’s why you see so many of the popular infrastructure layers of all types are open in that way.

Customer Segments

Michael Schwartz: So in your customer base, do you segment at all?

Jay Kreps: There are two dimensions that matter, small companies to large companies, and less technically sophisticated to more technically sophisticated.

Of course, there are companies in all of those segments that use the product in different ways, and their kind of needs and preferences are different.

If you think about the small, technically sophisticated companies, I would say that’s Silicon Valley, but also New York, and parts of Europe: the tech startup scene.

What they want is a hosted service, you know, they don’t want to buy software, they don’t want to run stuff, they just want to be able to use these services dynamically, pay for what they use and go.

Large technically sophisticated companies, they have substantial on-premises footprints, they need something that spans both environments.

They need to be able to run the software in their data centers, but also in the cloud, and connect all that up and stream data back and forth, and kind of get something that works across all the environments there.

Let’s say, small non-technical companies, we don’t end up working with them that much just because they don’t really need what we are doing.

How To Prevent Saasquatters

Michael Schwartz: Amazon has a service using Kafka, and you’ve probably seen that MongoDB and Redis have adjusted their licenses to prevent large cloud providers from competing with them in a way that doesn’t support the community.

Have you thought about perhaps forking Kafka and providing it under a different license to enable you to capture more value from some of the large cloud providers who might be deploying it and benefiting from it without really contributing back a fair amount?

Jay Kreps: Yeah, it’s definitely been an issue, where the public cloud is so new.

The dynamics of how you manage an offering there are still emerging. A lot of the assumptions that went into how open source is licensed really predate that; they come from a world where the way software was delivered was, it was shipped to you.

Obviously, that’s been untrue for the kind of cloud applications, where it’s like a UI, like Salesforce, for a while, but the new thing is this kind of infrastructure-as-a-service: the ability to get a database, to get these kinds of back-end infrastructure pieces, which are typically the domain of open source, as a service.

Open source licenses haven’t come to maturity in that time, so I think for a lot of these companies, they are kind of going through a struggle, trying to figure out what’s the right balance. How much should be made freely available to anyone who wants it, even if they want to create a competing service and not contribute, and how much should we not do that.

I think the balance people mostly want is that it’s actually fair for your customers to kind of take the software and not pay. That’s kind of the deal that I think you want with open source. You want people to be free from lock-in to the vendor, and that’s a big part of the value.

It’s unclear that, you know, you want to basically fund a ton of R&D for Amazon, who should be able to afford to contribute as well if they want to have an offering like this.

I think people are trying to figure out the right way to adapt to that and do it in a way that’s community-friendly, and that accomplishes the goals of a company and actually just allows the communities to thrive. And I don’t think there’s any kind of final answer to how to do it, not one that we know yet, but it’ll be interesting to see how it evolves.

Value Prop

Michael Schwartz: What’s the value proposition for Confluent customers? Why do they use the Confluent platform versus a platform of one of your competitors, or just the open source, I guess?

Jay Kreps: There’s a couple things that they’re looking for.

I mean, a big part of that is, how do you take one of these big, scalable distributed platforms and really make it production-ready, and really be able to operate it at scale, and be able to develop on top of that.

Both for our cloud offering and our software offering, that’s a lot of what we’re helping people do is really make it real. So when we work with companies, everything we do with them is really structured around that, we have commercial features, we have open source features, we have support, we have consulting, just to advise people on how to do stuff and to help them get set up, we have training to help them learn how to do that.

All of that is really just aimed at this – “hey, how can we take a company that doesn’t have this technology, how can we get them started on it, and then how can we make their initial applications successful?”

And then how can we turn it into really a platform that more and more applications can be built on, so it becomes this kind of fundamental, central nervous system that connects all their stuff? So all of our offerings are actually in that value proposition.

So very much, how can we get from point A, where we don’t have this capability, to point B, to point C, to point D, where it’s this amazing capacity that we build around?

That is what we really help companies do at a really high level. And the way we do that is obviously through all these mechanisms.

So if you’re getting it as a service, then you don’t have to kind of build all the expertise in-house. If you are getting the software, then it makes the people who are standing this up and operating it a lot more efficient. There’s a lot less that they would have to build in-house to do that.

Sales and Marketing

Michael Schwartz: What’s the primary channel for getting new customers? Do they find the open source and then they call you, or what’s a typical sales cycle look like?

Jay Kreps: Yeah, that’s exactly right.

The problem that open source solves from a marketing point of view, for infrastructure platforms is basically that if you think about these kind of big complex, distributed layers in a company, you don’t put them in or take them out very often.

You really want it to be the case that people are aware of what you do, they’re aware of the value, in a sense you are a kind of a default choice and solution. And that they kind of call you when they’re thinking about doing some change there.

So you want what in kind of marketing-speak would be called an “inbound sales” model, and you want that to be as much of your funnel as possible.

If you try to do it the other way, where you kind of go door-to-door, looking for people who might want to make this change and educating them about what you do, it’s actually pretty expensive, because you’ll find that even if you find a lot of people who are interested, it’s just not the right time, and the right time may not be for some years.

So you end up with these enormous sales cycles and a really high sales cost to actually get the initial deal done, and then you still have to grow the platform to scale internally.

By working in an open source model, you can kind of turn that around, where they come to you: they can take the open source, they can use it, and as they’re getting value out of it, they’re thinking, “hey, we are taking some application to production.”

That’s when they may be interested in talking to you and understanding the value proposition you have. That’s the advantage, from a commercial point of view, of having an open source offering.

Michael Schwartz: I saw a press release from 2016 that you brought on director of marketing and a director of sales. What’s been the impact of formalizing the sales and marketing functions?

Jay Kreps: To scale go-to-market activities, you obviously need that. So we have a full management team that covers everything you’d expect, from Chief Revenue Officer to Chief Marketing Officer.

You kind of need the full set of disciplines that you would expect to be able to actually scale the go-to-market activity, and that’s been super important for us.

Michael Schwartz: Have you been working with channel partners, and what role have channel or distribution partners played in developing the sales channels?

Jay Kreps: If you think about what we do, we’re all about taking these streams of data in a company and being able to connect everything up, and so there’s a bunch of ways that we work with partner companies.

One is with technology partners that we can just plug into, so that they can get these streams of data: analytics layers, security layers, and databases, all the different types of things that might either be a source or a sink for these streams of data.

The other way that we work with partners is SIs, systems integrators, so people who would come and help companies do development. They often specialize in industry verticals and they really know the big projects in a company.

And so they can kind of help take a new, exciting technology, like our platform, and they help to apply it in these companies, to the problems that are kind of top-of-mind in that industry, which may be very, very industry-specific.

There’s a ton of value in that, and it helps to get leverage and scale. Then we partner also with cloud providers in different ways in selling our cloud offer, so all of those are important ways that the partnership can help drive the business.

Building The Team

Michael Schwartz: What’s your philosophy about building the team – do you try and keep everyone geographically close, and how do you find people and what are you looking for?

Jay Kreps: That’s a great question. I think ultimately for a company like this, the company IS the team. There’s no other big asset that the company has.

There’s no factories and machines that you bought – it’s really just a group of people who are really passionate about making something happen in the world.

So all the energy has to go into making that set of people the right people, and making sure that they’re motivated, and that teams can all work together.

In terms of geographic separation or consolidation, we have a kind of distributed model. We have offices in Palo Alto, in London, in Munich, in Paris, in other parts of the US, but we actually take people on in remote places as well.

On the engineering side and the sales side, we often go into territories with sales people who are in their territory. We found that works really well. You have to build the company to support it and to work that way, but it allows you access to the best people in the world.

And one of the advantages we have is people all over the world, who are really passionate about what we’re doing, you know, want to be part of it, but they don’t want to move to the Bay Area for lots of reasons.

So, fundamentally, the choice you’re making if you’re building a company is, either I want the best people in the world, or I want the best people that are within an hour drive of my office. And there is a difference between those two things.

There’s a lot of value in having people kind of all in the same building, but there’s a lot of value in having the best people in the world too, and so we chose the latter, and I don’t think we looked back on it.

Open V. Commercial Features

Michael Schwartz: So, there’s a million features you probably want to build for both open source and the commercial. How do you prioritize the work in terms of what goes into open source and what goes into commercial? And how much effort do you spend on each?

Jay Kreps: I mean, we have kind of a vision in the space, so I guess a lot of what we do is oriented around that.

Our vision is really that you could have something very much like what databases have been for static tables of data, you can have that for these continuous streams, so you could really build the company around this stream of events of what’s happening.

All of our efforts are kind of oriented around making it happen, so we are either building up the stack and trying to add these SQL layers and interfaces, to make it really easy to work with the streams, or we are building core infrastructure that allows us to scale better across data centers and provide more fault tolerance, scalability and redundancy.

We’re building a kind of cloud and operational tool sets that make it easy to get the stuff, or run it in your own data centers. All those are really just organized around that same mission.

In terms of what’s commercial and what’s not, we targeted an operational value proposition, like I said. So the commercial features are going to fall in that bucket. We usually say – “hey, is this something that would make a compelling commercial feature?” – and if not, we make it open source.

Michael Schwartz: Maybe just to push you on that a little bit, there’s probably a hundred in both categories. How do you decide how much to invest in open source vs. commercial?

Jay Kreps: We don’t actually target a percent. I would say our percent of open source development is still quite high.

It’s not because we’ve managed to a particular percent. What we look at is like, “hey, for these commercial features, what do we need to do to make them really compelling?”

And then in terms of our overall opportunity, what do we have to do to get there?

When we think about what makes good open source features, it’s the layers that people are building against, that their applications are going to be committed to.

That I think people actually don’t want.

They don’t want to be tied down, they don’t want to be locked in. So those are things that are inherently part of the open source development.

So we really think about it that way. You know, we want to make sure we have a good commercial feature set that is worth paying for, but we don’t really target the percentage of developer hours, one way or the other.

Closing Advice

Michael Schwartz: Do you have any closing advice for those entrepreneurs who are just getting started in open source?

Jay Kreps: I think the open source businesses work well when there’s kind of some depth to the problem, and where you’re really building the platform that other things will build around.

I think that the computer industry just loves open standards of all kinds, x86, the operating system layers – all of these things have created these platforms and ecosystems that people could build around. It’s turned out that open source is a great way of building that kind of platform.

So the internet, maybe it’s built around TCP/IP, but not everything can be a protocol. x86, I don’t even know the right way to describe what it is, but not everything is an instruction set. For everything else, I think that we found that the best way to build kind of an open platform is open source.

That tends to have the best ecosystem, it tends to have all the right feature set over time – those are the areas where I think it makes sense.

I think when people target some kind of point solution or application that’s kind of limited in scope and not really a platform in any way, I actually think open source tends to not win out in those domains.

When people talk about something that’s going to be a platform, that many applications or systems will be built around, that will have an ecosystem of other tools that integrate with it, that’s where open source seems to dominate.

So I would look for those opportunities, and I would look for an area, where there’s going to be enough value that the important thing is to become kind of ubiquitous, and even though you may not capture all the value as a company, you can still be incredibly successful.

And I think if you pick those areas, that’s kind of where I think open source companies can really thrive. And then of course, all the difficulties of building a company still apply. All the hard things you have to do, it doesn’t get you out of any of those.

But it does solve some of those problems in go-to-market that I described – how you find customers, how and what you sell to them, and so on. I think that can be really helpful.

Michael Schwartz: Jay, thank you so much for sharing your thoughts on this.

Jay Kreps: Yeah, my pleasure. It was wonderful to talk.

Michael Schwartz: That’s it for episode 13. Special thanks to the Confluent team for helping us coordinate with Jay.

Transcription and episode audio can be found on opensourceunderdogs.com.

Music from Broke For Free by Chris Zabriskie and Lee Rosevere.

Production assistance and transcription by Natalie Lowe. Operational Support from William Lowe.

Follow us on Twitter, our handle is @fosspodcast.

Next week we’re publishing one of the interviews we’ve been looking forward to since the start of this project: Mark Shuttleworth, Founder and CEO of Canonical, the company behind Ubuntu Linux – don’t miss it.

Until then, thanks for listening.
