Episode 28: InfluxData – Purpose-Built Time Series Database with Paul Dix
Paul Dix is the Founder and CTO of InfluxData, creator of the popular open source time series database, InfluxDB. In this episode, Paul discusses finding balance between commercial and open source offerings.
Transcript
Intro
Michael Schwartz: Welcome back! You’re listening to Open Source Underdogs.
I’m your host, Mike Schwartz, and this week, we’re honored to be joined by Paul Dix, Co-founder and CTO of InfluxDB.
The goal of this podcast is to gather first-hand accounts from the founders who helped build successful open source software companies.
Started around seven years ago, InfluxDB is a time-series data platform that’s achieved significant market adoption, including deployments and more than 450 Enterprise customers, like Cisco, IBM, eBay, and Siemens.
The company has raised around $120 million, which it’s using to expand operations around the world.
As both the founder and longtime developer, Paul has some deep insights about open source business. So without further ado, let’s cut to the tape.
Paul, thank you so much for joining us today.
Paul Dix: Thanks for having me.
Origin
Michael Schwartz: I guess you were a developer before you started InfluxDB. I’m wondering how the company came about?
Paul Dix: Yeah. As you mentioned, I’m a developer. I guess I should probably start – I’ve been developing software for a long time, since I got into the computer industry in the late nineties.
And the experience that I have that is of most direct relevance to Influx is, in 2010, I was working at a fintech startup in New York City, and we had to build essentially a time series solution for tracking market data in real time. We were building a pricing engine that would update prices and price predictions once every 10 seconds for hundreds of thousands of different financial instruments.
Building a solution around that was my first foray into time series. And for that, I used web services written in Scala, with Cassandra as the long-term data store, and Redis as like a real-time indexing engine.
From a development perspective, that was kind of my background. But from an entrepreneurship perspective, I always knew that I wanted to start a company, and it was basically just a matter of building up enough experience along the way – like working at other startups, working at large companies – and getting to a point where I felt comfortable venturing out on my own and trying to start something.
Is Cloud Best Monetization Strategy
Michael Schwartz: In one of your talks, you mentioned that open core and cloud are two viable revenue streams for pure play open source companies – I’m wondering if you think that that’s still true?
Paul Dix: I guess, depending on your viewpoint, open core is not a pure play open source strategy, strictly speaking. If you’re thinking pure play open source, like, everything you do is open source, and basically you just charge for services, whether those services are professional services, or cloud hosting, right.
Realistically, I think successful businesses that are built around open source have to be open core in some way. And I definitely count SaaS platforms in that vein.
Basically, I think the key is, you have to have something in open source that’s interesting enough that people can solve enough of their problems with, where a large community of users can build on top of that, or use your software without becoming customers. That just has to be the case where it can be a successful open source project.
And then the core part, that’s the open core. It has to offer some value that’s interesting enough, that some small percentage of that community will pay you for it. I think, if you’re looking at infrastructure software, the best method for building a business on top of that now is basically as a cloud-hosted service.
Now, obviously, not all infrastructure is in the cloud, and there’s obviously still a very large component of on-premise enterprise software. But I think, as a software delivery mechanism, like a SaaS hosted service is just so much better because you have the ability to fully instrument it, to fix bugs quickly, and to really do a bunch of things that just are basically impossible if you’re delivering on-premises software.
And from the business perspective, if you look at other open source companies, that’s largely played out over the last few years, where the companies that are most successful essentially have SaaS products that use their open source core but have a bunch of closed source software around them: MongoDB’s Atlas, Databricks is basically a SaaS product of Spark, Redis Labs obviously hosting Redis, Elastic has their own hosting stuff.
Support As Revenue Stream
Michael Schwartz: Your original monetization strategy was around support, and I’m wondering why you think that didn’t work?
Paul Dix: I think part of it has to do with our project maturity at the time. I think support works well if you have a piece of software that has become what I call ‘critical path’ for larger customers who are willing to pay for support.
Critical path generally, in the database world, means an OLTP database that is used directly in an application.
Influx is frequently used in monitoring cases, where the data is important for monitoring systems, but it’s not what your customer, as a user, sees.
Particularly at the time, when we first offered support, which was in the summer of 2015, there weren’t as many people yet using Influx in production, in a setting where they just needed support, and that they would pay for.
Ultimately I think, support as a business model for open source, it kind of pits you almost against your community. Because the thing is, if your software is too easy to use – or too good – people won’t need support.
The only thing they’ll purchase support for basically is an insurance policy, to make sure you’re still around and pushing the software forward, which is a limited audience that you can sell to.
The other thing is – as an open source project becomes more and more successful, other people will come in and offer support around it.
In my talk a couple years ago about open source business models I said, “If support is going to be your plan, as an entrepreneur, you’d be better served by picking an open source project that’s already popular, and offering support around it.” Because, if you’re building the open source project yourself, all that engineering time that you’re putting into it is basically billable hours that your consultants are not billing. If you are a consulting shop, you need your people billing.
This is why Percona offers support for MySQL, and other databases, because it’s better to build a consulting organization around existing projects.
Market Segmentation
Michael Schwartz: Right, I think that’s true.
Time series databases are used by a wide array of companies – practically any organization could be your customer. I’m wondering if you segment the market at all, to figure out who you sell to?
Paul Dix: There are definitely different market segments, but normally what we do is we segment on use case.
We have what we call “DevOps monitoring,” which can be server monitoring or network monitoring, or monitoring services, application performance monitoring, real-time analytics – which could be business intelligence, it could be all sorts of things.
Sensor data is a big use case, particularly in the industrial sphere; think like oil and gas wells, power generation, power plants, solar, wind, all that kind of stuff.
And then, finally, financial market data is an obvious choice for time series. That’s kind of how we segmented it.
In terms of what industry verticals we’re playing in, like I said, in IoT alone, you can track a bunch of different verticals: oil and gas, renewable energy, factories, different stuff like that. And then server monitoring, again you could have different verticals, like we have eCommerce retailers, we have other software startups that use our stuff as a platform. We have people in finance, research. All that kind of stuff.
Project Or Company First?
Michael Schwartz: Did you start the company and the project at the same time?
Paul Dix: The company actually predates the project, which is not very common for most open source businesses. Usually there’s an open source project that you then try to commercialize later.
The company was started essentially as a SaaS product for doing real-time metrics and monitoring, kind of in the same vein as Datadog or Stackdriver, or some pieces of New Relic. And when we were building that company initially, what we found was: one, our product wasn’t really taking off, we didn’t have a good clear differentiator on the product; but the other thing was, we had to build all this infrastructure to actually build that product. We essentially had to build a time series platform.
I started the company with my co-founder in 2012, halfway through 2012. We did Y Combinator until 2013. And by September of 2013, I realized that that wasn’t going to take off, and I thought, “Well, let’s just take this infrastructure stuff that we’ve been building for this application” – it was called Errplane – “let’s take that code, let’s take that package, start fresh, add a couple of things that we learned building it, and start as a fresh, new, open source project.”
Myself, my co-founder, and one other guy iterated on this for about five or six weeks. First commit was September 26th of 2013. We put together a basic documentation website, and I arranged to give a couple of talks at meetups in New York City. One was the Ruby programming meetup, and the other was the open statistical programming meetup.
I gave those talks in early November of 2013. And the project just immediately took off.
People were very interested in it. The docs site got posted to Hacker News and was on the front page all day. And I basically just kept giving more and more talks about it. It was obvious that we kind of struck a nerve and found a real need that wasn’t being addressed, at first in the database space, because we were just focused on the database.
But, over the course of 2014, I built out this bigger vision of creating a platform, essentially for solving problems for which time series is a good abstraction, and these are those use cases I mentioned earlier: Monitoring, server monitoring, real-time analytics, sensor data, and fintech data.
Over the course of 2014, I gave more talks. I raised a Series A round of funding, which closed in November of 2014. It was an eight million dollar round led by Mayfield Fund and Trinity Ventures, and then we just kept going from there.
But, like I said, I think most other open source companies are actually created after the formation of the open source projects.
Although I guess with Docker, for example, there was a company called dotCloud that existed for well over a year before Docker came to be. And actually, Dan Scholnick, the partner at Trinity Ventures who co-led our Series A, was the first money into dotCloud, which is the company that became Docker.
Has Open Source Been Materially Beneficial?
Michael Schwartz: Would you say that the open source community contributions have been materially valuable to the company?
Paul Dix: I would. But it depends on what parts of the project you look at.
Over the years we’ve had over a thousand people, at least, contribute code to different parts of the stack. But the thing is, a database is not a very welcoming thing to contribute to. It’s pretty esoteric, even though it’s written in Go – which makes it a lot more accessible than, let’s say, something written in Erlang or C++.
So we’ve got contributions there, but I think where we’ve had the best community engagement contribution is actually in our data collector, Telegraf.
Telegraf has 200 plugins that allow it to collect data from various network services and stuff like that, and then ship it to other places – InfluxDB happens to be one, but you can also ship it to other databases, and even other SaaS vendors who are competitors with what we do.
Because of the fact that Telegraf is liberally licensed, it’s MIT with no restrictions, just MIT license, and we haven’t put a limit on what it integrates with, namely it can integrate with competitors, and that’s okay – it means that most of those 200 plugins have actually been developed by the community.
So Telegraf, from an open source perspective, and a community perspective, is actually our most successful project.
Telegraf Distribution
Michael Schwartz: Do you facilitate Telegraf through a marketplace or some other way to help it grow?
Paul Dix: No. It’s all just bottom-up. As I said, it’s a data collector, so people deploy it widely to their infrastructure. We have no visibility into where it’s running or who’s running it, other than community members who raise their hands and tell us they are, the pull requests that come in on the repo, and our customers who use it.
Now, we have relationships with like Microsoft, for example, who has Telegraf as an agent that you can deploy across all of your Azure infrastructure, to send system metrics and things like that to their metric service. So, we know it’s running there.
There are obviously Docker images for it, and there are Telegraf images in pretty much every cloud provider at this point. But the same is true for InfluxDB.
Commercial V. Open
Michael Schwartz: Going back to InfluxDB a little bit. I’m wondering about how you find the balance between what to make commercial and what to make open source?
Paul Dix: This is a really tricky one. It’s something we talk about all the time internally. And it’s not something that I got right out of the gate.
In late 2013, when we were first building the project, it was me and two other people, with a seed round of funding. We had enough money in the bank to last us, like, a year. And my only goal at that time was to get as much visibility for the project as possible – everything we did was out in the open.
And then 2014 we raised the A. 2015 comes by. And then in 2016, I knew that we were going to have to go out and raise a Series B round of funding for the company, to continue to work on things. And we still didn’t have the real clear delineation of how we would actually turn this into a business, beyond it just being a popular open source project.
And as I mentioned, in the summer of 2015, we offered support contracts as something that we hoped would materialize into actual revenue. But up until early 2016, I think we signed up maybe like one or two people to a support contract. Not enough to build a real business on.
So, basically in early 2016, I started talking to other open source founders, and everybody in the company at that time. And where I landed was, basically what we would do is, for future versions of InfluxDB, we would make high availability and scale-out clustering commercial, and closed source. And basically anything on a single server would be open source and licensed under the MIT license.
We’ve kept that same line, that same delineation, since I announced it in early March of 2016. But it’s definitely something we revisit periodically, just to say, “Okay, should we change where this line is drawn?” Generally, what we want to find out is how can we put more of our code into the open? How can we put more of our code into an MIT-licensed codebase?
What I learned from that experience of writing that blog post and seeing the reaction in the community about it was – once you put something out in the open, it is incredibly hard to pull it back. People get really upset, deservedly so.
But the thing I tell people then, and still now, is that – if we hadn’t made that decision in 2016, all of the code that we developed in the open since then would not exist. Because we wouldn’t have a company. There’s no way that the company would still exist if we hadn’t done that.
Basically, as we do stuff now, essentially we still have the same dividing line – if it’s multi-server, then it’s closed. But we periodically think, “Okay, is this something we can actually release in the open source area?” We still revisit that all the time.
Pricing
Michael Schwartz: Pricing is one of the hardest things for tech entrepreneurship. I’m wondering if you struggled with pricing; how often have you had to change your pricing since, let’s say, you went to the open core model?
Paul Dix: We basically have two products.
We have the Enterprise product, which is on-premise software. And that’s always been licensed on a per-core or per-server basis, which is very similar to how other database vendors license their software. That price, I think it’s changed once or twice since we released it. The first release of that product was in early September of 2016.
The other product is our cloud offering, which right now is only in AWS. You can actually spin it up in GCP as well – but that’s actually our on-premise version that you are spinning up.
With the cloud offering, we price based on the amount of storage you want and essentially the size of the servers that you’re going to be running, in the cluster that we run for you. Essentially what that is, it’s a single-tenant service, we spin up a new cluster for each person that comes in and signs up, and that’s on-prem Enterprise software but run as a service for people.
We repriced that once since we launched it; we launched it in mid-April 2016. But what we’re doing right now is, we are actually in the process of creating InfluxDB 2.0.
InfluxDB 2.0 is almost like re-envisioning the platform, not just the database.
So the idea is the platform as a whole offers an API and a user interface for collecting data, defining collection rules, storing data, querying data, visualizing it in dashboards and that sort of stuff; also processing it, be it for ETL, or monitoring alerting, and that kind of stuff.
We deliver that in three different form factors. Open source, which is a single server, and that’s MIT licensed. A cloud product, which, for version 2, we are going to price on a usage-based model: bytes written into the API, bytes out of the API, number of API calls, compute time for queries, like ad hoc queries or background processing, and storage hours.
It’s basically five different pricing vectors. They’ll be familiar to anybody who’s a customer of AWS, or GCP, or Azure. It’s basically a multi-tenant platform – you pay for usage, and you don’t have to worry ahead of time about, like, “Oh, I need two VMs with this much memory, and this much CPU, and all this other stuff.”
The 2.0 offering is something, from an engineering perspective, we’ve had in process for a year and a half at this point, but the vision for the 2.0 cloud offering of being able to offer usage-based pricing is something that we’ve known we wanted to do for over two years.
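To make the five usage-based pricing vectors mentioned above concrete, here is a minimal sketch of how an itemized bill could be computed from metered usage. Everything in it is an assumption for illustration: the names `Rates`, `Usage`, and `itemized_bill`, the units, and especially the dollar rates are hypothetical and are not InfluxData’s actual pricing or API.

```python
from dataclasses import dataclass

GB = 1_000_000_000  # decimal gigabyte, in bytes


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices in dollars; NOT InfluxData's real pricing."""
    per_gb_written: float = 0.002
    per_gb_read: float = 0.09
    per_1k_api_calls: float = 0.01
    per_compute_second: float = 0.0001
    per_gb_storage_hour: float = 0.002


@dataclass(frozen=True)
class Usage:
    """Metered usage for one billing period (field names are assumptions)."""
    bytes_written: int
    bytes_read: int
    api_calls: int
    compute_seconds: float
    storage_gb_hours: float


def itemized_bill(usage: Usage, rates: Rates = Rates()) -> dict:
    """Return the charge for each pricing vector plus a total, in dollars."""
    charges = {
        "bytes_in": usage.bytes_written / GB * rates.per_gb_written,
        "bytes_out": usage.bytes_read / GB * rates.per_gb_read,
        "api_calls": usage.api_calls / 1000 * rates.per_1k_api_calls,
        "compute": usage.compute_seconds * rates.per_compute_second,
        "storage": usage.storage_gb_hours * rates.per_gb_storage_hour,
    }
    # The total is summed before it is added, so it covers only the five vectors.
    charges["total"] = sum(charges.values())
    return charges


if __name__ == "__main__":
    # Made-up usage figures for a single month.
    month = Usage(
        bytes_written=50 * GB,
        bytes_read=5 * GB,
        api_calls=2_000_000,
        compute_seconds=36_000,
        storage_gb_hours=100 * 24 * 30,
    )
    for item, dollars in itemized_bill(month).items():
        print(f"{item:>10}: ${dollars:,.2f}")
```

The point of the sketch is the shape of the model rather than the numbers: the customer is charged per unit consumed on each vector, so there is no up-front choice of VM count, memory, or CPU.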
Business Built Around Pricing?
Michael Schwartz: Sounds almost like you built the product around the business model.
Paul Dix: Because I’m an engineer, it’s hard for me to decouple the things, and also, like I said, the experience early on, of trying to create the open source project, to make it popular – and then suddenly trying to figure out how to make a business out of it – made me very sensitive, with the 2.0 version, to thinking about everything as a whole.
Ultimately, like I said, all open source software development is subsidized. And the subsidy has to come from somewhere.
Either, it’s going to be a foundation, which pays for developers to work on things. Or it’s going to be other companies that fund it, they have their own successful business models and they have developers working on it. Or, it’s going to be a single business that creates a successful business around that project.
I think it’s useful in open source software to think about the business at the same time as you’re thinking about what this software is going to be, how it’s going to be designed, and how you’re going to ship it to your users and then also to your customers.
Sales Process
Michael Schwartz: Talking a little bit about sales: Are most of the sales leads inbound? I’m wondering about your experience growing the sales team in the traditional sales process?
Paul Dix: Yes, most of the sales leads are inbound.
Most people who come to us and say they want to become a customer started with the open source code, probably actually got it in production in some way, and had been using it for a while by the time they came to talk to us.
But even within that, I would say there are two important distinctions in how software is sold, and we kind of have both in our environment, which I think is becoming more common with open source vendors, but which 10 years ago wasn’t.
Usually, you have what’s called an Enterprise sales model which is, you have expensive sales people, who are doing outbound sales motion, or even inbound sales motion, where you line up contracts, annual contracts, or whatever.
Or you have, what I call, like a self-serve business model, which is: Anybody can come to your website, they can sign up with a credit card, they can become a customer, and they can buy as they go and actually increase their usage over time. We actually have both.
The thing that has been shocking to me over the course of building this company is just how much friction there is in the Enterprise sales model. But it continues to be something that exists because many companies actually want to do business this way.
Partnerships
Michael Schwartz: Do you have any channels other than direct that account for a meaningful amount of sales?
Paul Dix: We are just now ramping up partnerships.
We do have a partnership with PTC ThingWorx for their IoT platform, where Influx is a key component of that. We’re having some customers come to us for that.
In April, we announced a big partnership with Google Cloud. Google Cloud is making a big push to support open source technologies, and InfluxDB is one of their best-in-class solutions that GCP will offer as a full service.
They have this whole video with other open source vendors that they picked to partner with. We’ll have that launching later this year for our 2.0 products.
Cloud Strip Mining
Michael Schwartz: In the past, you expressed concern about the large cloud companies potentially being at odds with open source companies. I’m wondering if your concern is somewhat abated?
Paul Dix: No. It’s still a concern for me.
Some people will say MongoDB is no longer an open source company. They relicensed their code under the SSPL, which is not recognized by the OSI, so in theory MongoDB looks more like, what I call a “Freemium” software company. There’s a free product that you can use, which is the MongoDB community, and there’s a premium product.
The same goes for Elastic, for the parts of Elastic that they don’t license under the standard Apache 2 License. They’ve made a number of moves over the last couple of years to carve out pieces of their platform that are either not open source at all, or source available, but under licenses that essentially make it a non-open source thing.
Yes, those companies are thriving, absolutely, but the moves that they have made, with regards to their licensing, are basically direct responses to the threats that they see from cloud vendors.
By all back channel things I’ve heard, AWS makes more money off Elastic than Elastic does.
I think the tricky thing is, if you’re going to make a business out of open source software, and what you want to provide is a hosted service, the cloud vendors have a competitive advantage that you cannot possibly hope to get, which is economies of scale.
You cannot buy hosting cheaper than they can. You can’t buy hardware cheaper than they can. You can’t buy network bandwidth cheaper than they can.
So, they’re more than happy to essentially commoditize the software – commoditize the platform – so they can sell more and more hosting, which basically, like, if you want to get in the hosting business in a meaningful way, requires billions of dollars of upfront capital expense.
I think that continues to be a problem, and honestly, I think open core is still the best solution for that, which is: keep some of your software closed-source, develop a service around it or develop it as an on-premise Enterprise offering. And just make sure that what you have closed continues to be a big enough investment and competitively differentiated, so that even if one of those vendors decides to go after it, you still have some meaningful way to differentiate from that.
The truth is like, if Amazon, or Google, or whoever wants to come for you, there’s nothing you can do about it – they can outspend you. Guaranteed.
It’s just a matter of doing the best you can with the software you are delivering, and hopefully, the fact that you are the creator and the steward of the open source project gives you a little bit of an advantage in terms of creating a service around it that is better, or at least preferable.
Innovation To Battle Cloud Giants
Michael Schwartz: Right. And I would hope innovation, also. That as a creator, you have an advantage of releasing new features, and keeping ahead of them.
Paul Dix: Right, absolutely.
Again, I think it raises another question essentially, which is: once infrastructure software gets really mature, how much innovation is there in it? How much do people just want it to be stable in terms of API and stuff like that?
As you get more and more mature, maybe the innovation curve, or at least the feature delivery curve, becomes less important, so it becomes easier for a larger vendor to keep up with what you’re doing.
Pay For Old Versions?
Michael Schwartz: I want to run by you one business model that I heard of last week in an interview – which I’m embarrassed to say I had never thought of, but it’s pretty obvious when somebody said it to me.
But the idea was basically that older versions would not be updated unless you had a commercial license. So if you want to update the open source, you have to go to the latest version.
So Java, for example, if you want to use Java 1.4, you need a license; or you need to pay Oracle. I’m wondering what you think about that idea?
Paul Dix: On some level, this is what Red Hat does. It’s kind of their thing. Even though they don’t have closed software, it’s all about supporting older versions.
I think, as a developer, it’s painful to support those older versions. That’s why there’s maybe a business for it, but again, it’s still fairly limiting.
I think, also, if your software is being delivered as a service, there is less value in that, because you kind of punt on that concern to whoever it is you’re paying to deliver that service. Honestly, to me, that doesn’t seem like a very good model.
Advice For Startups
Michael Schwartz: Last question. I’m sure you’ve had a couple of really interesting years starting the company, and I’m wondering if you have any advice for entrepreneurs who are about to embark on a similar adventure?
Paul Dix: I think it is pretty important to come up with a bigger vision in terms of what you want to do fairly early on.
I know I’m saying this even though when I first started this company, we ended up changing that.
I’ll stick with open source because it’s easier there, which is, if you’re going to start a business around open source software, I think it’s important in the very beginning to actually develop a point of view behind what is commercial and what is open.
And basically, I would say, it’s worth thinking about that as part of your product design. Making sure that the product design actually matches well with how you plan to actually turn it into a business.
Michael Schwartz: Okay. That was fantastic. Paul, thank you so much for your time today.
Paul Dix: Sure, no problem. Thank you for having me.
Michael Schwartz: And thanks to the InfluxDB team for helping to organize this interview.
Transcription and episode audio can be found on opensourceunderdogs.com.
Music from Broke For Free and Chris Zabriskie.
Audio editing by Ines Cetenji.
Production assistance and transcription by Natalie Lowe.
Operational support from William Lowe.
Follow us on Twitter, our handle is @fosspodcast.
Next week, we’ll chat with Corey Scobee, Senior VP of Product and Engineering at Chef.
Until then, thanks for listening!