Mike Schwartz: Hello and welcome to Open Source Underdogs! I’m your host Mike Schwartz, and this is episode 65 with Nick Schrock, Founder and CTO of Dagster, a platform that helps companies create data pipelines, which is critical to transform and update data in order to make it useful, for example, to generate reports, content, or other actionable information.
Dagster might not be a blueprint you can emulate. Like all start-ups, there is some hard-to-replicate serendipity that enabled Nick and his team to build this amazing company. But as Machiavelli says, “Great leaders need both – fortune and virtue.” In other words, you need to be good at what you do, i.e. virtue, but you also need some good old-fashioned luck.
But what separates really successful founders, like Nick, is the ability to harness fortune and virtue, combine them with some deep insights about the market, and turn it all into a profitable and fast-growing venture, which is not easy to do.
So, with that said, let’s cut to the interview, and let Nick tell you, in his own words, how Dagster evolved.
Mike: Nick, thanks for joining us today.
Nick Schrock: Great to be with you.
Mike: Can I just go back a little bit and ask you to share some of your story about how you ended up going from the University of Michigan Computer Science to working at Facebook? So, that early period – how that happened?
Nick: Oh, I wasn’t expecting to talk about the pre-Facebook days. I’ll do the quick version of that. I graduated from Michigan in 2003, and I actually went to work at Microsoft, right out of school. And Microsoft’s a great company, and they treated me well, but…
And actually, the division I was in was the developer division. And I thought that they were just extraordinarily talented, but at that time of my life, that wasn’t for me, in terms of working at a big company.
I wasn’t actually sure if I wanted to do software anymore, so I went to the London School of Economics for a year, because I thought I might want to go more into finance, or even government service – you know, I was a young man kind of searching around.
But I ended up getting back into software. I worked for a healthcare start-up out of Ann Arbor, which is where Michigan is, for what – 2 and a half years.
And then, I went to Chicago to try to do a start-up. That was very quickly spun down because me and a friend, who had worked in the finance industry, we wanted to do it, but then, it was about 6 months before the financial crisis.
So, that was incredibly poor timing. I spun that down, and actually, it turns out a friend of mine, who I knew from Microsoft, kind of heard that I was on the open market, and he just reached out and was like, “Hey, I’m working at Facebook, it’s really a special place. You should consider looking at it.”
And I was looking at staying in finance in the Chicago area. And I flew out to Facebook, and it’s just the vibe difference between a place like Facebook and a hedge fund in Chicago cannot be overstated.
You know, everyone at Facebook was young, super excited, idealistic, the office was incredible – there was just all this energy versus all these miserable people working in the hedge fund. So, the choice was obvious from there. And then, off to the races after that.
Why was Facebook so innovative in 2009-2015?
Mike: So, what was it about Facebook in 2009 that made it such a hotbed of innovation? Like, what new problems were they trying to solve?
Nick: The engineering-driven culture there, combined with the actual product that was being built. So, the product grew at unprecedented rates, it was used in unprecedented ways and was data intensive also, in kind of an unprecedented way.
We were forced to kind of do a lot of innovation on the fly in incredibly constrained environments actually, both in terms of resources, timing – you know, we had to get stuff to work. And I think that it is true that those constraints do breed innovation.
And that period of time was interesting because in 2009 – how to put this – we weren’t really taken seriously as an engineering organization, I felt. And then, fast forward say 4 to 6 years, and we were taken very seriously as an engineering organization.
It was really cool to participate in that. And in the end, if you look at the output from that eng org at that time, it really is pretty extraordinary in terms of what systems were built internally as well as what was open-sourced.
Mike: So, a few years back in 2018, after being at Facebook for, I guess, maybe 8 or 9 years, you decide to start a company called Elementl, which becomes Dagster Labs. Can you talk a little bit about how that came about?
Nick: Near the beginning of my tenure at Facebook, I helped create this team called Product Infrastructure, whose mission was to make our application developers more efficient and productive. So, concretely what that meant is that we built internal frameworks and abstractions for the engineers who actually built the site and the mobile apps to build product.
That team did a lot of great work, and we ended up externalizing a bunch of that work in the form of open source. So, React came out of that group – I had nothing to do with React, but kind of the people across the hall from me, so to speak, produced React. And that obviously went on to be an extremely successful open-source framework. And then, what I’m personally more affiliated with is, I’m one of the co-creators of GraphQL.
I’ve lived and breathed developer tools for a long time and also seen the impact that open-source adoption at scale can have. So, that was definitely on the mind when I left Facebook in 2017, and figuring out what to do next.
And in fact, I was going around the Valley and talking to companies, both inside and outside the Valley actually, about what their biggest technical liabilities were.
And this notion of data and ML infrastructure kept on coming up over and over and over. And I decided to dig into this, and very quickly I discovered that this area kind of pattern matched to what I care about and the types of problems I want to work on. Typically, the things I like to work on share a bunch of properties.
One is just engineers in pain. Like their dev workflow is broken, they have bad abstractions, they’re not productive, purely because of tooling and abstraction reasons – that actually kind of makes me angry and frustrated on their behalf. And on a personal level, I feel that is really motivating.
Second involved finding – yeah, I like to call it like “a problem that matters”. I like working on really broad horizontal problems that could potentially have impact on millions of developers, kind of core essential problems that matter.
I was data engineering adjacent at Facebook; I wasn’t a practitioner. Data pipelining is extraordinarily important actually. People like to dismiss it as data cleaning, or as kind of data janitor work, but when I looked at it from kind of a fresh perspective and I really thought about it, I was like, listen, data pipelines, they produce these assets, these data assets that drive all analytics, all the dashboards that you work with, all the ML models.
And if you really think about it, these data assets drive a huge proportion of human decision-making and automated decision-making in our entire society. Who gets mortgages or not, how do we price health care, what kind of news do you see – these are fundamental essential things, and it needs to be built on solid foundations.
And the fact that it – in my opinion – was not built on the appropriate tools and processes, and everyone felt it was chaotic and out of control all the time, was deeply disturbing. So, things were fundamentally, and still, in some ways, are fundamentally broken in data and ML engineering. So, that’s really motivating.
Another thing, another property is that I like working on technologies that are sort of a strategic point of leverage in an organization. GraphQL fits that bill. Because if you can intermediate all client-server interactions with a common software layer that has rich schema information and stuff like that, it’s like an enormous point of leverage for tooling.
And in the data space, I quickly gravitated towards the orchestration layer because I felt it had the same properties. You know, orchestration orchestrates data pipelines. That means it invokes every single runtime, and it touches every single storage system as a result. And then, likewise, any practitioner that wants to put a data asset or pipeline into production has to interact with the orchestrator in some way, shape or form. So, a strategic point of leverage, I thought that was super, super interesting.
And then last, some feeling that you have a technical insight that’s novel and interesting, and that’s kind of how we got to this notion of – at the beginning we called it Software Structure Data Sets, but now we call it Software-defined Assets – in data pipelines.
And the basic idea is that instead of just writing a bunch of imperative tasks to string stuff together, you instead think about it, you write a software representation of the data asset that you end up wanting to ship to production and be consumed by our downstream stakeholders.
So, that was a very long answer, but I found a problem that kind of checked all the boxes for what I like to work on. And it’s not just checking boxes – if those boxes are checked, I’m like deeply passionate about it. That’s kind of how I got here.
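To make the contrast Nick draws a little more concrete, here is a minimal sketch of the declarative idea: each function is a software representation of a data asset, and the order of computation is derived from which assets it depends on rather than being strung together by hand. The `asset` decorator, the `ASSETS` registry and the `materialize` helper below are hypothetical names written for this illustration in plain Python. They are not Dagster's actual API, and the sample data is invented.

```python
import inspect

# Hypothetical mini-registry for illustration only; this is NOT Dagster's API.
ASSETS = {}


def asset(fn):
    """Register fn as a data asset; its parameter names name its upstream assets."""
    ASSETS[fn.__name__] = fn
    return fn


def materialize(name, cache=None):
    """Compute the named asset, materializing its upstream assets first.

    No cycle detection: this sketch assumes the dependency graph is acyclic.
    """
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        upstream = [materialize(dep, cache) for dep in inspect.signature(fn).parameters]
        cache[name] = fn(*upstream)
    return cache[name]


@asset
def raw_orders():
    # Stand-in for an extract from a source system (invented sample data).
    return [
        {"customer": "a", "amount": 10},
        {"customer": "b", "amount": 5},
        {"customer": "a", "amount": 7},
    ]


@asset
def revenue_by_customer(raw_orders):
    # Depends on raw_orders simply by naming it as a parameter.
    totals = {}
    for order in raw_orders:
        totals[order["customer"]] = totals.get(order["customer"], 0) + order["amount"]
    return totals


@asset
def top_customer(revenue_by_customer):
    # The asset a downstream stakeholder would actually consume.
    return max(revenue_by_customer, key=revenue_by_customer.get)


print(materialize("top_customer"))
```

Asking for `top_customer` is enough: the sketch works out that `raw_orders` and `revenue_by_customer` have to be computed first, which is the shift from "run these tasks in this order" to "this is the asset I want to exist".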
Mike: You started working on this problem at Facebook, but then you said at some point, you sort of hit this critical mass of like pattern matching, like you said. And you’re like, “Okay. I’m going to actually start a business.” Maybe in Silicon Valley, it’s not terrifying, but it’s a big step. How did that actually work? When did you decide, “I’m going to start a company”?
Nick: It’s funny. I’m struggling to recall exactly when it happened, but I knew founding a company was definitely something I was very interested in doing, both in terms of working on a product, but also building a culture, and especially an engineering culture.
In terms of company building, that part was very motivating. In a lot of ways, I was talking about how I thought the kind of the output and culture of early Facebook engineering was pretty extraordinary. And replicating the good parts of that in an independent organization was super appealing to me as well.
I think I just started talking to people, and my message and the problem I identified really resonated. And then, I was talking to some investors, actually not with the goal of doing a fundraise – it’s kind of funny how it works like that – but they were like, “Nick, if you want to look at data pipelining with your background and, you know, work on something, we should really think about formalizing this with some capital and a company, so you can accelerate your progress.”
It’s one of those things that almost just kind of happened. And I’m a big fan of, “Be opportunistic.” It’s also true that from the time I left Facebook, I knew that founding a company had a lot of appeal to me.
Transition to new CEO
Mike: One of the podcast’s previous guests, Sytse Sijbrandij, once asked me, “Do you love the product, or do you love the business?” And it’s an interesting question. I think I know where you fall on that spectrum. And can you talk a little bit about how you came to work with Pete Hunt, the current CEO, and do you have any advice for founders on how to navigate when there’s a pivot in the leadership?
Nick: I might like the business more than you would expect. I obviously – I don’t want to put words in your mouth – but I’m assuming you think I like the product more than the business. Actually, I did a bunch of economics and business in college and then the grad year at LSE, and I thought about doing an MBA, so I’m definitely business-minded. I imagine I annoy our FinOps people because I always dig into all the financial metrics and whatnot.
Yeah, we can get to Pete. I knew Pete from the Facebook days. He was one of the co-creators of React. We didn’t work really in-depth with each other then, but we met each other socially and through each other’s work, and really kept in touch for a long time after Facebook.
He wrote a small seed check into the company. We also collaborated actually on some podcasts because we were kind of obsessed with this Facebook engineering culture, and we actually put together a podcast series for Software Engineering Daily, with like 15 ex-Facebookers, and we learned a lot about each other during that process.
Pete had started a start-up and sold it to Twitter, and he was working at Twitter. And I was also talking to him on and off about the business. And I was in the market for a head of engineering in early 2022, and Pete and I discussed it. And I was privileged enough to bring him on board.
Given his experience, formerly being CEO of a dev tools company, where he had built a marketing organization and a sales organization and scaled to $5 million ARR, I knew he was going to be much more than a head of engineering – I even had super high expectations for that – but he really dramatically exceeded those expectations. And I think it became very obvious to me that he was just way better operationally than I was, in terms of like the mechanics of management, organization building, managing marketing, managing sales – he had done it before, and it was pretty clear.
I, at the time – just to be transparent – I was a solo founder CEO, I had moved around the country a couple of times, I had 2 little kids. Now they’re 2 and 4, but I also started a family during the course of this journey – I just needed like a co-founder figure to share the load.
Because I didn’t have the time to work on what my superpowers are, which is kind of this cross-product of engineering, DevRel and marketing; I think that is where I excel. And the other stuff, he could do a much better job with that. So, it just made a ton of sense.
I think I’m very lucky, in that I don’t think it’s a repeatable process for a lot of founders to do what I did. Because you need that other human, who you know well. I think if Pete hadn’t had his own company at the time, we might have just co-founded something from day one. We had like enormous trust and context, so the transition to bring him in and then move him to the CEO position was super smooth. I think it was super obvious to everyone that it wasn’t going to be this massive culture shift, because Pete and I are still aligned on so many issues.
I think the entire team was super excited about it, and the transition was really smooth – no leadership changes, no attrition, the company started performing better. I think it was obvious pretty quickly that it was the right move.
Mike: So, diving into the business a little bit, how does Dagster monetize? I see a cloud offering; is there also a licensed enterprise distribution?
Nick: No. We only do a cloud product. So, just for context for the audience, Dagster is a data orchestration platform. And you can think about it like, you write data pipelines in this Python framework for building data pipelines and orchestrating, meaning, ordering computations and modeling the assets that get produced by those computations.
You can install the open source, and people have deployed that to production – a ton of people; I should say we have thousands and thousands of users – but the cloud product allows us to do a ton of the hosting on your behalf.
Most of our enterprise customers have this hybrid product, where we host the control plane, which you can think about as everything that is complicated – the metadata database and long-running processes that monitor things and whatnot. Then, they run their actual compute, their data pipelines, in their own infrastructure.
So, yeah, there’s a cloud product you sign up for, we can host a bunch or all of the compute. And then, also, we add enterprise features on top of it – SSO, alerting, gobs and gobs of features that generally deal with complexity in the Enterprise that companies typically pay for.
So, that’s our primary business model: you sign up for Dagster cloud, you swipe your credit card or talk to our sales people, and you can have the best experience of a data orchestration platform in the world in our opinion.
Why sell to small customers?
Mike: I noticed that Dagster sells to small teams – like you said, you can sign up for like 100 bucks – and also to large enterprise. I’m wondering, does the small teams’ business actually add up to real revenue, or is it just a pipeline for enterprise customers?
Nick: I think in terms of what investors care about, and what the long-term trajectory of the business is, we certainly conceptualize it as mostly a driver of pipeline – yes – but a broader adoption as well. So, there’s tons of users that use our hosted product that wouldn’t use our open-source product. And simply because they don’t want to host their own computing infrastructure, which is totally reasonable.
So, I guess, if you kind of boil everything down to the business, yes, it is a source of enterprise leads, for sure, but it’s also a source of more adoption, which means more people talking about the product, more people being passionate about the product.
Because an underlying flywheel of adoption is also essential for the long-term commercial success of the company.
I think that’s the most interesting component of it. It used to be, say 10 years ago, that you’d have an open-source product and you’d be like really pulling teeth to get people to use the commercial or the hosted product.
I think the pendulum has really shifted now, where tons of people wouldn’t consider adopting an open-source technology if it didn’t have hosting options. Just because of the way that the entire world has shifted towards more hosted services, which is I think a win-win for everyone involved.
Mike: One of the underappreciated challenges of a tech start-up is how to price your offering. I saw a note on the pricing page about an old plan and a new plan. The new plan’s a little complex – not being an expert, I couldn’t really quite follow it. Can you talk a little bit about the pricing journey and where and why you ended up where you are?
Nick: Totally. I like to say, if building an infrastructure company were a video game, pricing is the final boss. And that actually even undersells it. Because iterating on your pricing model is a continuous process, where you have to make sure that it’s working for everyone involved, that we can run a healthy business and that the customers feel like they’re getting a fair deal in terms of — because in the end, they need to get more value than they paid for.
You are correct to point out that the initial pricing was simpler than the current model. Initially, we started out where we wanted to have like no seats limit and just charge on consumption. I felt that a very fair way of doing consumption was to just charge on the number of minutes your pipelines run.
So, the issue with that – and I think this is a good takeaway for your audience – is that customers have to morally accept the pricing plan. Like, it has to make sense to the underlying way that they think. And the problem in a data pipeline solution, if you’re charging by, say by runtime, is that frequently what you’re doing in orchestration is that you are like calling out to Snowflake or Databricks or some other heavyweight computational system that does all the heavy lifting of the compute.
So, from the standpoint of the customer they’re paying us just to kind of wait for an API call to complete. That shifts the mind of the customer to think of us as just a compute hosting service.
And if you’re just doing that, the value proposition of our product doesn’t make sense.
So, the pricing impacts the way that the customer perceives the value of the product, which is obvious when you say it out loud, but isn’t obvious when you’re kind of in it.
We really stepped back and looked at this – the real value in an orchestration system is in the control signals and the metadata. Like, concretely, you open up an orchestrator, or our orchestrator, and you see all these fancy Gantt charts of what’s going on, you have a ton of visibility, and then the words that our users often use are, “Ugh! Dagster is like the single pane of glass that consolidates my entire data platform, I have visibility into all this stuff.”
So, that’s where they perceive the value. They do not perceive the value like it’s a hosted compute service. That had the benefit of being simple, but didn’t actually align with the product value that the users perceived.
We switched to charging based on metadata and control plane events that drive our UI. I think the other thing for founders in the audience is that you have to have a pricing model that works for sales. And early on, you don’t have enough data to know how much consumption there’s going to be for a customer for, say, the next 12 months. And with the way sellers work, they have to hit their ARR number – that adds up to their quota, that determines whether they can feed their children or not. So, it’s very important to the sales team.
We had to also add sort of a per seat component that effectively acts as a platform fee for our enterprise customers that allows us to kind of project and forecast ARR that would be appropriate to the value it’s going to deliver to the customer.
You also have to think about the internal incentives and how it’s going to work for sales people, who are reliant on selling your product in order to send their kids to college.
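The mismatch Nick describes between runtime-based and event-based pricing can be shown with a little arithmetic. The sketch below is only an illustration: every rate, the `Run` record and the function names are invented for this example and are not Dagster's actual prices or billing logic. Amounts are kept in whole cents to avoid floating-point noise.

```python
from dataclasses import dataclass

# All rates below are invented for illustration; they are not Dagster's prices.
CENTS_PER_RUN_MINUTE = 5
CENTS_PER_EVENT = 3
CENTS_PER_SEAT = 10_000


@dataclass
class Run:
    wall_clock_minutes: int    # time the orchestrator spent on the run, mostly waiting
    control_plane_events: int  # metadata and control events the run reported


def bill_by_runtime(runs):
    """Runtime model: pay for every minute a pipeline is running."""
    return sum(r.wall_clock_minutes for r in runs) * CENTS_PER_RUN_MINUTE


def bill_by_events(runs, seats=0):
    """Event model: pay per control-plane event, plus a per-seat platform fee."""
    usage = sum(r.control_plane_events for r in runs) * CENTS_PER_EVENT
    return usage + seats * CENTS_PER_SEAT


# A run that waits two hours on a warehouse query but reports only 4 events...
waiting_on_warehouse = Run(wall_clock_minutes=120, control_plane_events=4)
# ...versus a short run that tracks 40 events across many small assets.
many_small_assets = Run(wall_clock_minutes=2, control_plane_events=40)

for run in (waiting_on_warehouse, many_small_assets):
    print(run, bill_by_runtime([run]), bill_by_events([run]))
```

Under the runtime model the customer pays the most for the run where the orchestrator did the least (waiting on an external system), while the event model tracks the visibility the run produced. The per-seat term plays the role of the platform fee that gives sales a forecastable number.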
Why Is Audience Selection Important?
Mike: I am going to pivot a little bit back to tech for a second, but really more to talk about the open-source community. What’s interesting about Dagster is that it reminds me a little bit about the battle between Perl and Python. There were open-source tools in your area that existed before, but they were a little bit hacky or more challenging.
Can you talk about what are some of the challenges of building an open-source community in an already competitive market, where you needed a lot of features just to get the baseline of functionality? And then, how did you focus on either getting new, or getting some of the developers to switch into your platform?
Nick: You need to make sure that you have an audience that cares about what you care about, and that is very differentiated on that dimension, to the point where they are willing to take a risk to bet on you, to work around missing features or missing integrations that might exist in a more mature solution. So, identifying that small subset I think is extremely critical.
There’s now, I think, a kind of standard reading for Silicon Valley founders, which is Peter Thiel’s book Zero to One. And he talks about how you start with a small market and then dominate it, and then move on to progressively larger markets. And I think that really, really resonates with me, especially in developer tools.
One kind of approach – and this is kind of the nature of tools that I like to work on too – is that what you can do is pick the audience that you think has the most leverage in the organization. And for us, it’s like the data platform engineer. Like, there’s engineers whose entire job in life is to serve stakeholders who build data pipelines on top of a data platform that they build.
And a huge part of that is setting up a great developer workflow with CI/CD and testing, so you can actually maybe know if you’re going to break something before you push to production, which is very frequently not the case in data pipelines.
I think our early audience was really people who got it, that testing, and fast feedback loops, and developer life cycles are like the baseline foundation of productivity. And productivity is just huge in working in software. Because productivity is not just about doing tasks more efficiently, it’s about making entirely new things possible.
So, yeah, I guess I kind of went far afield there, but to circle back to the beginning of the question, I think it’s audience selection and being deliberate about that; that’s really what’s important.
Mike: Recently HashiCorp has changed their license, and I see that Dagster’s published in its own GitHub repo, so you’re under the Dagster repo. Dagster is your trademark. How can you assure the community that if the board decides to sell the company to Oracle, for example, that they won’t change the license immediately? And have you considered moving the Dagster open-source project to community governance and making it safer to use for the future?
Nick: As someone who’s gone through a foundation process for another technology, we moved GraphQL to its own open-source foundation with community governance. I have a pretty deep understanding of the trade-offs here. I think it’s a question of maturity and life cycle. The risk that you said exists. There could be a boardroom coup, and I’m out and Pete’s out, and then, we’re sold to Oracle or something.
By the way, the probability of that is approximately zero, but let’s theoretically do it. And then, Oracle could change the license―that is possible. I don’t think that’s a realistic risk in any sort of near-term.
So, if we had community governance, it would eliminate that risk. However, community governance has a ton of overhead. And we’re at the beginning of our journey for innovating, and we want to be able to move quickly and respond to feedback quickly, build features, have complete control in that way.
And that’s definitely the right trade-off for us right now. Compare and contrast that to the GraphQL story: with GraphQL, we open-sourced the spec, a document that was meant to be very stable from day one, and evolved pretty slowly over time. So, in terms of the technical artifact there, it actually matched; having a foundation process and governance over it made a ton of sense. But for Dagster, in the immediate future, having more centralized control and an increased pace of execution definitely makes the most sense to us.
Mike: I’m going to move to a temporal question about 2023. A lot of tech companies struggled in 2023. The Times reported that 3,200 venture-backed tech companies went out of business in 2023. Of course, I don’t know how many normally go out of business, but still it seems like a lot. I was wondering, was 2023 a good or a bad year for Dagster? Did you buck the trend and grow 100%, or did you also feel pressures on budgets from enterprise customers?
Nick: We had a great year. So, not only did we grow 100%, we grew 400%, and our NDR was north of 150%, which means our existing customers were also increasing their contract sizes. I feel great about the business, especially being able to grow this quickly in this environment. I am also grateful that we didn’t raise a round of financing at a wildly inflated valuation, with too much capital, in the Fed bubble in 2021.
Because, at the time, certainly, it was frustrating – a bunch of my peers were — you know, all of a sudden, the CEO has a billion-dollar company, even though they in reality weren’t that far along in the journey.
Now, I think a lot of those people kind of are in a pretty tough spot, and they’ve had to do layoffs, and it’s painful. We kind of stuck to our fundamentals there, so, I feel very good about it.
I still think the pain is going to be very real for the industry through 2024, maybe even into ’25. Because, yes, there’s an advantage to raising a bunch of capital too, in that you have a long runway. A bunch of these companies, they have so much cash on the balance sheet, and the interest rates have gone up that their interest is actually a meaningful source of income too.
There are more waves of company death coming in ‘24 and ’25, I guess I’ll put it that way.
But we’re on a great trajectory, and I think we’ve raised capital appropriate to the progress in the business. And we were able to raise a B in 2023, which was a very challenging process, but it felt great to be able to do that. Not many companies were able to do that.
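For readers unfamiliar with the NDR figure Nick cites, net dollar retention is commonly computed for a fixed cohort of existing customers as the cohort's recurring revenue at the end of the period divided by its recurring revenue at the start. The sketch below uses that common definition; exact definitions vary from company to company, and the cohort figures are made up for illustration and are not Dagster's numbers.

```python
def net_dollar_retention(starting_arr, expansion, contraction, churn):
    """NDR for one cohort: ending ARR from existing customers / starting ARR.

    All arguments are non-negative amounts in the same currency unit.
    New-customer revenue is deliberately excluded from the calculation.
    """
    if starting_arr <= 0:
        raise ValueError("starting_arr must be positive")
    ending_arr = starting_arr + expansion - contraction - churn
    return ending_arr / starting_arr


# Invented cohort: 100k starting ARR, 60k of upsell, 5k downgraded, 5k churned.
ndr = net_dollar_retention(100_000, expansion=60_000, contraction=5_000, churn=5_000)
print(f"{ndr:.0%}")
```

An NDR above 100% means the existing customer base grew its spend even after downgrades and cancellations, which is the point Nick makes about customers increasing their contract sizes.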
Open Source R&D v. Commercial R&D
Mike: Here’s a question, and it’s a little bit about engineering priorities: you have an open-source project to which your team contributes a lot of code, and you also have a commercial cloud product. Can you just talk sort of at a high level, from an R&D perspective, like how much of your budget gets invested into your product versus how much gets invested into the open source? And how do you balance those priorities?
Nick: It’s actually hard to tease apart. Because, if you’re an engineer who is working on a feature that will have manifestation in cloud, often you’re kind of spanning the entire stack and like working on the open source, but then also working with some proprietary features. So, it’s difficult to cleave it that way.
The other thing is that we reorganize engineering, the R&D organization, around company objectives fairly frequently. I actually can’t give you a precise number at any point, or historically/cumulatively, about how much we’ve devoted to both open source and the cloud product specifically.
I guess what I’ll say is that we still invest a ton of our eng resources. I would say like 40% of engineers effectively work exclusively on the open source, and then there’s another tranche that kind of spans the entire stack, and then there’s another tranche, like people who work on our cloud platform, and all the DevOps and SRE work around keeping that alive and operational.
I don’t know, I guess you can call it 50/50, but it’s actually really difficult to put even a semi-precise number on it.
Mike: Well, it sounds like it’s really been an amazing journey. And I’d like to remind you that it really hasn’t been that long either. Only 2018 doesn’t seem that long ago to me.
Nick: Well, it seems like a long time to me, man! That’s the old joke. It’s like dog years in a start-up, one year feels like seven. I have to pinch myself. I only moved away from the CEO seat like 15 months ago or something. And it feels like a lifetime.
Mike: We covered a lot of topics, but I guess, my last question is, is there any advice you have for entrepreneurs, who are launching a business around an open-source software, product or project?
Nick: I think one of the things that founders need to think about – I mean, this could be an entire hour podcast about all the advice that I would give, but a couple of things to think about: one is, know when to go slow and know when to go fast, especially when you’re talking about so-called “one-way doors” in Jeff Bezos speak, where you’re making decisions that are either extremely costly or impossible to undo. Company branding is challenging to change. In terms of the specifics of open source and dev tools, API decisions, especially in open source, last forever. You need to be deliberate on that.
In a commercial product, you can actually iterate extremely quickly. So, I think it actually is important to kind of have two cultural muscles. One is much more upfront design-oriented and collaborative with community, and deliberate and thoughtful on API design, but you still want to have that super-fast feedback and development when you’re developing the commercial components of your product that are hosted.
The other thing I would optimize for – if I was traveling back in time and talked to myself – is optimize for getting yourself into a situation where you can have a super-fast feedback loop, with early users and customers, where you still have the opportunity to change things, and do so quickly.
If you’re in a super-fast feedback loop with a single customer, you can make API changes much more easily. And the ideal situation still is, if you are working on a technology internally at a company, where you have access to all the code that uses it, that is just super valuable.
You’re also basically getting a seed round for free, because, often you’ll have people around you, and you’ll be working on it.
So, I don’t think I truly internalized what an advantage that was, to have done the core R&D internally at a company. Yeah, I think there’s a little more resistance now to open-sourcing internal tech – it’s a less idealistic environment these days. But those are kind of the top-level things that come to mind.
Mike: Well, great. Thank you so much for taking time out of your day, Nick, and best of luck with Dagster Labs.
Nick: Thanks. It was really a joy to be on this podcast. Thanks, Mike.
Mike: Special thanks to the Dagster PR team for reaching out and helping with logistics. Cool graphics from Kamal Bhattacharjee. Music from Broke For Free, Chris Zabriskie and Lee Rosevere. The next episode, recorded at the State of Open Conference, is with Peter Farkas, Co-founder and CEO of FerretDB. Hopefully, I’ll have that out in the next week or so. So, until then, thanks for listening.