Episode 11: data Artisans – Apache Flink Stream Processing with Kostas Tzoumas


Kostas Tzoumas is the Co-founder and CEO of Berlin-based data Artisans, the leading company behind Apache Flink, an open source stream processing framework that merges event-driven applications and real-time analytics. In this episode, Kostas discusses how Flink went from university research, to open source project, and finally to a commercial enterprise.

Update: In February 2019 data Artisans changed their name to Ververica.

Transcript

Intro

Michael Schwartz: Welcome to episode 11 of Open Source Underdogs, the podcast where we try to process all the incoming information about open source business models.

Kostas Tzoumas is the CEO and one of the founders of data Artisans, a company that commercialized Apache Flink. Flink enables developers to query big data in real time, as it’s being created.

I’m recording this episode in data Artisans’ Berlin office. Kostas, thank you for joining us today.

Kostas Tzoumas: I’m happy to be here.

Michael Schwartz: Kostas, tell us about yourself – what’s your background before you got involved with data Artisans?

Kostas Tzoumas: So my background is in computer science. I started in computer science, I did a PhD in database systems, and then I moved here to Berlin to do research on big data. And this is actually where the whole idea of Apache Flink and data Artisans as a company came from, through the research work we were doing at the university here in Berlin.

What Is Stateful Stream Processing

Michael Schwartz: Keeping in mind that our audience is general interest/business, could you tell us a little bit about what Stateful Stream Processing means and specifically what is Apache Flink?

Kostas Tzoumas: I would say Stateful Stream Processing is a fancy technical word for something super simple. And the whole premise is the following: If you think of data, the way that data is usually generated is continuously.

So think of a clickstream. A clickstream is basically capturing the clicks, and the things that we do on websites or apps or whatever. Or think of sensors in the natural world, measuring something and emitting data. Or think of financial transactions.

The way that data is born at the outset is always one event at a time. It is a continuous stream of events – that is what streaming means.

And the main idea behind stream processing and Stateful Stream Processing is to embrace this fact, and instead of trying to land the data first to some place, and then process them, and then do analytics on them, it is to do the analytics while the data is flowing. That is basically Stateful Stream Processing.

“Stateful” means that the computation itself has memory, so stateful is really just a word for doing more than something trivial, if you will: analytics on the data, or building applications on these streams of data that do, for example, things like fraud detection. For example, we are working with banks that are getting a continuous stream of credit card transactions, and they’re trying to classify whether the transaction is real or fraudulent.

The state in this case refers to the model. It can be a machine-learning model that determines whether a transaction is fraudulent or not. You can also do things like real-time personalization and real-time recommendations.

So, for example, as people are buying things in an e-commerce store, or as people are looking things up or hovering over something, you can continuously adjust the model that recommends new things to your customers.

One example is that we have been working together with Alibaba for quite a while, and they have built their search and recommender engine based on this concept of using real-time information for personalization and recommendations.

Other relevant work there is what Netflix has done with Apache Flink, taking into account what we are doing in real time on Netflix to personalize the website and recommend movies and series to you.
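To make the idea of "state" concrete, here is a minimal sketch in plain Python. It is not Flink code: the function name `flag_suspicious`, the `factor` threshold, and the simple "far above this card's average" rule are all assumptions made up for illustration, standing in for the machine-learning model mentioned above. The point is only that each event is handled as it arrives, and that the per-card state is what lets the computation do more than something trivial.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def flag_suspicious(
    transactions: Iterable[Tuple[str, float]],
    factor: float = 5.0,  # hypothetical threshold, chosen only for illustration
) -> Iterator[Tuple[str, float, bool]]:
    """Yield (card, amount, suspicious) for each transaction as it arrives."""
    # The "state": a running count and total per card, updated on every event.
    count: Dict[str, int] = defaultdict(int)
    total: Dict[str, float] = defaultdict(float)

    for card, amount in transactions:
        seen = count[card]
        average = total[card] / seen if seen else 0.0
        # Hypothetical rule: flag amounts far above this card's average so far.
        suspicious = seen > 0 and amount > factor * average
        # Update the state before emitting the result for this event.
        count[card] += 1
        total[card] += amount
        yield card, amount, suspicious


events = [("A", 10.0), ("A", 12.0), ("B", 500.0), ("A", 200.0)]
results = list(flag_suspicious(events))
```

In this sketch only the last transaction on card "A" is flagged, because it is the only one that can be compared against earlier spending on the same card. A real deployment would keep this state in the stream processor itself so that it survives failures, which is a concern this toy example ignores.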

What Is Flink?

Michael Schwartz: Tell me a little bit about Apache Flink.

Kostas Tzoumas: It has a very interesting history.

I think the first line of code was written by my co-founder, maybe 9 or 10 years ago. It started basically as a platform at the time when the whole idea of big data and distributed data processors, like Hadoop, was starting out. And the idea was to do that in real time: the massively distributed data processors that came out were very batch processing-oriented, meaning that you would get a lot of data to process, you would wait for a few hours, and then you would get an answer.

And the idea was to do that continuously and do that in real time.

As I said, it started out as a research project here at the Technical University of Berlin. Then the team open sourced it, so we donated it to the Apache Software Foundation.

That is when we came up with the name Apache Flink. So “flink” is a German word for agile or fast. And then a very fitting logo for this was a squirrel, which is very popular in Germany, it’s an animal that can move very fast.

And from then on it has basically been several years of the project growing massively as an open source community. It is by now one of the biggest projects in big data, because it is being used in production by most of the tech companies in the world and also by enterprises.

Origin Of data Artisans

Michael Schwartz: How did data Artisans get started?

Kostas Tzoumas: That I would say follows really the Flink history.

So the team that built the project at the university, at some point we decided that the best way to accomplish the mission that we had, which was to really bring the technology out into the world and have everybody use it – this was best accomplished by starting a company rather than by doing academic research.

So when we started data Artisans about 4 years ago, we had a very simple goal, which was to make it possible for everyone to use stream processing with Apache Flink.

How To Go From Project To Funding?

Michael Schwartz: It seems like the funding and the company almost got started at the same time – how did that happen?

Kostas Tzoumas: I would say that clearly, from the beginning, we were always positioned as a company based on the open source, so what we had was a great technology, a great open source community.

This is a model that has been proven to some extent by some companies, and to some other extent it is still being proven. There are a lot of companies that follow this model, so they have an open source foundation and build a business based on that.

This has always been our premise. We were always looking for investors that would subscribe to this sort of long-term view.

Open Source vs. Enterprise Features

Michael Schwartz: How do you decide which features go into Apache Flink and which features become data Artisans features?

Kostas Tzoumas: With data Artisans we have always been following a very long-term approach.

In the first few years of the company, we were basically doing everything in open source, 100% in open source. Monetization was not the focus, the focus of the company was to grow the open source community, to help the open source community, we were doing a lot of work together with companies that have used Flink, with the goal of adoption. At some point, this worked out for us.

At this point, Flink is the engine of choice for every company that wants to do any kind of serious stream processing. That was the point at which, also with data Artisans, we decided to move on to the next step, which is to build a business and monetize it.

The way we did that is what people are referring to as the open core model, so we have a product, it’s called the data Artisans Platform. It encapsulates Apache Flink, and also adds additional features on top of that, and we offer that together with enterprise support.

There have been two features that we have added on top of Apache Flink that are contained in our product. The first one we released over a year ago, and that is a way to basically manage the deployment of stream processing applications in production. It makes it easy for enterprises to operationalize Apache Flink.

The idea is that you should be able to experiment quite a bit with open source. But then, if you’re an enterprise and need additional help and additional tools to bring it into production, to connect it with all the other infrastructure you have, like logging, metrics, etc., and to really build an internal platform to use – that is where our offer kicks in.

Then the second feature, which is part of the data Artisans Platform, we released in September. It’s called the data Artisans Streaming Ledger, and it is effectively a way to do ACID transactions on streaming data.

To your question, how do we decide whether something is open source or closed source – I think it’s a fine balance.

We are always taking the long view. We are not maintaining an internal version of Flink, we’re shipping standard Apache Flink.

At the same time we appreciate – and I think by now open source communities appreciate as well – that a healthy business behind the open source framework is something that is also good for the development of the project.

Really the value proposition is that you get a turnkey solution. You don’t have to build stuff around the edges yourself; you get a turnkey solution that has been built by the creators of Flink. It has been tested by us, and it is supported by us. And we can basically help you on your journey to adopt, productionize, and run Apache Flink stably in production.

Licensing

Michael Schwartz: So how do you license that? Is there a different license for let’s say the non-core features?

Kostas Tzoumas: Apache Flink is Apache licensed, obviously. Our product as a whole is closed source, it comes with a license, and we offer it in basically two editions.

One we call the Stream Edition, which contains Apache Flink and Application Manager. The other one contains these two components and also the Streaming Ledger, the ability to do ACID transactions. We call that the River Edition.

Other Revenue Streams

Michael Schwartz: Are there any other revenue streams like services or trainings?

Kostas Tzoumas: Yes, we do offer services to customers, and we also have training for Apache Flink, which we do both in the form of on-site training to customers, and also recently we organize public trainings in several cities.

Michael Schwartz: But in terms of a percentage of revenue on licenses – is it more than 90%?

Kostas Tzoumas: We are pretty focused on licenses as a company. I think it’s a better way to make revenue, selling a product rather than selling time – it scales much better.

Customer Segments

Michael Schwartz: So, who are the current customers – do you segment them in any way, for example by sector or by functional application?

Kostas Tzoumas: The product is very horizontal. Obviously, the open source community is completely global, and I would not say it’s segmented in any way, so we have use cases from all industries.

As a business, we see a lot of early commercial adoption in the financial services sector – so investment banking, insurance companies, and so on.

That’s not to say that these are the only customers that we serve, but we do see a good amount of commercial adoption.

Contributions From Large Companies

Michael Schwartz: Are those larger cloud companies contributing? Are they buying support, or are they contributing in other ways to the project or to data Artisans?

Kostas Tzoumas: Netflix is a user of the open source framework. Same goes for companies like Uber, Lyft, so the larger, new tech companies.

Tech companies usually have a different mentality: what makes sense for them is to hire talent and build a lot of things internally. But they do contribute in other very, very meaningful ways to the open source communities.

For example, specifically with Netflix: they are built largely on AWS, and they have tested Flink on AWS at a scale that pretty much nobody else has. So they really made Flink extremely stable at that scale on AWS, which is a benefit to every other potential user or data Artisans customer that wants to run on AWS.

Concerns About SaaSquatters

Michael Schwartz: You might have been following some of the license changes by other open source companies to prevent monetization of the open source by large cloud providers in a way where they’re not contributing their fair share to the open source.

Are you concerned that an Amazon or Google might take Apache Flink and build the features that you consider enterprise and compete with you?

Kostas Tzoumas: I mean, this is definitely something that might happen, as Flink is getting more and more popular.

Amazon and Google already have offerings around Apache Flink in their services, for example on Amazon EMR.

Does this concern me – yes, it concerns me.

I would very much like to be in a world where all of these players would play fairly and contribute back to the open source community.

However, I take an optimistic approach to this, so this also proves the popularity of the platform. We are in a market that is growing every day.

We see completely new applications coming in every day. I’m less concerned right now about how we’re going to slice the pie, and more about how everyone in the industry is going to work together to bring the message out, the message about streaming data, and to make the pie even larger.

It’s just the market is growing very, very fast.

What’s Next?

Michael Schwartz: Where do you think you need to go from here? What new products and services are in the planning phases?

Kostas Tzoumas: I think we are just in the beginning.

We see stream processing as a new paradigm for data computation. It just changes the way that you look at data.

Instead of landing the data somewhere, and then firing a query, and then getting an answer, you have a query that is always running. And as the data is coming in, you always have the most up-to-date answer. And we just see more and more applications coming in every day.
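The "query that is always running" can be pictured with a small plain-Python sketch. This is an illustration under assumptions, not Flink's API: the function name `running_clicks_per_page` and the page names are hypothetical. Rather than storing all clicks and counting them later, the query keeps its answer up to date and emits a fresh snapshot after every event.

```python
from typing import Dict, Iterable, Iterator


def running_clicks_per_page(clicks: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Standing query: emit the updated click counts after every event."""
    counts: Dict[str, int] = {}
    for page in clicks:
        counts[page] = counts.get(page, 0) + 1
        # Emit a copy, so each snapshot is the answer as of this event.
        yield dict(counts)


answers = list(running_clicks_per_page(["home", "cart", "home"]))
```

Each element of `answers` is the most up-to-date result at the moment the corresponding click arrived, so a consumer never has to wait for a batch job to finish.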

When we were starting out, the main use case for streaming data was to do real-time analytics, and usually to do approximate analytics in real time, while you wait for the correct answer to come up later.

Then we saw a lot of applications that we would classify more as applications rather than as analytics. Things like fraud systems as I mentioned before, or billing systems, and things like that.

So actually running the business, not just measuring what is going on. With the new functionality that we brought into our product, the Streaming Ledger, we are now able to cover use cases that before could only be covered with relational database systems: use cases and applications that need ACID transactions.

We see a lot of movement in the Flink community, trying to bridge the gap between batch processing and stream processing.

Batch processing is really a subset of stream processing. Batch computation is a stream that just so happens to have a beginning and an end – so you can do that as well.
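The claim that a batch is a stream with a beginning and an end can be sketched in a few lines of plain Python. This is an illustration only, not how Flink implements it: `running_sum` is a hypothetical operator, and `itertools.count` stands in for an unbounded source. The same function handles both cases without modification.

```python
import itertools
from typing import Iterable, Iterator


def running_sum(values: Iterable[float]) -> Iterator[float]:
    """Streaming operator: emit the updated sum after every input value."""
    total = 0.0
    for value in values:
        total += value
        yield total


# Batch case: the input is a bounded stream, so the operator terminates
# and the last value it emitted is the batch answer.
batch = [1.0, 2.0, 3.0]
batch_result = list(running_sum(batch))[-1]

# Streaming case: the input never ends, so we only ever look at the
# updates emitted so far (here, the first five).
unbounded = itertools.count(1)  # 1, 2, 3, ...
first_updates = list(itertools.islice(running_sum(unbounded), 5))
```

The batch answer is simply the final update of the bounded run, while the unbounded run keeps producing updates for as long as anyone reads them.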

So we’re just seeing new applications coming in every day. And the way we are approaching this is, we are trying to create the tools to make this accessible to the Enterprise.

So for every company that, let’s say, does not have the ability that the likes of Netflix or Uber have – to hire dedicated developers and build everything on their own – we have positioned ourselves as the vendor that can enable them to be in the same place that Uber, Netflix, and Alibaba are, but without having to roll their own system.

So we’re learning from the community for our products.

Range Of Customer Interactions

Michael Schwartz: What’s the range of different customer relationships you have? So some customers maybe have a very direct support relationship, some customers maybe never call you – so what’s sort of the range of customer interactions that you have?

Kostas Tzoumas: Right now, in the state that we are in our business, we strive to have a direct relationship with every customer.

So everything we do is direct sales, or most of what we do is direct. And we strive to have a very direct and very deep relationship with the customer, working together with our engineering teams, like really being a partner for them.

Sales Process

Michael Schwartz: How do customers find you?

Kostas Tzoumas: The open source platform is famous, that’s where people find us. It’s all inbound, we don’t cold call anyone.

And from that point on, for us, it is typical enterprise sales. Because we work with very large enterprises and large deployments, it is a typical enterprise sales process.

Pricing

Michael Schwartz: Pricing is really a challenge for everybody, every industry. Can you talk a little bit about what were some of the challenges around pricing?

Kostas Tzoumas: As you say, it’s hard.

First of all, I think pricing works when it captures utility, so the customer must be happy to pay more as they get a lot more value out of it. I think this is one parameter. And the other parameter is keeping it simple, because nobody wants to solve a differential equation to figure out the price.

Balancing these two is always, I think, an art, and I’m not going to say that we have done something completely different there.

Michael Schwartz: Have you had to tweak the offering a little bit since you got started, as in trial and error?

Kostas Tzoumas: Always, yes, we had to do that.

I think there’s no other way, you have to communicate with the market, see what the customer wants, see what resonates with the customer and then adapt.

Michael Schwartz: Isn’t that tricky in Enterprise sales, where there’s a longer sales process, and it’s not like you can do A/B testing, and sell your data Artisans platform for one price, and then sell it for another price?

What’s the time frame for you making those types of pivots?

Kostas Tzoumas: It is, and I don’t want to say we ended up doing any kind of pivot.

Our initial thesis there was well-received, and we had to do minor tweaks. We also know that the way this kind of enterprise product is priced in the market is pretty uniform. Most of the companies share very similar broad concepts, so there’s also that experience to tap into.

It’s also valuable because, for a procurement department, what they want to see is something they have seen before, understand, and can process, rather than a fancy new idea that someone thought of.

Partners

Michael Schwartz: Can you talk a little bit about channels? I know that data Artisans is still pretty new, but have channel partners played a role, and where do you see the partner program going?

Kostas Tzoumas: I’m a big fan of being focused. For me, building a company has a lot to do with saying a lot of “No’s,” and making a few bets.

When we started out the company, we focused 100% on open source.

At the time, that was an unpopular choice for many who were telling me, “You have to build a business, it’s not about the open source.”

If you try to do too many things at the same time, you end up being kind of medium good at many things, and nobody wants that.

So, for me, the first challenge was to achieve critical mass in the open source community.

The next sort of journey we are going on now is to prove that data Artisans is a business, and then after that comes scaling.

Right now, we’re working with a number of partners, but in an exploratory way, and when it makes sense to get a direct customer.

Pretty soon, we will start making a more formal partnering program, but we haven’t done this yet.

Competition

Michael Schwartz: How do you distinguish your offering from the competitor’s?

Kostas Tzoumas: What is core to us is that we are basically the people that created Apache Flink originally. Internally in the company, we are the experts on the framework.

We are the ones who can support the customer best. And by participating in the open source community and by working together with all of these large tech companies that are using Flink, even if they are not customers, we are learning so much, and we’re bringing all of this institutional knowledge to our customers. We are really the absolute experts on Apache Flink.

There are of course alternative frameworks to Apache Flink. But if you really look at the market today, Flink is ahead of its competitors in stream processing by a number of years of development, both in terms of the feature set and functionality of the framework, and in terms of how extensively the framework has been tested and hardened at a very large scale.

Ten-Year Vision

Michael Schwartz: Where do you see data Artisans in 10 years?

Kostas Tzoumas: The market is growing very, very fast. This is a new paradigm for data.

Companies are realizing the benefits that embracing this paradigm can give them, but I think we are really in the early days of the market.

Challenges For Open Source Startups

Michael Schwartz: So you’ve had quite the experience building an open source community and an open source company – what do you think are the challenges facing new founders of open source companies?

Kostas Tzoumas: A lot of companies, before we started data Artisans, had done a lot of the groundwork – for example, the Hadoop companies, Cloudera, and so on. They had proven to the market, they had convinced the market, that open source is a good thing.

For example, we at data Artisans have never had to convince a bank that open source is not a bad thing – that’s a huge step.

I also think that developer communities are becoming a lot more business savvy. Developer communities are realizing that having a financially successful business backing the open source project just means better results for the open source project.

I think for us, data Artisans, and for new founders, a lot of work has been done – but it doesn’t mean that this has been proven.

There are very few companies in the world that have successfully IPO’d, and have been public for a while, or have been financially successful with open source. But a lot of this road, I think, has been paved.

I think one challenge for the new founders is the licensing challenge that you mentioned before.

Basically, how can you avoid sort of parasitic behavior in the community. So, companies just grabbing the open source, offering it commercially, but not contributing anything back – I think this is an active discussion right now in the open source community.

I’m interested to see how it goes from an academic point of view more, because for us, we are part of the Apache Foundation – Flink is Apache licensed, and it will stay like that. But the movement in this space for me is interesting.

Advice On VC Process

Michael Schwartz: Any tips about the VC process and working with VCs that might be helpful to other entrepreneurs?

Kostas Tzoumas: Yeah, I think this is a challenge in every company. I don’t think this is particular to open source companies.

I’m not going to pretend that I’m the one who is in a position to give advice at this point. But even if it kind of sounds like a platitude, what I have seen is that the person matters a lot, so the particular partner that you work with is paramount – and perhaps even more important than the firm.

Closing Advice For Entrepreneurs

Michael Schwartz: So one last question – do you have any personal advice for the entrepreneurial person who is about to start a company?

Kostas Tzoumas: It’s hard. I think first of all what I have seen is that the motivation should not be the money.

I have not met any successful entrepreneur who was motivated by money. To do all the hard work, to stay up all night, and to go through all this stress, you really need to believe in your mission and refine your mission.

And then the other thing I learned from others that came before me and mentored me was exactly the importance of being focused.

So when you’re starting a company, and you’re building a company, it’s not obvious what you have to do. Especially for me, as this is my first time doing that.

So it’s never obvious what you have to do. And, you know, you see all of these things and you always think – hey, I should be doing a little bit of that. I should go to that conference, and that conference also looks good. Or, maybe we should be building a product now and targeting that market, because oh, there’s another community coming in that’s targeting that market.

So, you can very easily get lost.

What I think has proven very, very useful for me is the importance of focus. The definition of a startup is having limited resources and maximizing the impact of these resources.

So the question should not be what the possible spectrum of things that we can do is, but whether there are 1, 2, or a maximum of 3 things that we can do to maximize our impact in this world, with our resources right now.

So focus on one thing and become extremely good at this. This is the advice that I have gotten, and it’s served me very well.

Michael Schwartz: Kostas, thanks so much for your candid responses, and thank you for being on the show.

Kostas Tzoumas: Thank you so much.

Michael Schwartz: Special thanks to the data Artisans team for reaching out to us. Transcription and episode audio can be found on opensourceunderdogs.com.

Music from Broke For Free by Chris Zabriskie and Lee Rosevere.

Production assistance and transcription by Natalie Lowe. Operational Support from William Lowe.

Follow us on Twitter, our handle is @fosspodcast.

We have a packed schedule for 2019. Next time we’ll be talking with Sean Porter, Founder of Sensu – he has some great insights, so don’t miss it.

Until then thanks for listening.
