How to stand out in open source

Do you have a sense of what makes one technology stand out against another in a crowded space?

I think it’s a combination of things. You have to be sufficiently different than other things, and people need to find advantage and value in using it. You need sufficiently high-quality technology that really is usable. Oftentimes, there’s also a substantial institution behind a project to get it to that point.

In any one area, whoever gets there first with something that’s high-enough quality and is substantially different than other things can take over. Then, they’ve got to follow through with having people who are capable of building a productive community to keep it moving and keep it ahead of competitors.

Spark is a great example of something that attracted a lot of contributors and kept them, so it’s been growing. It hasn’t alienated people, and that has been a real key to its success.

The so-called Big Data Analytics Stack. Credit: University of California, Berkeley

Spark has some competitors, but they all seem to be a little behind in level of quality, level of completeness and level of contribution. Maybe they’ll catch up. Some of them claim to have better architectures, but I’m a little skeptical that they’ve got enough of a fundamental improvement that they can surpass all the other things.

We can see a similar story going back to Hadoop. There could have been other things out there, but we were early enough, and with help from Yahoo we got it to be good enough quality that it was out there and people figured they didn’t need to recreate it. They could just pile on and use it.

“It has been amazing to me the degree to which people really follow what’s going on in open source and experiment with the bleeding-edge stuff at pretty stodgy institutions.”

Has the pace of open source picked up markedly in the past several years? Spark is by no means a mainstream technology, generally speaking, and yet we’re already talking about “next-generation” competitors.

You’re right. It does move quickly, and there are people who are very speculative. But I think we’ve seen enough interest in Spark that it’s been anointed — not just by Cloudera, but by IBM and lots of folks.

People in industry are really, seriously investing in Spark. It has been amazing to me the degree to which people really follow what’s going on in open source and experiment with the bleeding-edge stuff at pretty stodgy institutions. Even in banks, the people in the tech departments are really on top of all this. They follow all the open source politics and they follow all the latest technologies, and they’re experimenting with them.

Something I learned many years ago in Lucene was that we had a mailing list and there was a community of people I knew who were involved in using Lucene. Yet, I’d go to a conference and I’d run into hundreds of people I’d never heard of who said they built their applications around Lucene. It surprised me a lot at first: it was at least a 10-to-1, and maybe even 100-to-1, ratio of the actual set that was using Lucene to the people who I knew were using it.

“[T]here’s a big community out there of people who aren’t talking about [Spark], who’ve already settled on it as their next-generation technology.”

Think about your own software use: You download an app and use it, and you don’t contact the developers. You don’t add a review. You just use it because it works.

With open source, we don’t even have download stats, really. Cloudera has a little bit of a handle on that, but Apache doesn’t, really. And people, of course, can copy things around and get them through different channels. I think there’s a lot more use out there than we’re aware of.

So with Spark, it’s at a young phase and there are only so many public endorsements of people using it in production and so on. But there’s a big community out there of people who aren’t talking about it, who’ve already settled on it as their next-generation technology. I run into this everywhere.

You run into it around the world. I was recently in Japan and in Budapest two weeks before that. Everybody I spoke to was like, “Yeah, we’re really gearing up with Spark. We know that’s where things are going and we’re on with it.” Maybe that’s just because of the people I run into at the conferences I attend, but these conferences aren’t tiny anymore. It’s a pretty substantial swathe of IT folks.

A packed hall at the recent Spark Summit Europe. Credit: Databricks

Even if competitive projects can claim architectural advantages, isn’t the sheer momentum from large user communities — and large committed users — too much to overcome at some point?

It’s a lot harder to supplant something when you’ve got people who already know the technology. You have to be a substantial leap ahead of that in order to replace it. Things are going to continue to change.

I see a lot of companies out there that are founded around single technologies, claiming “We’re the X company.” I worry about those. I think there’s going to continue to be turnover, and the lifespan of a lot of these technologies is going to be short. Five years is going to be really the time, their golden age, before something might come along and replace them.

You hope a company’s going to last longer than five years. You hope that people are not all forming companies that are going to be acquisition targets. I’m really pleased that Cloudera has found a way around becoming just “the Hadoop company,” but rather (and pardon the buzzwords) saying, “We’re the open-source big-data ecosystem platform company, or the Enterprise Data Hub company.” I think that’s a smarter play.

“I see a lot of companies out there that are founded around single technologies, claiming ‘We’re the X company.’ I worry about those. I think … the lifespan of a lot of these technologies is going to be short.”

When I talk to customers, it becomes clear that users — not vendors — are the people who choose what they’ll use. We have this stack of tools that we support. We encourage people, “You only want to use supported tools.” We’re the vendor. We can tell you what we support and you ought to just pick from our menu. I’d say nearly all of our customers, it feels like, use one or two things we don’t support.

They all have a slightly different mix of things they pick. Maybe 75 or 80 percent of their stack overlaps with the one that we support, and they pick a few other things that they decided, “For us, this is really the best tool.” They’re not just picking the one thing and saying, “This is it for us.”

And if they’re picking a stack of six components, I don’t think they want to work with six different vendors. They’ll work with one, maybe, and they’ll take a risk on a couple other things and not have support for them. That seems to be the pattern I’m seeing. So for businesses, I think adopting a stack is a better story, and then you can evolve that stack.

“It’s harder to make a fortune in open source. The more you pick a niche within that, the harder the road you chart for yourself.”

With a very focused business model, it seems like you also have to give yourself a lot of runway. Even a few million dollars in funding doesn’t necessarily buy you a lot of time.

The size of the market and the margins are not what they were in the heyday of Oracle and Microsoft. One of the attractions of open source is they’re less expensive technologies and that you’re not necessarily dependent on the vendor. It’s harder to make a fortune in open source. The more you pick a niche within that, the harder the road you chart for yourself.