We tried to build an end-to-end ML platform. Here’s why it failed.

And how failing confirmed that we were right—sort of

In early 2019, a couple of us tried building an end-to-end ML framework. Our basic insight was that building ML pipelines is a frustrating, disjointed experience, and that we could probably build something better.

It didn’t go as planned.

I’ll go into more detail, but here’s the high-level version:

We wrote abstractions for the different stages of ML pipelines (data ingestion, training, deployment, etc.), using Kaggle datasets for testing.

We open-sourced the repo and shared it. Within a month, we hit the front page of Hacker News. Everyone liked the idea of improving machine learning’s UX.

After six months, we had a few hundred GitHub stars, and roughly zero people using it. We swallowed our pride and deleted 90% of the codebase.

Going through all of this led us to build a better project—Cortex, our model serving infrastructure—but for anyone interested in machine learning and/or ML tooling, let this be a cautionary tale:

Production machine learning does need better UX, but the ML ecosystem is complex and fluid—and that makes building a tool that covers a meaningful number of use cases really hard.

Why we wanted an end-to-end ML framework

Most of us (Cortex contributors) have backgrounds in DevOps and web dev. We’re accustomed to frameworks that abstract the different layers of an application into a single interface.

When we got into machine learning, each of us was struck by how disjointed the tooling was. We wanted to build recommendation engines and chatbots (or rather, conversational agents), but in doing so, we found ourselves jumping between different environments (Jupyter notebooks, terminals, the AWS console, etc.) and writing entire folders of glue code and TensorFlow boilerplate to duct-tape together what could generously be called a “pipeline.”

We wanted to replace all that hacking and gluing with a config file and a single command like:

$ deploy recommendation_engine

That seemed like an obviously good idea.

So that’s what we did. We built a tool that used Python to transform data, YAML to structure the pipeline, and a CLI to control everything.
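To make the idea concrete, here is a minimal sketch of that declarative-pipeline pattern in Python. It is not our original code or config schema: the `PIPELINE` dict stands in for what a parsed YAML file might look like, and every name in it (`stages`, `drop_missing`, `normalize`, `run_pipeline`) is hypothetical. Each stage in the config names a Python transform registered with a decorator, so extending the pipeline means registering another function.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical pipeline spec: roughly the shape a YAML file might parse into.
PIPELINE = {
    "name": "recommendation_engine",
    "stages": [
        {"transform": "drop_missing", "column": "rating"},
        {"transform": "normalize", "column": "rating"},
    ],
}

# Registry mapping transform names (as written in the config) to Python functions.
TRANSFORMS: Dict[str, Callable[..., List[Row]]] = {}


def transform(fn: Callable[..., List[Row]]) -> Callable[..., List[Row]]:
    """Decorator that registers a function under its own name."""
    TRANSFORMS[fn.__name__] = fn
    return fn


@transform
def drop_missing(rows: List[Row], column: str) -> List[Row]:
    # Keep only rows where the column has a value.
    return [r for r in rows if r.get(column) is not None]


@transform
def normalize(rows: List[Row], column: str) -> List[Row]:
    # Min-max scale the column into [0, 1].
    if not rows:
        return rows
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, column: (r[column] - lo) / span} for r in rows]


def run_pipeline(spec: Dict[str, Any], rows: List[Row]) -> List[Row]:
    """Apply each configured stage in order, passing its other keys as kwargs."""
    for stage in spec["stages"]:
        params = {k: v for k, v in stage.items() if k != "transform"}
        rows = TRANSFORMS[stage["transform"]](rows, **params)
    return rows


if __name__ == "__main__":
    data = [{"rating": 1}, {"rating": None}, {"rating": 5}, {"rating": 3}]
    print(run_pipeline(PIPELINE, data))
```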

And it made for a great tool, as long as you fed it a Kaggle dataset, used the narrow stack we supported, and stayed within the limitations we placed on APIs.

Basically, if you tried using it in the real world, chances are it wouldn’t work with your stack. That, unsurprisingly, was an issue. And while some of it comes down to our design, a large part of the problem is actually intrinsic to building an end-to-end tool—we just didn’t figure that out until after we’d built it.

The problem with end-to-end ML frameworks

The simple version is that the production machine learning ecosystem is too young for an end-to-end framework to be both opinionated and right.

We weren’t wrong that ML engineers wanted tooling with better UX, but we were wrong that we could build an end-to-end framework that covered a large share of use cases (especially with just a few contributors).

It helps to do something we neglected to do earlier: think of the web frameworks that inspired us, and remember when they first rose to prominence.

Rails, Django, and Symfony were all released between 2004 and 2005 as part of a wave of new MVC frameworks for the web. While web development at that time may not have been “stable,” especially considering how it has matured since (thanks in no small part to those frameworks), there was still a high degree of similarity in the work being done by web developers. In fact, one of Rails’ earliest mottos was “You’re not a beautiful and unique snowflake,” a reference to the fact that most web developers were building architecturally similar apps that could run on identical configurations.

Production machine learning is not in that phase yet. Everything is still in flux. Data scientists vary in the types of data they process, the model architectures they use, the languages/frameworks they prefer, the inferencing needs of their application, and in just about every other way you can imagine.

Moreover, the field itself changes rapidly. Since Cortex’s initial release 18 months ago:

PyTorch has gone from a promising project to the most popular ML framework, while many specialized training libraries (like Microsoft’s DeepSpeed) have been released.

OpenAI released what was then the largest model ever, the 1.5-billion-parameter GPT-2. Google, Salesforce, Microsoft, and Nvidia have all since released larger models (some by an order of magnitude).

Large numbers of startups have begun to use transfer learning and pre-trained models to fine-tune and deploy models with small amounts of data (i.e., not everyone needs a 100-node Spark cluster now).

With all of this in flux, trying to build an end-to-end framework that supported the “correct” stack was doomed from the start.

Everyone would ask for the “one feature” they needed, and no two people asked for the same one. We tried building escape hatches for a bit, but it quickly became clear that “Django for ML” wasn’t feasible, at least not in the way that we’d imagined.

Focusing on model serving infrastructure

End-to-end was hard because most of the ML ecosystem is still the wild west. There was one area, however, where there was stability and consistency: model serving.

Regardless of what stack they used, most teams were putting models into production by wrapping them in an API and deploying to the cloud—and they didn’t like doing it.

Data scientists disliked it because the tools for building a scalable web service—Docker, Kubernetes, EC2/GCE, load balancers, etc.—were outside their wheelhouses. DevOps engineers were annoyed by the peculiarities of model inference.

For us, this was an opportunity. The model-as-a-microservice design pattern was consistent across teams, and the tooling was very stable because it belonged to the infrastructure ecosystem rather than the ML ecosystem. Better yet, as software engineers, we were more experienced with building production web services than we were with ML pipelines anyway.
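For readers who haven’t seen the pattern, here is a minimal sketch of a model wrapped in an HTTP API using only Python’s standard library. The `ToyModel` class and its weights are hypothetical stand-ins for a trained model, and the JSON request shape (`{"features": [...]}`) is an assumption for illustration. A production service would still need the containers, orchestration, and load balancing mentioned above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyModel:
    """Hypothetical stand-in for a trained model."""

    def __init__(self, weights):
        self.weights = weights

    def predict(self, features):
        # Linear score: dot product of the weights and the input features.
        return sum(w * x for w, x in zip(self.weights, features))


# Loaded once at startup rather than per request, as real model weights would be.
MODEL = ToyModel(weights=[0.5, -1.0, 2.0])


def handle_request(body: bytes) -> bytes:
    """Decode a JSON payload, run inference, and encode a JSON response."""
    payload = json.loads(body)
    prediction = MODEL.predict(payload["features"])
    return json.dumps({"prediction": prediction}).encode("utf-8")


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            response, status = handle_request(self.rfile.read(length)), 200
        except (KeyError, TypeError, ValueError) as err:
            # Malformed JSON or a missing/invalid "features" field.
            response = json.dumps({"error": str(err)}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


if __name__ == "__main__":
    # POST {"features": [1, 2, 3]} to http://127.0.0.1:8080/ to get a prediction.
    HTTPServer(("127.0.0.1", 8080), PredictionHandler).serve_forever()
```

Keeping the inference logic in `handle_request`, separate from the HTTP handler, makes it testable without a running server.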

So, we thought we’d give model serving a shot. We applied the same design principles, abstracting all lower-level wrangling behind declarative YAML configurations and a minimal CLI, and automated the process of turning a trained model into a scalable, production web service.

By focusing exclusively on model serving, we can be agnostic towards the rest of the stack (as long as the model has Python bindings, Cortex can serve it). Because Cortex can plug into any stack, we get to be opinionated about what tools Cortex uses under the hood, which in turn makes it easier to build higher-level features.
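A sketch of why “has Python bindings” is enough: if the serving layer only ever calls a `predict()` method, it never needs to know which framework produced the model. The `Predictor` protocol, `SentimentPredictor`, and `serve_one` below are hypothetical names for illustration, not Cortex’s actual API, and the keyword-lookup “model” stands in for whatever PyTorch, TensorFlow, or ONNX model a team would load in `__init__`.

```python
from typing import Any, Dict, Protocol


class Predictor(Protocol):
    """Hypothetical contract between the serving layer and user code."""

    def predict(self, payload: Dict[str, Any]) -> Any: ...


class SentimentPredictor:
    """User code: wraps whatever Python-callable model the team already has."""

    def __init__(self, config: Dict[str, Any]):
        # A real predictor might load framework-specific weights from a path
        # in `config`; this toy "model" is just a set of positive words.
        self.positive = set(config.get("positive_words", []))

    def predict(self, payload: Dict[str, Any]) -> str:
        words = payload["text"].lower().split()
        return "positive" if any(w in self.positive for w in words) else "negative"


def serve_one(predictor: Predictor, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Serving-layer side: it calls predict() and never imports an ML framework."""
    return {"prediction": predictor.predict(payload)}
```

Because the contract is this narrow, everything behind it (containers, scaling, monitoring) can be standardized without caring what is inside `predict()`.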

For example, since releasing Cortex for model serving, we’ve added support for GPU inferencing, request-based autoscaling, rolling updates, and prediction monitoring. We don’t need to implement these features for a dozen different container runtimes and cluster orchestrators: Cortex uses Docker and Kubernetes under the hood, and the user never has to touch either.
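Request-based autoscaling can be sketched as sizing a deployment to the number of in-flight requests. The function below is a simplified assumption for illustration, not Cortex’s exact algorithm, and its parameter names (`target_concurrency_per_replica`, `min_replicas`, `max_replicas`) are hypothetical.

```python
import math


def desired_replicas(in_flight_requests: float,
                     target_concurrency_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Pick a replica count so each replica handles about
    `target_concurrency_per_replica` concurrent requests."""
    if target_concurrency_per_replica <= 0:
        raise ValueError("target_concurrency_per_replica must be positive")
    needed = math.ceil(in_flight_requests / target_concurrency_per_replica)
    # Clamp so a traffic spike can't scale without limit and an idle API
    # still keeps min_replicas running.
    return max(min_replicas, min(max_replicas, needed))
```

With a target of 4 concurrent requests per replica, 10 in-flight requests would call for 3 replicas. One plausible reason to scale on requests rather than CPU utilization is that GPU-bound inference can leave a replica’s CPU mostly idle even when it is saturated with work.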

So far, the approach seems to be working.

Applying lessons from web dev to ML tooling

Philosophically, web frameworks have a big influence on how we think about Cortex.

Frameworks like Rails and Django put a premium on programmer productivity and happiness. To build a web app, you didn’t have to worry about configuring a SQL database, implementing request routing, or writing your own SMTP methods to send emails. All of that was abstracted away behind intuitive, simple interfaces.

In a nutshell, that’s how we think of Cortex. Data scientists shouldn’t have to learn Kubernetes; they should get to focus on data science. Engineers shouldn’t have to spend days figuring out how to keep a 5 GB model from blowing up their AWS bill; they should be free to build software.

Hopefully, as the ML ecosystem matures and stabilizes, we’ll be able to extend this philosophy to the rest of the stack. For now, model serving is a good place to start.