Machine learning is still too hard to use

But things are starting to get easier

Disclaimer: The following is based on my observations of machine learning teams — not an academic survey of the industry. For context, I’m a contributor to Cortex, an open source platform for deploying models in production.

Take something ubiquitous in software, like a database. What does it mean to build one? To a Postgres contributor, “creating a database” looks like a million lines of C. To a Rails dev, it looks like rake db:create.

Obviously, neither is wrong; they just represent different levels of abstraction, appropriate to each engineer's focus.

This is how software builds on itself. The basic software that powers modern applications—databases, web servers, request routers, hashing libraries, etc.—has become widespread in no small part due to the layers of abstraction that make it accessible to non-specialists.

Machine learning has historically lacked that layer of abstraction, limiting its adoption. Now, however, things are changing. There is a new wave of projects focused specifically on making applied machine learning easier.

Models need a developer-friendly interface

In order to use machine learning in production, you need:

The expertise to design your model

Enough data and funding to train your model

The ML infrastructure knowledge to deploy your model

Any project using ML, as a result, needs to be staffed by several specialists. This is a bottleneck that has to be removed.

It should be possible for a developer with little background in machine learning to use it in production, just as a developer with no background in cryptography can still apply hashing libraries to secure user data.
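The hashing analogy already works this way in practice: Python's standard library wraps the cryptography behind a couple of function calls. A minimal sketch (the function names here are illustrative, not from any particular codebase):

```python
import hashlib
import hmac
import os

def hash_password(password: str):
    """Hash a password for storage; no cryptography expertise required."""
    salt = os.urandom(16)  # a random salt defeats precomputed-table attacks
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)
```

The developer never touches the underlying math of SHA-256 or key stretching; the abstraction handles it. Applied ML needs the same kind of interface.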

Fortunately, this is finally happening.

Bridging the machine learning abstraction gap

In order for applied ML to become widespread, a developer must be able to take a high-level understanding of machine learning—what a model is, fine-tuning, inference, etc.—and, using available abstractions, build an app.

Many of the necessary abstractions are already being worked on, and they fall into a few key areas of focus:

1. There needs to be an easier way to train models

The reality is that for many of applied machine learning’s use cases, there is no need to train a new model from scratch.

For example, if you are developing a conversational agent, Google’s Meena is almost certainly going to outperform your model. If you’re developing a text generator, you should use OpenAI’s GPT-2 instead of building your own from scratch. For object detection, a model like YOLOv3 is probably your best bet.

Thanks to transfer learning—a process in which the “knowledge” of a pretrained neural network is adapted to a new domain—you can take a relatively small amount of data and fine-tune these open source, state-of-the-art models for your task.
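To make the idea concrete, here is a toy, framework-free sketch of the pattern—not a real neural network, just an illustration: a frozen “pretrained” feature extractor, plus a small task-specific head fitted on a handful of examples.

```python
# Toy illustration of transfer learning: keep the "pretrained" feature
# extractor frozen, and fit only a small task-specific head on little data.

def pretrained_features(x: float) -> list:
    """Stand-in for a frozen pretrained network's feature extractor."""
    return [x, x * x]

def train_head(data, lr=0.01, steps=500):
    """Fit head weights w on top of the frozen features by gradient descent."""
    w = [0.0, 0.0]
    for _ in range(steps):
        for x, y in data:
            feats = pretrained_features(x)
            pred = sum(wi * fi for wi, fi in zip(w, feats))
            err = pred - y
            # Only the head's weights change; the feature extractor is fixed.
            w = [wi - lr * err * fi for wi, fi in zip(w, feats)]
    return w

def predict(w, x: float) -> float:
    return sum(wi * fi for wi, fi in zip(w, pretrained_features(x)))
```

A handful of labeled examples is enough to fit the head, which is exactly why fine-tuning needs far less data than training from scratch.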

For example, with new libraries like gpt-2-simple, you can fine-tune GPT-2 using a simple command line interface:

$ gpt_2_simple finetune your_custom_data.txt

With this layer of abstraction, developers don’t need deep ML expertise—they just need to know what fine-tuning is.

And gpt-2-simple is far from the only training abstraction available. Google’s Cloud AutoML gives users a GUI that allows them to select their dataset and automatically train a new model, no code necessary.

Writing about AutoML, Sundar Pichai said “We hope AutoML will take an ability that a few PhDs have today and will make it possible in three to five years for hundreds of thousands of developers to design new neural nets for their particular needs.”

2. Generating predictions from models needs to be simple

Okay, so it’s easier to get a trained model for your particular task. How do you generate predictions from that model?

There are a ton of projects which offer model serving functionality, many of which are connected to popular ML frameworks. TensorFlow, for example, has TF Serving, and ONNX has ONNX Runtime.

Outside of the tech giants, there are also a number of independent open source projects working on this problem. For example, Bert Extractive Summarizer is a project that makes it easy to extract summaries of text using Google’s BERT. Below is an example from the docs:

from summarizer import Summarizer

body = 'Text body that you want to summarize with BERT'
body2 = 'Something else you want to summarize with BERT'

model = Summarizer()
model(body)
model(body2)

Generating a prediction with the library is as simple as an import statement and a call to Summarizer().

As more projects like these continue to launch and develop, it becomes easier for developers to generate predictions from models without having to dig into the model itself.
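The serving pattern these projects share can be sketched generically: load the model once, then expose a single callable that turns a raw request into a prediction. A minimal sketch (the toy_model below is an illustrative stand-in, not the real Summarizer):

```python
import json

class PredictionService:
    """Generic serving pattern: load a model once, predict many times."""

    def __init__(self, model):
        self.model = model  # any callable: raw input -> prediction

    def handle(self, request_body: str) -> str:
        """Turn a JSON request into a JSON prediction response."""
        payload = json.loads(request_body)
        prediction = self.model(payload["input"])
        return json.dumps({"prediction": prediction})

# Illustrative stand-in for a real model such as Summarizer()
def toy_model(text: str) -> str:
    return text.split(".")[0]  # "summarize" by returning the first sentence

service = PredictionService(toy_model)
```

Swap toy_model for a real model object and the surrounding code doesn't change—which is the whole point of the abstraction.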

3. Deploying models needs to be simple

The final bottleneck is infrastructure.

Serving predictions for a toy application is straightforward, but when your application needs to scale, things get difficult. Using GPT-2 as an example:

GPT-2 is > 5 GB. You need a larger—and by definition, more expensive—server to host a model this big.

GPT-2 is compute hungry. In order to serve a single prediction, GPT-2 can occupy a CPU at 100% utilization for several minutes. Even with a GPU, a single prediction can still take seconds. Compare this to a web app, which can serve hundreds of concurrent users with one CPU.

GPT-2 is memory hungry. Beyond its considerable disk space and compute requirements, GPT-2 also needs large amounts of memory to run without crashing.

In order to handle even a small surge in users, your infrastructure would need to scale up many replicas of your application. This means containerizing your model with Docker, orchestrating your containers with Kubernetes, and configuring your autoscaling with whatever cloud platform you use.
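For a sense of what that entails, the Kubernetes autoscaling piece alone looks something like the following (a hedged sketch; the names, replica counts, and thresholds are illustrative, not a recommendation):

```yaml
# Illustrative HorizontalPodAutoscaler for a model-serving deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gpt2-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gpt2-api
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

And that is just one file—the Dockerfile, the Deployment, the Service, and the cloud-specific node autoscaling all sit alongside it.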

Building the infrastructure to handle machine learning deployments requires learning an entire stack of tools, many of which will not be familiar to most developers who don’t have devops backgrounds:

Machine learning infrastructure stack

In order for machine learning to become accessible to developers, machine learning infrastructure also needs to be abstracted. This is where projects like Cortex (full disclosure: I’m a contributor) come in.

Cortex abstracts away the underlying devops of model deployment with a config file and a CLI.

The goal of projects like Cortex is simple: Take a trained model, and turn it into a prediction API that any developer can use.

Making applied machine learning easier

Let me be clear: the underlying math behind machine learning will always be hard. No one is a machine learning expert just because they can call a predict() function. The point is that a developer shouldn’t have to be a machine learning expert (or devops expert, for that matter) to use ML in their application.

The machine learning ecosystem is finally focusing on making applied ML easier. A developer with just a little knowledge can fine tune a state-of-the-art model, wrap it in an API, and deploy it on scalable infrastructure using open source, intuitive abstractions.

As a result, applied machine learning is about to become easier—and by extension, accessible to virtually all developers.