Data science and machine learning can improve almost any aspect of an organization, but only if your ideas get used. Over the last year we’ve learned a lot about building and deploying machine learning models faster, and we want to share some of those lessons here.

“cheetah running on brown field” by Cara Fuller on Unsplash

The Situation

In our organization, we needed to generate a return on our analytics investment as quickly as possible. We needed to deploy machine learning models to production faster. Most importantly, we didn’t want great ideas to become analytic shelf-ware, sitting around waiting to be used.

Traditionally, we built each data product as a bespoke solution, with little reuse from one custom solution to the next. What we needed was an assembly line for data products.

So, we built an assembly line for building, testing, and deploying data products, which we called the machine learning platform. With it, we could now deploy a model to production in minutes. We no longer had to wait as long to enjoy a return on our analytics investments.

What we learned along the way

Along the way we learned some important rules about how to build, test, and deploy machine learning models safely and quickly. These rules changed how we work, and hopefully you’ll find them useful for you and your organization.

1. Embrace Self-Service

Before our machine learning platform existed, the models created by data scientists would be handed off to IT so they could create data pipelines and a model deployment environment for each model. Some models were even rewritten into a different language before they were deployed.

We built our machine learning platform to provide model builders the ability to self-deploy models approved through internal model governance processes. Self-service is key to going faster.

2. Use Containers to provide abstraction from the infrastructure

Containers provide a great way to isolate and version models. If your organization uses a standard server build, you might find it difficult to get your dependencies and artifacts installed on that standard image. Containers fix that. Getting packages installed by the server administrators might take too long. Containers fix that too. You might need to host a new version and a legacy version of the same model side by side for some time, each needing a different set of dependencies. Containers help with versioning strategy as well.

Your enterprise might be all cloud, some cloud, or no cloud, but even if you aren’t using the cloud currently, you’re probably thinking about it. Containers are very portable: if you take a container-based approach, you can run those models anywhere, on premises or in the cloud.
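To make the container-based approach concrete, a minimal image for a Python model service might look like the sketch below. The file names and entry point (`requirements.txt`, `model.pkl`, `serve.py`) are hypothetical, not taken from our platform:

```dockerfile
# Hypothetical image for a Python model service; file names are assumptions.
FROM python:3.11-slim

WORKDIR /app

# Pin dependencies in the image so it runs identically on premises or in the cloud.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake the serialized model artifact and serving code into the image.
COPY model.pkl serve.py ./

# The container is self-contained: nothing needs installing on the host server.
EXPOSE 8080
CMD ["python", "serve.py"]
```

Because each model version is baked into its own image, a new version and a legacy version can run side by side, each with its own dependencies.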

3. Data Scientists need to care about code quality

Giving your data scientists the power to self-service deploy models to production comes with the responsibility of writing production quality code.

This might mean your model building team(s) have to up their software engineering game. Knowing a little Python syntax and calling an API doesn’t make you a good software engineer. Software quality matters at least as much as data and model quality when you are building software that an organization will use in production systems.

This might mean adopting practices like test-driven development and code review. It might mean trying out pair programming. While you’re at it, consider carefully how and when you use notebooks and the impact that has on software quality. Luckily, these patterns are well known in the software engineering world and readily adoptable by most teams.
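As a small illustration of test-driven development in this context, a data scientist might write the test for a preprocessing helper before (or alongside) the helper itself. The function and tests below are hypothetical examples, not code from our platform:

```python
# Hypothetical example of test-first preprocessing code; the names are
# illustrative assumptions, not part of the platform described above.

def clip_outliers(values, lower, upper):
    """Clamp each value into [lower, upper] so extreme inputs cannot
    push a model outside the range it was trained on."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return [min(max(v, lower), upper) for v in values]


def test_clip_outliers_clamps_extremes():
    assert clip_outliers([-5, 0, 50], lower=0, upper=10) == [0, 0, 10]


def test_clip_outliers_rejects_inverted_bounds():
    try:
        clip_outliers([1], lower=10, upper=0)
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Tests like these run on every commit, which is what makes self-service deployment safe rather than reckless.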

4. If it isn’t automated, it isn’t done

Platform speed and stability both depend on automating your model deployment platform and process. If you want to go faster, be uncompromising in your adoption of automation.

On our machine learning platform, we’ve automated the entire model lifecycle. Continuous integration and continuous delivery drive model testing and model deployment on the platform.

We’ve also automated the configuration and deployment of the underlying platform infrastructure. In doing so, our team learned to treat these automated virtual machines as disposable resources. No one logs into a server for administration and all administrative tasks are automated. This means every server is consistently configured regardless of how much we scale out. Rather than upgrade a server, we redeploy the platform with new infrastructure automatically.
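As a sketch of what end-to-end automation can mean in practice, a continuous-delivery pipeline for a model might look something like the following (a GitLab-CI-style config; the stage names, registry URL, and scripts are hypothetical assumptions, not our actual configuration):

```yaml
# Hypothetical CI/CD pipeline sketch; names and commands are assumptions.
stages:
  - test
  - build
  - deploy

test-model:
  stage: test
  script:
    - pip install -r requirements.txt
    - pytest tests/          # unit tests and model validation checks

build-image:
  stage: build
  script:
    - docker build -t registry.example.com/model:$CI_COMMIT_SHA .
    - docker push registry.example.com/model:$CI_COMMIT_SHA

deploy-model:
  stage: deploy
  script:
    # Redeploy fresh infrastructure; servers are disposable, never patched in place.
    - ./deploy.sh $CI_COMMIT_SHA
```

The point is not the specific tool: every step from commit to production deployment runs without a human logging into a server.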

5. Build a platform that supports the entire model lifecycle

So far, I’ve focused on building, training, and deployment, but that’s only the first part of a machine learning model’s lifecycle. Many models experience drift, degrading in performance over time. Deployed models need to be monitored and refit. Each deployed model should log all inputs, outputs, and exceptions. A model deployment platform needs to provide for log storage and model performance visualization.

On our machine learning platform, every hosted model logs each execution in a common format. We route and store these logs, using them to monitor model performance and to help identify drift. Finally, we automatically create model dashboards to provide additional insight into how each model is performing.
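A minimal sketch of what a common execution-log format can look like, using only Python’s standard library. The schema, field names, and function name are assumptions for illustration, not the article’s actual format:

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("model_executions")


def log_execution(model_name, model_version, inputs, outputs, error=None):
    """Emit one JSON record per model execution in a shared schema,
    so logs from every hosted model can be routed, stored, and
    visualized the same way."""
    record = {
        "execution_id": str(uuid.uuid4()),  # unique id for tracing one call
        "timestamp": time.time(),           # when the execution happened
        "model": model_name,
        "version": model_version,
        "inputs": inputs,                   # features the model was given
        "outputs": outputs,                 # prediction(s) it returned
        "error": error,                     # exception text, if any
    }
    logger.info(json.dumps(record))
    return record
```

With every model emitting records in the same shape, monitoring performance and spotting drift becomes a query over one log store rather than a per-model integration.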

Keeping a close eye on model performance is key to effectively managing the lifecycle of a machine learning model. You can’t neglect model monitoring as part of a model’s overall lifecycle.

6. Standardize on a common development methodology

Software engineers have come up with great methodologies and design patterns that we can use to build portable and resilient applications. Many of these methodologies can easily be adapted to machine learning applications, if your model builders are aware of them. Leverage what’s out there.

The Machine Learning Platform’s unofficial motto — #noShelfWare

Getting Results

Incorporating these six rules helped us get faster results, and I hope they’ll help your organization as well. Data Science should be about creating software that has impact. White papers, dashboards, word clouds, and pie charts just aren’t going to cut it anymore, if they ever did. Getting results takes hard work.

It’s not an easy path. Steve Jobs said, “There is a tremendous amount of craftsmanship between a great idea and a great product.” To get to market fast you might need to put on your DevOps hat. This means your team may need to level up their software engineering skillset.

But, it’s worth it. What used to sometimes take 12 months now takes minutes at our organization. More importantly, we aren’t building analytic shelf-ware.