In machine learning, going from research to a production environment requires a well-designed architecture. This blog post shows how to transfer a trained model to a prediction server.

By Amine Baatout, ContentSquare.

Machine Learning in Production

From trained models to prediction servers

In this article, we will discuss how to go from the research phase to the production phase in ML projects, and the different options for doing so.

2-in-1 approach

If you keep your training and server code in the same repository, you will probably end up with a big mess that is hard to maintain. Training models and serving real-time predictions are extremely different tasks and hence should be handled by separate components. Last but not least, there is a proverb that says “Don’t s**t where you eat”, so there’s that too.

Thus, a better approach would be to separate the training from the server. This way, you can do all the data science stuff on your local machine, and once you have your awesome model, you can transfer it to the server to make live predictions.

Model as a config

Our reference example will be a logistic regression on the classic Pima Indians Diabetes Dataset, which has 8 numeric features and a binary label. The following Python code gives us a training set and a test set.

Python Code: train_test_split.py

Model coefficients transfer approach

After we split the data we can train our LogReg and save its coefficients in a json file.

Once we have our coefficients in a safe place, we can reproduce our model in any language or framework we like. Concretely, we can write these coefficients in the server configuration files. This way, when the server starts, it will initialize the LogReg model with the proper weights from the config.

Python Code: save_model_coefficients.py
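As a rough sketch of what the training side could look like, the snippet below splits a dataset, fits a logistic regression and exports its weights as JSON. It is not the original script: the data is a synthetic stand-in for the 8-feature dataset, and the JSON keys `intercept` and `coefficients` are names assumed here for illustration.

```python
import json

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: 8 numeric features, binary label.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = (X[:, 1] + 0.5 * X[:, 5] + rng.normal(scale=0.5, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)


def export_coefficients(fitted_model):
    """Return the weights of a binary LogReg as plain, JSON-friendly types."""
    return {
        # Key names are an assumption; any schema the server agrees on works.
        "intercept": float(fitted_model.intercept_[0]),
        "coefficients": fitted_model.coef_[0].tolist(),
    }


# In practice this string would be written to a file or a config store.
config_json = json.dumps(export_coefficients(model), indent=2)
```

Only plain numbers leave the training environment, so nothing in the exported file ties the server to Scikit-learn.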

The big advantage here is that the training and the server parts are totally independent regarding the programming language and the library requirements.
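To illustrate that independence, here is a minimal sketch of a server-side model rebuilt from a config using only the Python standard library. The class name `LogRegFromConfig`, the JSON keys and the coefficient values are all made up for this example; the numbers are not fitted on the real dataset.

```python
import json
import math

# Hypothetical config content. The values are illustrative only.
CONFIG_JSON = """
{
  "intercept": -1.5,
  "coefficients": [0.1, 0.02, -0.01, 0.0, 0.0, 0.05, 0.3, 0.01]
}
"""


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogRegFromConfig:
    """Logistic regression initialised from stored weights, no ML library needed."""

    def __init__(self, intercept, coefficients):
        self.intercept = float(intercept)
        self.coefficients = [float(w) for w in coefficients]

    @classmethod
    def from_json(cls, text):
        config = json.loads(text)
        return cls(config["intercept"], config["coefficients"])

    def predict_proba(self, features):
        if len(features) != len(self.coefficients):
            raise ValueError("unexpected number of features")
        z = self.intercept + sum(
            w * x for w, x in zip(self.coefficients, features)
        )
        return sigmoid(z)

    def predict(self, features, threshold=0.5):
        return int(self.predict_proba(features) >= threshold)


# What the server would do once at start-up.
server_model = LogRegFromConfig.from_json(CONFIG_JSON)
```

The same few lines could be written in Java, Go or any other language the server happens to use, which is the point of this approach.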

However, one issue that is often neglected is feature engineering, or more accurately, the dark side of machine learning. You rarely train a model directly on raw data; there is almost always some preprocessing to be done beforehand. It could be anything from standardisation or PCA to all sorts of exotic transformations.

So if you choose to code the preprocessing part on the server side too, note that every little change you make in the training must be duplicated in the server, meaning a new release for both sides. If you’re always trying to improve the score by tweaking the feature engineering part, be prepared for double the work and plenty of redundancy (cf. Figure 2).

Figure 2. The 387301st release of a prediction server (yes, I’m exaggerating) due to a simple change in the feature engineering which doesn’t impact how the server works. Not good.

PMML approach

Another solution is to use PMML (Predictive Model Markup Language), an XML-based format that provides a way to describe predictive models along with data transformations. However, it supports only a limited set of ML models and lacks support for many custom transformations. Then again, if you choose to stick with the standard models and transformations, PMML would be the option to go for.

Custom DSL/Framework approach

If you want more than PMML can offer, you could build your own DSL or framework that lets you translate what you did on the training side to the server side. However, this can be a time-consuming task that not everyone can afford.

Model as black box

Now, I want to draw your attention to one thing the previously discussed methods have in common: they all treat the predictive model as a “configuration”. Instead, we could consider it as a “standalone program”, a black box that has everything it needs to run and that is easily transferable (cf. Figure 3).

Figure 3. Top: Model description transfer approach. The server loads the config and uses it to create the model. Bottom: Black box transfer approach. The server loads the standalone model itself.

The black box approach

In order to transfer your trained model along with its preprocessing steps as an encapsulated entity to your server, you will need serialization. You should be able to put anything you want in this black box and end up with an object that accepts raw input and outputs the prediction. (cf figure 4)

Figure 4. Standalone trained model ready to be integrated transparently in the server side.

Let’s try to build this black box using Pipeline from Scikit-learn and the Dill library for serialisation. We will use a custom transformation `is_adult` that wouldn’t be supported by PMML.

Python Code: pickle_export.py
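The export step could look roughly like the sketch below. It is not the original script: the data is synthetic, the age threshold of 18 and the position of the age column are assumptions, and `ColumnTransformer` is used here to apply `is_adult` to a single column of a NumPy array.

```python
import dill
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

AGE_COLUMN = 7  # assumption: age is the last of the 8 features


def is_adult(ages):
    """Custom transformation: 1.0 where age >= 18 (assumed threshold), else 0.0."""
    return (ages >= 18).astype(float)


# Synthetic stand-in for the training data (8 features, binary label).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X[:, AGE_COLUMN] = rng.integers(5, 80, size=200)
y = (X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Preprocessing and model live in one object that accepts raw rows.
pipeline = Pipeline([
    ("features", ColumnTransformer(
        [("is_adult", FunctionTransformer(is_adult), [AGE_COLUMN])],
        remainder="passthrough",  # keep the other 7 columns untouched
    )),
    ("logreg", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# Serialise the whole black box; in practice the bytes would go to a file.
blob = dill.dumps(pipeline)

# Round trip in the same process, as a quick sanity check only.
restored = dill.loads(blob)
```

A round trip in the same process does not prove the pickle is self-contained, so the real check is still to load it in a clean environment.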

OK, now let’s load it on the server side.

To better simulate the server environment, try running the pipeline somewhere the training modules are not accessible. Make sure that whatever libraries you used to build the model are installed in your server environment as well.

Python Code: run_pickle.py

In practice, custom transformations can be a lot more complex than our example, but the idea is the same.

The main challenge in this approach is that pickling is often tricky. That is why I want to share some good practices that I learned from my own experience:

Avoid using imports from other Python scripts as much as possible (imports from libraries are OK, of course). Example: say that in the previous example `is_adult` is imported from a different file: `from other_script import is_adult`. The resulting pickle will not be self-contained with Pickle, Dill, Joblib or Cloudpickle, because by default they store an imported function as a reference to its module rather than as code, so loading fails on a server where `other_script` is not available. The solution is to have everything used by the pipeline in the same script that creates the pipeline. However, if you have a strong reason against putting everything in the same file, you could always replace `import other_script` with `execfile("other_script.py")` (Python 2; the Python 3 equivalent is `exec(open("other_script.py").read())`) to make it work.

Avoid using lambdas, because generally they are not easy to serialise. While Dill is able to serialise lambdas, the standard Pickle lib cannot. You could say that you can use Dill then. This is true, but beware! Some components in Scikit-learn, like GridSearchCV, use the standard Pickle for parallelisation. So what you want to parallelise should be not only “dillable” but also “picklable”. Here is an example of how to avoid using lambdas: say that instead of `is_adult` you have `def is_bigger_than(x, threshold): return x > threshold`. In the DataFrameMapper you want to apply x -> is_bigger_than(x, 18) to the column “age”. So, instead of writing `FunctionTransformer(lambda x: is_bigger_than(x, 18))`, you could write `FunctionTransformer(partial(is_bigger_than, threshold=18))`, with `partial` imported from `functools`. Voilà!

When you are stuck, don’t hesitate to try different pickling libraries, and remember, everything has a solution. However, when you are really stuck, ping-pong or foosball could really help.
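The lambda-versus-partial tip above can be checked with the standard library alone. The helper `can_pickle` is a name introduced here for illustration; the rest follows the `is_bigger_than` example from the text.

```python
import pickle
from functools import partial


def is_bigger_than(x, threshold):
    return x > threshold


# Two ways of fixing the threshold at 18.
as_lambda = lambda x: is_bigger_than(x, 18)
as_partial = partial(is_bigger_than, threshold=18)


def can_pickle(obj):
    """Return True if the standard pickle module can serialise obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


lambda_ok = can_pickle(as_lambda)    # lambdas have no importable name
partial_ok = can_pickle(as_partial)  # partial of a named function is fine

# The restored partial behaves like the original one.
restored_check = pickle.loads(pickle.dumps(as_partial))
```

Note that the standard pickle still stores `is_bigger_than` by reference, so the function has to be importable wherever the pickle is loaded.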

Finally, with the black box approach, you can not only ship all the weird stuff that you do in feature engineering, but also include your own custom ML model!

The demo

For the demo I will try to write a clean version of the above scripts. We will use Sklearn and Pandas for the training part and Flask for the server part. We will also use a parallelised GridSearchCV for our pipeline.
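To give an idea of the server part, here is a minimal Flask sketch, not the actual demo code. The `/predict` route, the `features` JSON key and `StubModel` are assumptions made for this example; `StubModel` stands in for the deserialised pipeline so that the snippet runs on its own.

```python
from flask import Flask, jsonify, request


class StubModel:
    """Stand-in for the loaded pipeline; any object with predict() would do."""

    def predict(self, rows):
        # Made-up rule for illustration: flag rows whose second feature > 125.
        return [int(row[1] > 125) for row in rows]


def create_app(model):
    """Build a Flask app that serves predictions from the given model."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if not payload or "features" not in payload:
            return jsonify(error="expected JSON with a 'features' list"), 400
        # The model receives raw features; preprocessing lives inside it.
        prediction = model.predict([payload["features"]])[0]
        return jsonify(prediction=int(prediction))

    return app


# On a real server the model would be loaded once at start-up,
# for example by deserialising the exported pipeline.
app = create_app(StubModel())
```

Because the server only calls `predict`, a new model release means swapping the serialised file, not changing the server code.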

Note that in real life it’s more complicated than this demo code, since you will probably need an orchestration mechanism to handle model releases and transfer.

Last but not least, if you have any comments or criticism, please don’t hesitate to share them below. I would be very happy to discuss them with you.

Original. Reposted with permission.

Bio: Amine Baatout is a data scientist at ContentSquare, focusing on machine learning engineering, algorithms and software design.
