The deployment of robust and scalable machine learning solutions remains a largely artisanal and complicated process, requiring significant human involvement and effort. As a result, new products and services take a long time to reach the market, or get abandoned at the prototype stage, reducing interest within the industry. So, how can we facilitate the process of bringing a machine learning model into production?

Cortex is an open-source platform for deploying machine learning models as production web services. It leverages the powerful AWS ecosystem to deploy, monitor and scale framework-agnostic models as needed. Its key features can be summarized as follows:

Framework-agnostic: Cortex supports any piece of Python code; TensorFlow, PyTorch, scikit-learn and XGBoost models are all served by the library, like any other Python script.

Autoscaling: Cortex automatically scales your APIs in and out to handle production workloads.

CPU / GPU support: Cortex can run in either a CPU or a GPU environment, using AWS IaaS as its underlying infrastructure.

Spot instances: Cortex supports EC2 spot instances to keep costs down.

Rolling updates: Cortex applies updates to a deployed model without any downtime.

Log streaming: Cortex keeps the logs from deployed models and streams them to your CLI, using a familiar docker-like syntax.

Prediction monitoring: Cortex monitors network metrics and tracks predictions.

Minimal configuration: Cortex deployments are defined in a single, simple YAML file.

In this story, we use Cortex to deploy an image classification model as a web service on AWS. So, without further ado, let us introduce Cortex.

Deploy your Model as a Web Service

For this example, we use the fast.ai library and borrow the pets classification model from the first course of the associated MOOC. The following sections cover the installation of Cortex and the deployment of the pets classification model as a web service.

Installation

The first thing you should do, if you haven’t already, is create a new user account on AWS with programmatic access. To do so, select the IAM service, choose Users from the navigation panel and press the Add user button. Give your user a name and select Programmatic access.

Next, on the Permissions screen, select the Attach existing policies directly tab and choose AdministratorAccess.

You can leave the tags page blank, review your choices and create the user. At the end, take note of the access key ID and secret access key.
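If you prefer the terminal, the same user can also be created with the AWS CLI. A minimal sketch (the user name cortex is just an example):

aws iam create-user --user-name cortex
aws iam attach-user-policy --user-name cortex --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
aws iam create-access-key --user-name cortex

The last command prints the access key ID and secret access key mentioned above.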

While you are in the AWS console, you may also create an S3 bucket to store the trained model and any other artefacts your code may produce. You can name the bucket whatever you like, as long as the name is unique. In this story, we create a bucket named cortex-pets-model.
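The bucket can equally be created from the terminal, assuming the AWS CLI is configured with the credentials created above:

aws s3 mb s3://cortex-pets-model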

As a next step, we have to install the Cortex CLI on our system and spin up a Kubernetes cluster. To install the Cortex CLI, run the command below:

bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.14/get-cli.sh)"

Check that you are installing the latest version of the Cortex CLI by visiting the corresponding documentation section.

We are now ready to set up our cluster. Creating a Kubernetes cluster using Cortex is trivial. Just execute the command below:

cortex cluster up

Cortex will ask you to provide some information, such as your AWS keys, the region you want to use, and the type and number of compute instances you would like to launch. Cortex will also let you know how much you will pay for the services you have chosen. The whole process might take up to 20 minutes.

Training your model

Cortex does not care how you create or train your model. For this example, we use the fast.ai library and the Oxford-IIIT Pet Dataset. This dataset holds 37 different breeds of dogs and cats. Thus, our model should classify each image into these 37 categories.

Create a trainer.py file like the one below.
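The original training script is not reproduced here, but a minimal sketch of what it could look like follows, assuming fastai v1 and boto3; the architecture, number of epochs and file names are illustrative choices:

import boto3
from fastai.vision import (untar_data, URLs, get_image_files, ImageDataBunch,
                           get_transforms, imagenet_stats, cnn_learner, models,
                           accuracy)

# Fetch the Oxford-IIIT Pet dataset; the labels are encoded in the file names
path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
data = ImageDataBunch.from_name_re(
    path/'images', fnames, pat=r'/([^/]+)_\d+.jpg$',
    ds_tfms=get_transforms(), size=224, bs=64).normalize(imagenet_stats)

# Fit a pre-trained ResNet on the 37 breeds and export it to disk
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)
learn.export('model.pkl')  # written under learn.path

# Upload the exported model to S3; boto3 reads the AWS credentials from the
# environment, and the bucket is the one we created earlier
s3 = boto3.client('s3')
s3.upload_file(str(learn.path/'model.pkl'), 'cortex-pets-model', 'model.pkl')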

Run the script locally, just like any other Python script: python trainer.py. However, be sure to provide your AWS credentials and S3 bucket name first. This script fetches the data, processes it, fits a pre-trained ResNet model and uploads it to S3. You could, of course, expand this script to make the model more accurate using several techniques, such as a more complex architecture, discriminative learning rates or training for more epochs, but this is not related to our goal here. If you want to take the ResNet architecture a step further, check out the following article.

Deploying your model

Now that we have our model trained and stored on S3, the next step is to deploy it into production as a web service. To this end, we create a Python script called predictor.py, like the one below:
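A sketch of such a script, using Cortex's Python predictor class and fastai v1; the url field of the payload is an assumption about the request format:

import boto3
import requests
from io import BytesIO
from fastai.vision import load_learner, open_image

class PythonPredictor:
    def __init__(self, config):
        # Fetch the exported model from S3; bucket and key come from
        # the config section of cortex.yaml
        s3 = boto3.client('s3')
        s3.download_file(config['bucket'], config['key'], 'model.pkl')
        self.learn = load_learner('.', 'model.pkl')

    def predict(self, payload):
        # Read the image from the given URL and return the predicted breed
        response = requests.get(payload['url'])
        img = open_image(BytesIO(response.content))
        pred_class, _, _ = self.learn.predict(img)
        return str(pred_class)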

This file defines a predictor class. When it is instantiated, it retrieves the model from S3, loads it into memory and defines a few necessary transformations and parameters. During inference, it reads an image from a given URL and returns the name of the predicted class. The predictor's interface is exactly that: an __init__ method for initialization and a predict method, which receives the payload and returns a result.

The predictor script has two accompanying files: a requirements.txt file that records the library dependencies (e.g. pytorch, fastai, boto3 etc.) and a YAML configuration file. A minimal configuration is given below:
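A sketch of cortex.yaml, consistent with the configuration that cortex get prints later in this story:

- name: pets-classifier
  predictor:
    type: python
    path: predictor.py
    config:
      bucket: cortex-pets-model
      key: model.pkl
      device: cpu
  compute:
    cpu: 200m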

In this YAML file we define which script to run for inference, on which device (e.g. CPU) and where to find the trained model. More options are available in the documentation.

Finally, the structure of the project should follow the hierarchy below. Note that this is the bare minimum; you can omit trainer.py if you already have a model ready for deployment.

- Project name
|----trainer.py
|----predictor.py
|----requirements.txt
|----cortex.yaml

Having all that in place, you can simply run cortex deploy, and within a few seconds, your new endpoint is ready to accept requests. Execute cortex get pets-classifier to monitor the endpoint and view additional details.

status   up-to-date   requested   last update   avg request   2XX
live     1            1           13m           -             -

endpoint: http://a984d095c6d3a11ea83cc0acfc96419b-1937254434.us-west-2.elb.amazonaws.com/pets-classifier
curl: curl http://a984d095c6d3a11ea83cc0acfc96419b-1937254434.us-west-2.elb.amazonaws.com/pets-classifier?debug=true -X POST -H "Content-Type: application/json" -d @sample.json

configuration

name: pets-classifier
endpoint: /pets-classifier
predictor:
  type: python
  path: predictor.py
  config:
    bucket: cortex-pets-model
    device: cpu
    key: model.pkl
compute:
  cpu: 200m
autoscaling:
  min_replicas: 1
  max_replicas: 100
  init_replicas: 1
  workers_per_replica: 1
  threads_per_worker: 1
  target_replica_concurrency: 1.0
  max_replica_concurrency: 1024
  window: 1m0s
  downscale_stabilization_period: 5m0s
  upscale_stabilization_period: 0s
  max_downscale_factor: 0.5
  max_upscale_factor: 10.0
  downscale_tolerance: 0.1
  upscale_tolerance: 0.1
update_strategy:
  max_surge: 25%
  max_unavailable: 25%

What is left is to test it using curl and the image of a Pomeranian:
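A sketch of such a request, reusing the endpoint printed above; the image URL is a placeholder, and the url field must match what predictor.py expects:

curl http://a984d095c6d3a11ea83cc0acfc96419b-1937254434.us-west-2.elb.amazonaws.com/pets-classifier -X POST -H "Content-Type: application/json" -d '{"url": "https://example.com/pomeranian.jpg"}'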

Clean up

When we are done with the service and our cluster, we should release the resources to avoid extra costs. Cortex makes this easy:

cortex delete pets-classifier

cortex cluster down

Conclusion

In this story, we saw how we can use Cortex, an open-source platform for deploying machine learning models as production web services. We trained an image classifier, deployed it on AWS, monitored its performance and put it to the test.

For more advanced concepts, such as prediction monitoring, rolling updates, cluster configuration and auto-scaling, visit the official documentation site and the project’s GitHub page.