Turi Create + Google Colab

Both of these tech giants have made it easier for developers to start building machine learning models, but that doesn't mean everything is a walk in the park. Getting Turi Create to train a model on a GPU in Colab was no small feat. In the rest of this post I share how I conquered the beast and trained an image classification model on a Tesla T4 GPU. At the end, I link directly to the code so you can play around with it yourself.

The Challenges

Under the hood, Turi Create leverages both CUDA (Nvidia's GPU computing platform) and mxnet-cuXX (the CUDA-enabled build of Apache's MXNet deep learning framework) to train models on GPUs.

Colab comes with CUDA 10 pre-installed at the time of this writing.

Turi Create recommends using CUDA 8 and mxnet-cu80==1.1.0 for its GPU workloads. Check out this resource from their GitHub.

You could try to uninstall CUDA 10, reinstall all the right drivers and toolkits for CUDA 8, and then see if you can get things running.

That’s an option for sure! Google gives you some pretty serious control over the instance you are on. Personally, I went down that path and spent the better part of 2–3 days researching the right Debian packages and libraries I needed to replace.

Here’s a useful starting point for going down that route: https://medium.com/@nickzamosenchuk/training-the-model-for-ios-coreml-in-google-colab-60-times-faster-6b3d1669fc46

However, since that article was written, Google has changed the runtime environment significantly. I was not able to get Turi Create to properly use the GPU with CUDA 8 and mxnet-cu80. Each time I reached the model training portion, my runtime would crash due to “exploding” RAM. This seemed strange because the data I was working with was fairly small, on the order of ~500 MB. It was hard to diagnose because Google doesn’t expose how jobs are scheduled on their GPU nodes, nor how GPU memory is shared and allocated.
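If you hit similar crashes, it helps to watch memory while the job runs. Here is a minimal sketch (Linux only, which covers Colab) that reads /proc/meminfo from Python; in a notebook you could equally run !free -h or !nvidia-smi in a separate cell:

```python
# Quick check of available system RAM on a Linux/Colab runtime.
# Useful for spotting runaway memory use before the runtime crashes.
with open('/proc/meminfo') as f:
    meminfo = dict(line.split(':', 1) for line in f)

total_kb = int(meminfo['MemTotal'].strip().split()[0])
avail_kb = int(meminfo['MemAvailable'].strip().split()[0])
print(f"RAM: {avail_kb / 1024:.0f} MB free of {total_kb / 1024:.0f} MB")
```

Calling this periodically during training makes it obvious whether RAM is climbing steadily toward the crash.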

A Solution

Contrary to documentation, I was able to train a Turi Create model on a Colab Python 3 + GPU runtime by using the default CUDA 10:

# Check the CUDA version
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

and installing the corresponding mxnet-cu100 library with pip:

!pip install turicreate==5.4

# The wrong version of MXNet will be installed by default
!pip uninstall -y mxnet

# Install the CUDA 10-compatible version of mxnet
!pip install mxnet-cu100==1.4.0.post0
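Before training, it's worth a quick sanity check that the freshly installed mxnet-cu100 can actually see the GPU. This is a sketch assuming a GPU runtime is attached and the installs above succeeded:

```python
# Sanity check: verify mxnet-cu100 imports and can allocate on the GPU.
import mxnet as mx

print(mx.__version__)  # expect 1.4.0

# Allocating a small array on gpu(0) fails fast if the CUDA build is wrong
a = mx.nd.ones((2, 2), ctx=mx.gpu(0))
print(a.sum().asscalar())
```

If this raises an error about CUDA libraries, the mxnet build and the runtime's CUDA version are still mismatched.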

When you begin the data preparation and model training portion, you will need to import the turicreate library and set the number of GPUs in the config:

import turicreate as tc

# Tell Turi Create to use ALL available GPUs
tc.config.set_num_gpus(-1)
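With the config set, the rest of the pipeline is standard Turi Create. Here is a minimal sketch of the image classification flow, assuming your images live in a ./training_data folder with one sub-folder per class (the folder layout and file names are assumptions of this sketch, not necessarily what the linked code does):

```python
import turicreate as tc

# Use every GPU Colab gives us
tc.config.set_num_gpus(-1)

# Load images; with_path=True keeps file paths so we can derive labels.
# Assumes a ./training_data/<class_name>/*.jpg layout (hypothetical).
data = tc.image_analysis.load_images('./training_data', with_path=True)
data['label'] = data['path'].apply(lambda p: p.split('/')[-2])

train, test = data.random_split(0.8)

# Train; Turi Create selects a pre-trained backbone for you
model = tc.image_classifier.create(train, target='label')

print('accuracy:', model.evaluate(test)['accuracy'])

# Export the CoreML artifact for use on iOS
model.export_coreml('ImageClassifier.mlmodel')
```

On a T4 runtime the training step is where the GPU actually gets exercised, so that's the cell to watch to confirm the setup worked.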

Surprisingly, and encouragingly, this solution gave me FREE GPU-accelerated model training in the cloud. Here is a link to the image classification model training code, hooked right up to Colab. Additionally, here is a repo of other example models you can train on Colab.

Skafos.ai for Delivery

You now have a CoreML artifact trained with Turi Create in Colab. Integrated properly, this image classification model can run on an iOS device. Skafos.ai is an excellent solution for managing model versions and delivering updates over the air without resubmitting apps to the App Store. Check it out here.