Machine learning models are often black boxes to end users. Without access to the underlying architecture and parameters, a model is very difficult to reconstruct from its inputs and outputs alone. Hosting a model in the cloud effectively prevents access to these underlying structures. Without breaching the hosting servers, an attacker has no access to the model: they can’t look at the layers, get the trained weights, or even see the framework it’s running on.

For many companies, this is a feature, not a bug. Models contain a treasure trove of intellectual property. Building a useful machine learning model typically requires large amounts of training data, and the value of that data is embedded in the resulting model. A company may spend significant resources compiling a training set for an ML model, and it doesn’t want to just give that trained model away.

Apple’s machine learning framework for iOS, Core ML, creates new opportunities for product development. Application developers can use machine learning models to create great experiences for users without having to rely on laggy network requests.

They can also run models on high-bandwidth data such as streaming video or audio, which would be impractical to send to the cloud for inference. But when a Core ML model is running inside an app, an attacker can potentially look inside the black box.

When a developer deploys a machine learning model to a mobile device, they lose control over how the model is accessed or used. In this post, we’ll look at how Core ML models are stored inside of apps and show how it’s possible to reconstruct the original model from compiled Core ML resources.

The original .mlmodel file

The .mlmodel file is the compact model representation that Apple uses for Core ML. Many tools can generate an .mlmodel file; for example, the coremltools Python package converts Keras and Caffe models.

There are also many other tools for converting models from other frameworks (such as TensorFlow and MXNet) to the Core ML format. The .mlmodel file contains the entire model specification. Our friend Matthijs Hollemans has a great blog post describing the .mlmodel file format. We highly recommend checking it out.
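To make "the entire model specification" concrete, here is a minimal sketch of reading an .mlmodel file back into its specification and listing what it exposes. It assumes coremltools is installed and that the file holds a neural network; the helper names `load_mlmodel_spec` and `summarize_spec` are hypothetical, chosen for illustration only.

```python
def load_mlmodel_spec(path):
    """Parse an .mlmodel file into its protobuf specification.

    Assumes coremltools is installed; imported lazily so the summary
    helper below can be used without it.
    """
    import coremltools as ct
    return ct.utils.load_spec(path)


def summarize_spec(spec):
    """Return the model type, input/output names, and (name, kind) per layer."""
    # The specification stores the concrete model under a "Type" oneof,
    # e.g. "neuralNetwork" or "neuralNetworkClassifier".
    model_type = spec.WhichOneof("Type")
    summary = {
        "type": model_type,
        "inputs": [feature.name for feature in spec.description.input],
        "outputs": [feature.name for feature in spec.description.output],
        "layers": [],
    }
    # Only neural network model types carry a `layers` field; other model
    # types (tree ensembles, pipelines, ...) simply yield an empty list here.
    network = getattr(spec, model_type, None) if model_type else None
    for layer in getattr(network, "layers", []):
        # Each layer stores its kind (convolution, innerProduct, ...) in a oneof.
        summary["layers"].append((layer.name, layer.WhichOneof("layer")))
    return summary
```

With a real file, `summarize_spec(load_mlmodel_spec("Example.mlmodel"))` would list every layer by name and kind, which is exactly the architectural detail a cloud-hosted model keeps hidden. The file name here is a placeholder.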

Example of coremltools code converting a Keras model to the Core ML specification.
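A minimal sketch of such a conversion follows. It assumes a recent coremltools release with the unified `convert` entry point and a tf.keras model as input, which may differ from the converter API available when this post was written. The helper name `convert_keras_to_coreml` and the default file name are hypothetical.

```python
def convert_keras_to_coreml(keras_model, output_path="Example.mlmodel"):
    """Convert a tf.keras model to Core ML and save it as an .mlmodel file.

    Assumes coremltools (and TensorFlow, for the input model) are installed.
    """
    # Guard the extension up front: the neural network format targeted below
    # is the one saved as a single .mlmodel file.
    if not output_path.endswith(".mlmodel"):
        raise ValueError("output_path should end with .mlmodel")

    import coremltools as ct  # imported lazily; only needed for the conversion

    # convert_to="neuralnetwork" requests the original neural network format
    # rather than the newer ML program format.
    mlmodel = ct.convert(keras_model, convert_to="neuralnetwork")
    mlmodel.save(output_path)
    return mlmodel


# Hypothetical usage, assuming TensorFlow is installed:
#   import tensorflow as tf
#   model = tf.keras.Sequential([
#       tf.keras.Input(shape=(4,)),
#       tf.keras.layers.Dense(2, activation="softmax"),
#   ])
#   convert_keras_to_coreml(model, "Example.mlmodel")
```

The saved file carries the layers and trained weights of the Keras model, so anyone holding it holds the model.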

Access to the original .mlmodel file is access to the internals of the black box: the IP that companies and developers wish to protect. It turns out, however, that during the app build the original .mlmodel file is compiled, and a different format is packed into the app bundle.