For mobile applications that use Core ML, one of the main burdens is model size: a heavy app can discourage users from downloading it. As a result, developers often end up hosting models in the cloud and downloading them on demand, which costs both time and money.

Introducing Quantization

Quantization is one of the new features in the updated coremltools, and it can help solve this size problem by reducing — sometimes drastically — the size of Core ML models.

It works by trimming the number of bits used to describe the weights in a model. Since some models contain millions of weights, shaving a few bits off each one can have a tremendous impact on overall size.
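To make the idea concrete, here is a minimal NumPy sketch of linear weight quantization, the simplest form of the technique: float32 weights are mapped onto a small number of evenly spaced levels between their minimum and maximum, and only the level indices plus two scalars need to be stored. This is an illustrative sketch, not the actual coremltools implementation; the function names are hypothetical.

```python
import numpy as np

def quantize_linear(weights, nbits=8):
    # Map float32 weights onto 2**nbits evenly spaced levels
    # between the min and max weight (linear quantization).
    levels = 2 ** nbits - 1
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / levels
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_linear(q, scale, w_min):
    # Recover approximate float weights for use at inference time.
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)

q, scale, w_min = quantize_linear(w, nbits=8)
w_hat = dequantize_linear(q, scale, w_min)

# Storage drops from 32 bits to 8 bits per weight, and the
# reconstruction error is bounded by half a quantization step.
assert q.dtype == np.uint8
assert np.abs(w - w_hat).max() <= scale / 2 + 1e-6
```

At 8 bits per weight this already cuts storage to a quarter of float32, at the cost of a small, bounded rounding error per weight.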

In iOS 11, Core ML models stored weights only as 32-bit floats. iOS 11.2 improved on this with support for half-precision floats, cutting storage to 16 bits per weight with minimal impact on accuracy.
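The effect of that change is easy to see in raw storage terms. A quick NumPy illustration (standing in for a Core ML weight buffer, which this is not) shows that converting weights from float32 to float16 halves the number of bytes:

```python
import numpy as np

# A stand-in for a layer's weight buffer: one million float32 values.
w32 = np.ones(1_000_000, dtype=np.float32)

# Converting to half precision halves the storage per weight.
w16 = w32.astype(np.float16)

print(w32.nbytes)  # 4000000 bytes (4 bytes per weight)
print(w16.nbytes)  # 2000000 bytes (2 bytes per weight)
```

The same million weights that take 4 MB at full precision fit in 2 MB at half precision, which is exactly the saving the 11.2 change delivered.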