Because mobile devices have limited power and computation resources, model compression is essential for deploying neural network models on them efficiently. In the past, model compression was done by engineers who applied hand-crafted heuristics and rule-based policies. The process was time-consuming and required trading off model size, speed, and accuracy by hand, and the results were often less than ideal.

Researchers from MIT, Google, and Xi’an Jiaotong University recently published a paper proposing AutoML for Model Compression (AMC), which leverages reinforcement learning to shorten the model compression process and improve its results. The method requires no human labour, achieves a higher compression ratio, and preserves accuracy better than conventional rule-based compression methods.

Previous studies have proposed a variety of rule-based model compression heuristics. However, since the layers in deep neural networks are interdependent, rule-based pruning strategies do not transfer well from one model to another. Neural network architectures also change rapidly, so hand-tuned rules quickly go out of date.

AutoML: Rules don’t rule, learning rules

AutoML for Model Compression (AMC) uses reinforcement learning to automatically sample a large design space and improve model compression quality. Figure 1 shows an overview of the AMC engine. To compress a network, the AMC engine automates the process through a learning-based policy rather than relying on engineers and rule-based policies.

The researchers noticed that the accuracy of the compressed model is extremely sensitive to the sparsity of each layer. Since this calls for a fine-grained action space, they devised a continuous compression ratio control strategy, in which a DDPG agent learns the ratio for each layer via trial and error. The agent is rewarded for making the model smaller and faster and penalized for accuracy loss. Thus, it is no longer necessary to search over a discrete space.
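To make the idea of a continuous action concrete, here is a minimal sketch of how a single real-valued action could be turned into a pruning decision for one layer. It is an illustration only, not the authors' implementation: the function names `channels_to_keep` and `prune_by_magnitude`, the `min_keep` parameter, and the choice of ranking channels by a per-channel norm are all assumptions made for this example.

```python
def channels_to_keep(action: float, n_channels: int, min_keep: int = 1) -> int:
    """Map a continuous action (fraction of channels to prune) to a channel count.

    The action is clipped to [0, 1], so any real number an agent emits is valid.
    """
    clipped = min(max(action, 0.0), 1.0)
    keep = round(n_channels * (1.0 - clipped))
    # Never remove every channel, and never keep more than the layer has.
    return max(min_keep, min(n_channels, keep))


def prune_by_magnitude(channel_norms, action):
    """Return the sorted indices of the channels kept after pruning.

    Assumption for this sketch: channels with the largest norms are kept.
    """
    k = channels_to_keep(action, len(channel_norms))
    ranked = sorted(range(len(channel_norms)),
                    key=lambda i: channel_norms[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-channel norms for a 4-channel layer, pruned by 50 percent.
kept = prune_by_magnitude([0.1, 0.9, 0.5, 0.7], action=0.5)
```

With a continuous action, the agent can propose any ratio for a layer instead of choosing from a fixed grid such as 25, 50, or 75 percent, which matters when accuracy is sensitive to small changes in per-layer sparsity.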

The authors proposed different compression policy search protocols for different scenarios. Resource-constrained compression targets latency-critical AI applications such as mobile apps, self-driving cars, and ad ranking, where the goal is the best possible accuracy under a fixed hardware budget, e.g. FLOPs, latency, or model size.
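One way to picture resource-constrained search is to let the agent propose a pruning ratio for each layer and then raise that ratio only when the budget would otherwise become unreachable. The sketch below is a simplified illustration under stated assumptions, not the procedure from the paper: `constrain_action`, `compress_under_budget`, and `max_prune` are hypothetical names, and each layer's FLOPs are assumed to scale linearly with the fraction of the layer that is kept, ignoring the coupling between neighbouring layers.

```python
def constrain_action(proposed, layer_idx, flops, kept_so_far, budget, max_prune=0.8):
    """Raise the proposed prune ratio if needed so the FLOPs budget stays reachable.

    flops        -- original FLOPs of every layer
    kept_so_far  -- FLOPs already committed by earlier (pruned) layers
    budget       -- total FLOPs allowed for the compressed model
    """
    # The least FLOPs the remaining layers can use if pruned as hard as allowed.
    min_future = sum(f * (1.0 - max_prune) for f in flops[layer_idx + 1:])
    allowed_here = budget - kept_so_far - min_future
    min_prune = 1.0 - allowed_here / flops[layer_idx]
    return min(max_prune, max(proposed, min_prune, 0.0))


def compress_under_budget(proposals, flops, budget, max_prune=0.8):
    """Walk the layers in order, constraining each proposed ratio as we go."""
    kept = 0.0
    actions = []
    for i, proposed in enumerate(proposals):
        action = constrain_action(proposed, i, flops, kept, budget, max_prune)
        actions.append(action)
        kept += flops[i] * (1.0 - action)
    return actions, kept


# Hypothetical 3-layer model with a budget of half the original FLOPs.
layer_flops = [100.0, 200.0, 100.0]
actions, total = compress_under_budget([0.0, 0.0, 0.0], layer_flops, budget=200.0)
```

Because the budget is enforced on the actions themselves, the reward in this setting can focus on accuracy alone. The budget must be achievable at `max_prune`, otherwise no sequence of actions can satisfy it.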

Accuracy-guaranteed compression, meanwhile, is designed for quality-sensitive AI applications such as Google Photos. In these scenarios latency is not a hard constraint, so the goal is to find the smallest model that loses no accuracy.
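When there is no hard budget, the trade-off has to live in the reward itself. The sketch below shows one reward shape with the right behaviour, assumed here for illustration: error is weighted heavily, while model size enters only through a logarithm. The names `size_aware_reward` and `smallest_without_loss`, the tuple layout, and all numbers are hypothetical.

```python
import math


def size_aware_reward(error: float, flops: float) -> float:
    """Reward that falls as error or model size grows.

    The log term keeps the size incentive mild, so a small gain in accuracy
    outweighs a small reduction in FLOPs.
    """
    if flops <= 1:
        raise ValueError("flops must be > 1 so that log(flops) is positive")
    return -error * math.log(flops)


def smallest_without_loss(candidates, baseline_error, tol=0.0):
    """Pick the smallest (flops, error) candidate whose error is not worse
    than the baseline; return None if no candidate qualifies."""
    ok = [c for c in candidates if c[1] <= baseline_error + tol]
    return min(ok, key=lambda c: c[0]) if ok else None


# Hypothetical (flops, error) pairs for three compressed variants.
variants = [(100.0, 0.30), (60.0, 0.30), (40.0, 0.35)]
best = smallest_without_loss(variants, baseline_error=0.30)
```

In this made-up example the 40-FLOP variant is smaller but loses accuracy, so the 60-FLOP variant is the one an accuracy-guaranteed search would keep.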

Experiment Results: learning-based policy outperforms hand-crafted rules

The research team evaluated the AMC engine on three different neural networks (VGG, ResNet, and MobileNet) to demonstrate its broad applicability. The results showed that AMC performed better than hand-crafted heuristic policies. For example, the researchers raised the compression ratio from 3.4x to 5x with no loss of accuracy.

AMC also reduced MobileNet’s FLOPs by 2x while reaching a top-1 accuracy of 70.2 percent, which puts it on a better Pareto curve than 0.75 MobileNet.
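For readers unfamiliar with the term, a model lies on a better Pareto curve when no other model is at least as cheap and at least as accurate. A minimal sketch of that check follows. The helper names `dominates` and `pareto_front` are hypothetical, and the points are placeholder values chosen only to exercise the code, not results from the paper.

```python
def dominates(a, b):
    """True if point a = (flops, accuracy) is at least as good as b on both
    axes and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, sorted by FLOPs."""
    return sorted(p for p in points
                  if not any(dominates(q, p) for q in points))


# Placeholder (flops, accuracy) pairs, not measurements.
models = [(1.0, 0.60), (2.0, 0.70), (2.0, 0.65), (3.0, 0.68)]
front = pareto_front(models)
```

Here the last two placeholder points drop out because another model matches or beats them on both cost and accuracy.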

The new method has the potential to free researchers from tedious and time-consuming tasks and facilitate efficient deep neural network design on mobile devices. The paper AMC: AutoML for Model Compression and Acceleration on Mobile Devices is available on arXiv.