Researchers from Beijing-based AI unicorn SenseTime and Nanyang Technological University have trained AlexNet on the ImageNet dataset in a record-breaking 1.5 minutes, roughly 2.6 times faster than the previous record of 4 minutes. Since 2010, the ImageNet Large-Scale Visual Recognition Challenge has been an essential benchmark for examining and testing image recognition algorithms.

The AlexNet and ResNet-50 networks along with the ImageNet dataset are well developed and useful for image recognition tasks. Although various image recognition studies and methods have achieved high accuracy on ImageNet, the training process remains time-consuming. For example, a single NVIDIA M40 GPU requires 14 days to complete 90-epoch ResNet-50 training. Shortening these training times has emerged as a popular and practical challenge for AI researchers.

The SenseTime and Nanyang team used a communication backend called “GradientFlow” and a set of network optimization techniques to reduce deep neural network (DNN) training time. Their starting point was the observation that popular open-source DNN systems achieved a speedup ratio of only 2.5 on 64 GPUs connected by a 56 Gbps network, which points to communication rather than computation as the bottleneck. Among the optimizations, the researchers proposed “lazy allreduce,” which fuses multiple communication operations into a single one.
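The idea of fusing several communication operations into one can be illustrated with a small simulation. This is a minimal sketch, not GradientFlow's implementation: the names `SimulatedAllReduce`, `eager_allreduce`, `lazy_allreduce` and the element-count `threshold` used to decide when to flush are all assumptions made for illustration, and the "allreduce" here is a plain in-process sum rather than a real GPU collective.

```python
from typing import List

Tensor = List[float]


class SimulatedAllReduce:
    """Stand-in for a real collective: sums element-wise across workers
    and counts how many times it was invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, per_worker: List[Tensor]) -> Tensor:
        self.calls += 1
        return [sum(values) for values in zip(*per_worker)]


def eager_allreduce(grads_per_worker: List[List[Tensor]],
                    allreduce: SimulatedAllReduce) -> List[Tensor]:
    """Baseline: one communication operation per gradient tensor."""
    num_tensors = len(grads_per_worker[0])
    return [allreduce([worker[i] for worker in grads_per_worker])
            for i in range(num_tensors)]


def lazy_allreduce(grads_per_worker: List[List[Tensor]],
                   allreduce: SimulatedAllReduce,
                   threshold: int) -> List[Tensor]:
    """Fused variant: queue tensors until the queued element count reaches
    `threshold` (a hypothetical knob), then reduce them in a single call."""
    num_tensors = len(grads_per_worker[0])
    results: List[Tensor] = []
    pending: List[int] = []   # indices of tensors waiting to be reduced
    pending_size = 0

    def flush() -> None:
        nonlocal pending, pending_size
        if not pending:
            return
        # Concatenate the queued tensors into one flat buffer per worker.
        fused = [[x for i in pending for x in worker[i]]
                 for worker in grads_per_worker]
        reduced = allreduce(fused)
        # Split the reduced buffer back into the original tensor shapes.
        offset = 0
        for i in pending:
            n = len(grads_per_worker[0][i])
            results.append(reduced[offset:offset + n])
            offset += n
        pending = []
        pending_size = 0

    for i in range(num_tensors):
        pending.append(i)
        pending_size += len(grads_per_worker[0][i])
        if pending_size >= threshold:
            flush()
    flush()  # reduce whatever is left in the queue
    return results


# Two simulated workers, four gradient tensors each (sizes 2, 1, 3, 1).
worker0 = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0]]
worker1 = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0], [70.0]]

eager_op = SimulatedAllReduce()
eager_result = eager_allreduce([worker0, worker1], eager_op)

lazy_op = SimulatedAllReduce()
lazy_result = lazy_allreduce([worker0, worker1], lazy_op, threshold=4)

print(eager_op.calls, lazy_op.calls)
```

In this toy run both variants produce the same reduced gradients, but the fused version issues fewer communication calls. On a real cluster each call carries fixed latency, so fewer and larger messages tend to use network bandwidth better, which is the motivation the article attributes to lazy allreduce.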

The researchers trained AlexNet on ImageNet with 512 Volta GPUs and reached 58.2 percent accuracy in 1.5 minutes, with a corresponding training throughput of 1514.3k images/s and a speedup ratio of 410.2. The previous record was held by a Tencent Machine Learning (腾讯机智, Jizhi) team, which used 1024 GPUs to train AlexNet on the ImageNet dataset in 4 minutes.
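To put the reported figures in perspective, the speedup ratios can be converted into scaling efficiency, that is, the fraction of ideal linear speedup actually achieved. The short calculation below is a sketch that assumes both speedup ratios are measured relative to a single GPU; the helper name `scaling_efficiency` is introduced here for illustration only.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear speedup achieved (1.0 would be perfect scaling).
    Assumes the speedup is measured against a single GPU."""
    return speedup / num_gpus


# Figures quoted in the article.
baseline_eff = scaling_efficiency(2.5, 64)         # open-source systems, 56 Gbps network
gradientflow_eff = scaling_efficiency(410.2, 512)  # the 1.5-minute AlexNet run

# Wall-clock improvement over the previous 4-minute record.
record_speedup = 4.0 / 1.5

print(f"baseline efficiency:     {baseline_eff:.1%}")
print(f"GradientFlow efficiency: {gradientflow_eff:.1%}")
print(f"speedup over old record: {record_speedup:.2f}x")
```

Under that assumption, the open-source baseline reaches about 3.9 percent of linear scaling on 64 GPUs, while the 512-GPU run reaches about 80.1 percent.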

SenseTime’s innovations shortened training time and illustrated the potential of novel network optimization techniques for improving system performance. Further reductions in ImageNet/AlexNet training times are likely, due to factors such as continuing increases in GPU performance and decreasing network costs.

The paper “Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes” is on arXiv.