Facebook is leveraging its massive social media data to improve its machine learning algorithms, and the results look excellent. In the paper Exploring the Limits of Weakly Supervised Pretraining, the Facebook AI research team shows how it trained a large convolutional network to predict hashtags on some 3.5 billion social media images. The work achieved a state-of-the-art top-1 accuracy of 85.4 percent on ImageNet. Facebook has open-sourced the pretrained models on PyTorch Hub.

Pretraining models on the ImageNet dataset has been a mainstream research approach for years, but in today’s digital world, where data is growing by orders of magnitude, the 10-year-old ImageNet dataset is now considered relatively small. That motivated Facebook to explore how pretraining a machine learning model on a large-scale, weakly supervised dataset could affect model performance.

The researchers used about 3.5 billion public Instagram images with 8,000 hashtags as an image classification dataset. Unlike hand-labeled images, social media images and their user-generated hashtags are noisy, and some of the images duplicate images in the test sets. To account for this, the researchers report a measure they call “lower-bound accuracy,” which counts every test image that also appears in the training set as misclassified.
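As a rough illustration of the lower-bound idea, the sketch below counts a test image as correct only if the model predicted it correctly and it has no duplicate in the training set. The function name and the boolean-list inputs are assumptions made for this example; they are not taken from the paper or its code.

```python
def lower_bound_accuracy(is_correct, is_duplicate):
    """Accuracy when every test image duplicated in the training set counts as wrong.

    is_correct[i]   -- True if the model classified test image i correctly
    is_duplicate[i] -- True if test image i also appears in the training set
    """
    if len(is_correct) != len(is_duplicate):
        raise ValueError("inputs must have the same length")
    if not is_correct:
        raise ValueError("inputs must not be empty")
    # Keep only predictions that are correct AND not tainted by a duplicate.
    kept = sum(1 for ok, dup in zip(is_correct, is_duplicate) if ok and not dup)
    return kept / len(is_correct)


# Hypothetical toy data: 5 test images, 4 predicted correctly, 1 of those duplicated.
correct = [True, True, False, True, True]
duplicate = [False, True, False, False, False]
print(lower_bound_accuracy(correct, duplicate))  # 3 of the 5 images still count
```

Because duplicates can only be removed from the correct count, this figure can never exceed the ordinary accuracy, which is what makes it a lower bound.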

The Facebook experiments across these billions of images returned some staggering numbers:

To make training practical, the Facebook researchers used synchronous stochastic gradient descent (SGD) on 336 GPUs across 42 machines with mini-batches of 8,064 images.
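Assuming the GPUs and the mini-batch are split evenly, those figures work out to 8 GPUs per machine and 24 images per GPU. The sketch below checks that arithmetic and shows the basic shape of a synchronous SGD step on a toy one-parameter model: every worker computes a gradient on its own shard, the gradients are averaged, and one shared update is applied. The model, the data, and the function name are illustrative assumptions, not Facebook's training code.

```python
NUM_GPUS = 336
NUM_MACHINES = 42
GLOBAL_BATCH = 8064

# Even-split assumption: each machine hosts the same number of GPUs,
# and each GPU processes the same share of the mini-batch.
gpus_per_machine = NUM_GPUS // NUM_MACHINES
images_per_gpu = GLOBAL_BATCH // NUM_GPUS


def sync_sgd_step(weight, worker_batches, lr):
    """One synchronous SGD step for the toy model y = weight * x with squared error.

    Each worker computes the gradient on its own shard; the gradients are then
    averaged (the job an all-reduce does on real hardware) and a single update
    is applied, so all workers stay on identical weights.
    """
    worker_grads = []
    for batch in worker_batches:
        # d/dweight of mean((weight * x - y)^2) over this worker's shard.
        grad = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
        worker_grads.append(grad)
    avg_grad = sum(worker_grads) / len(worker_grads)
    return weight - lr * avg_grad


# Hypothetical toy data following y = 3x, split across two "workers".
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(50):
    w = sync_sgd_step(w, shards, lr=0.05)
print(gpus_per_machine, images_per_gpu, round(w, 3))
```

With equally sized shards, the averaged gradient equals the gradient over the whole mini-batch, which is why adding workers changes the wall-clock time but not the update itself.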

They trained ResNeXt models (residual networks with grouped convolutions), the largest of which requires some 153 billion multiply-add FLOPs and has 829 million parameters.

It took the researchers about 22 days to train ResNeXt-101 32×16d (32 groups, 36 billion FLOPs, 193 million parameters) on the 3.5 billion images.

The researchers have released four ResNeXt models with different capacities.
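For readers who want to try the released models, the following is a minimal sketch of loading one through PyTorch Hub. The repository name and the four entry-point names are given as we understand them from the facebookresearch/WSL-Images hub listing and should be checked against the current listing; the helper `load_wsl_model` is a hypothetical wrapper written for this example.

```python
# Entry-point names as we understand them from the facebookresearch/WSL-Images
# hub repository (an assumption to verify against the current listing).
WSL_MODELS = [
    "resnext101_32x8d_wsl",
    "resnext101_32x16d_wsl",
    "resnext101_32x32d_wsl",
    "resnext101_32x48d_wsl",
]


def load_wsl_model(name="resnext101_32x8d_wsl"):
    """Download a pretrained model via PyTorch Hub and switch it to inference mode."""
    if name not in WSL_MODELS:
        raise ValueError(f"unknown model {name!r}; expected one of {WSL_MODELS}")
    import torch  # imported lazily so the list above can be inspected without PyTorch

    model = torch.hub.load("facebookresearch/WSL-Images", name)
    model.eval()  # disable training-only behavior such as batch-norm updates
    return model
```

Calling `load_wsl_model()` requires PyTorch and a network connection, and it downloads the weights on first use. Starting with the smallest variant is a sensible default, since the larger ones need considerably more memory.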

For the experiments researchers pretrained convolutional networks for hashtag prediction and then transferred those networks to a variety of tasks. They evaluated Instagram pretraining by measuring classification accuracies on three classification target tasks: ImageNet, CUB2011, and Places365.

Researchers found that using the massive Instagram data improved model performance on several image classification and object detection tasks compared with training from scratch. Pretrained on 940 million images labeled with 1.5k hashtags, the ResNeXt-101 32×48d model achieves the highest ImageNet-1k single-crop, top-1 accuracy to date: 85.4% (97.6% top-5).

Researchers discovered a number of other interesting phenomena through their experiments. For example, simply increasing the size of the pretraining dataset doesn’t directly deliver better results. On the ImageNet-1k classification task, networks pretrained on 1.5k hashtags outperformed those trained with a larger dataset because the 1.5k hashtags were selected to match the target task.

Another discovery was that current network architectures tend to underfit when trained on billions of images. Pretraining ResNeXt-101 32×4d on more Instagram images with a larger number of hashtags resulted in an almost log-linear improvement in accuracy.
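A log-linear trend means accuracy grows roughly by a fixed amount each time the number of pretraining images is multiplied by ten. The sketch below fits such a line with ordinary least squares. The data points are synthetic and made up purely for illustration; they are not results from the paper, and the function name is an assumption of this example.

```python
import math


def fit_log_linear(num_images, accuracies):
    """Least-squares fit of accuracy = intercept + slope * log10(num_images)."""
    xs = [math.log10(n) for n in num_images]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, made-up numbers (NOT from the paper): accuracy rises by exactly
# 2 points for every 10x increase in the number of pretraining images.
sizes = [1e6, 1e7, 1e8, 1e9]
accs = [70.0, 72.0, 74.0, 76.0]
intercept, slope = fit_log_linear(sizes, accs)
print(round(slope, 3))  # accuracy points gained per 10x more images
```

Under such a trend each additional point of accuracy costs a constant multiple of data rather than a constant amount, which is why gains of this kind only become visible at the scale of billions of images.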

The paper Exploring the Limits of Weakly Supervised Pretraining is on arXiv. The pretrained models have been open-sourced on PyTorch Hub and GitHub.