While supervised learning has dramatically improved AI performance in image classification, a major drawback is its reliance on large-scale labeled datasets. This has prompted researchers to explore unsupervised and semi-supervised learning, techniques that need little or no data annotation but have a drawback of their own: diminished accuracy.

A new paper from Google’s UK-based research company DeepMind addresses this with a model based on Contrastive Predictive Coding (CPC) that outperforms the fully-supervised AlexNet model in Top-1 and Top-5 accuracy on ImageNet.

CPC was introduced by DeepMind in 2018. The unsupervised learning approach extracts representations from high-dimensional data by using a powerful autoregressive model to predict future samples in a learned latent space. Researchers trained a model (a ResNet in this paper) to make predictions from unlabeled data, and used a contrastive loss function to score those predictions against incorrect candidates, yielding a high-quality unsupervised pre-trained feature representation. CPC was originally applied to speech tasks, and DeepMind researchers have also demonstrated its efficacy on images, text, and reinforcement learning.
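To make the contrastive loss concrete, here is a minimal sketch of an InfoNCE-style loss for a single prediction, the kind of objective CPC is built on. The function name `info_nce_loss`, the dot-product scoring, and the toy vectors are illustrative assumptions, not details taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(prediction, positive, negatives):
    """Cross-entropy of picking the true target out of the true target
    plus a set of negatives, using dot products as similarity scores
    (an assumed scoring function for this sketch)."""
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[0]

# Toy vectors (hypothetical): the prediction is close to the positive target.
prediction = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[-1.0, 0.2], [0.0, 1.0]]

good = info_nce_loss(prediction, positive, negatives)
# Swap in a negative as the "true" target: the loss should be higher.
bad = info_nce_loss(prediction, negatives[0], [positive, negatives[1]])
```

The loss is low when the prediction scores its true target above the negatives, which is what pushes the encoder toward representations that are useful for prediction.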

A major contribution of this paper is an improved CPC architecture that captures more useful representations from unlabeled data. Specifically, researchers enlarged the original 23-block ResNet 101 model to a 46-block ResNet 170 model and applied layer normalization to improve training efficiency. They also designed a more challenging pretraining task and added data augmentation to increase its difficulty further.
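For readers unfamiliar with layer normalization, the sketch below shows the core operation on a single feature vector: each example is normalized using its own mean and variance rather than batch statistics. The learned gain and bias that usually follow are omitted, and the `eps` value is an assumption. How the paper wires this into the ResNet is not described here.

```python
import math

def layer_norm(features, eps=1e-5):
    """Normalize one example's features to roughly zero mean and unit variance."""
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    return [(x - mean) / math.sqrt(var + eps) for x in features]

normalized = layer_norm([2.0, 4.0, 6.0, 8.0])
```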

Researchers designed two methods to train a CPC model with an attached linear classifier in a semi-supervised manner: train the CPC feature extractor on an unlabeled dataset, freeze its parameters, and then optimize the attached classifier using a small amount of labeled data; or pretrain the extractor on the unlabeled dataset and then fine-tune the parameters of the entire network, extractor and classifier together, on the labeled data.
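The difference between the two regimes can be sketched with a toy model. Here the "extractor" is just a per-feature scale standing in for pretrained weights, and the classifier is a logistic regression. All names, data, and hyperparameters below are hypothetical, and the real system uses a deep ResNet rather than anything this small.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train(extractor_scale, labeled, fine_tune, steps=200, lr=0.1):
    """Train a logistic classifier on top of a toy extractor.

    fine_tune=False keeps the extractor frozen (first regime);
    fine_tune=True also updates the extractor (second regime).
    """
    scale = list(extractor_scale)  # copy so the caller's list is untouched
    weights = [0.0] * len(scale)
    bias = 0.0
    for _ in range(steps):
        for x, y in labeled:
            z = [s * xi for s, xi in zip(scale, x)]  # extracted features
            p = sigmoid(sum(w * zi for w, zi in zip(weights, z)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the logit
            grad_w = [err * zi for zi in z]
            grad_s = [err * w * xi for w, xi in zip(weights, x)]
            weights = [w - lr * g for w, g in zip(weights, grad_w)]
            bias -= lr * err
            if fine_tune:
                scale = [s - lr * g for s, g in zip(scale, grad_s)]
    return scale, weights, bias

# A handful of labeled examples (hypothetical data).
labeled = [([1.0, 2.0], 1), ([-1.0, -1.5], 0), ([2.0, 0.5], 1), ([-0.5, -2.0], 0)]
pretrained = [0.5, 1.5]  # stands in for weights learned without labels

frozen_scale, w1, b1 = train(pretrained, labeled, fine_tune=False)
tuned_scale, w2, b2 = train(pretrained, labeled, fine_tune=True)
```

In the frozen regime only the classifier's weights move, so the quality of the result depends entirely on the pretrained features. Fine-tuning lets the labeled data adjust the features as well.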

Experimental results showed that a linear classifier trained on CPC features extracted from ILSVRC ImageNet competition images achieved 61.0 percent Top-1 and 83.0 percent Top-5 accuracy, outperforming AlexNet’s 59.3 percent and 81.8 percent respectively.
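Top-1 and Top-5 accuracy measure how often the true class appears among a model's one or five highest-scoring predictions. The sketch below computes Top-k accuracy on a tiny made-up example with three classes. The scores and labels are illustrative only and are unrelated to the reported ImageNet numbers.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

scores = [
    [0.1, 0.7, 0.2],  # true class 1 is ranked first
    [0.5, 0.2, 0.3],  # true class 2 is ranked second
    [0.6, 0.3, 0.1],  # true class 2 is ranked third
    [0.2, 0.3, 0.5],  # true class 2 is ranked first
]
labels = [1, 2, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Because a Top-5 hit only requires the true class to be somewhere in the five best guesses, Top-5 accuracy is always at least as high as Top-1.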

Given 13 labeled images per class, DeepMind’s CPC model outperformed state-of-the-art semi-supervised methods by 10 percent in Top-5 accuracy, and supervised methods by 20 percent.

Researchers also suggested that the CPC model’s unsupervised representation transfers well to other downstream tasks. Experimental results showed that the best-performing CPC model attached to a Faster-RCNN object detection network was only 2.6 percent short of the accuracy achieved by a fully-supervised ResNet.

While still at an early stage, DeepMind’s continuing research and development on unsupervised learning might one day enable the use of massive amounts of unlabeled data to build a machine-driven intelligent world of the future.

Read the paper Data-Efficient Image Recognition with Contrastive Predictive Coding on arXiv.