Element AI makes its BAyesian Active Learning library open source

Element AI’s BAyesian Active Learning library (BaaL library) is now open source and available on GitHub. In this article, we briefly describe active learning, its potential use with deep networks and the specific capabilities of our BaaL library.

What is Active Learning?

Machine learning applications generally require a huge amount of data, and in many cases, this data cannot be easily acquired. What’s more, even when data is readily available, it is often not possible to label it efficiently. Active learning aims to reduce the amount of labelled data needed to train machine learning models.

How it works is actually pretty simple: we allow the model to actively “query” examples to be labelled. This way, we label only the samples that are most informative for training the model, rather than a random selection.
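The loop is easy to sketch. Below is a minimal, self-contained illustration in plain Python of pool-based active learning with a simple “closest to the decision boundary” query rule. It assumes a hypothetical oracle standing in for the human labeller and a toy 1-D threshold classifier in place of a real model; none of these names come from BaaL.

```python
import random


def oracle(x):
    # Hypothetical labeller: the hidden rule the model has to learn.
    return int(x > 0.37)


def fit_threshold(labelled):
    """Fit a 1-D threshold classifier: the midpoint between the largest
    point labelled 0 and the smallest point labelled 1."""
    neg = [x for x, y in labelled if y == 0]
    pos = [x for x, y in labelled if y == 1]
    lo = max(neg) if neg else 0.0
    hi = min(pos) if pos else 1.0
    return (lo + hi) / 2


def active_learning_loop(pool, seed_points, n_queries):
    """Pool-based active learning: repeatedly fit the model, query the
    unlabelled point it is least certain about, and add its label."""
    labelled = [(x, oracle(x)) for x in seed_points]
    unlabelled = [x for x in pool if x not in seed_points]
    for _ in range(n_queries):
        threshold = fit_threshold(labelled)
        # Uncertainty heuristic: the point closest to the current decision
        # boundary is the one the model is least sure about.
        query = min(unlabelled, key=lambda x: abs(x - threshold))
        unlabelled.remove(query)
        labelled.append((query, oracle(query)))
    return fit_threshold(labelled), labelled


rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
threshold, labelled = active_learning_loop(
    pool, seed_points=[min(pool), max(pool)], n_queries=10
)
print(f"{len(labelled)} labels, learned threshold {threshold:.3f}")
```

Each query roughly halves the interval that can still contain the true boundary, so a dozen labels are enough to locate it closely; ten randomly chosen labels would usually leave a much wider interval.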

Active Learning for Deep Learning

Common active learning approaches built on Gaussian processes or kernel methods fall short when used on high-dimensional data. Recent advances in deep learning, however, make it possible to train a deep neural network from a small amount of labelled data and let it query the next samples to be labelled on its own.

To select the next sample, we need to estimate the model’s uncertainty. By identifying the data points about which the model is least certain, we aim to label the points that are also the most useful to the model. Estimating uncertainty is a hard problem in deep learning. Most recent methods approximate the posterior distribution over the model’s weights, which allows us to select the most uncertain sample, for example the one with the highest predictive variance.
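As a minimal sketch, assuming we already have T sampled class-probability vectors per unlabelled example (one per draw from an approximate posterior), the snippet below computes two common uncertainty scores, the entropy of the averaged prediction and the predictive variance across draws, and picks the highest-scoring example. The function names and probability values are hypothetical.

```python
import math


def mean_probs(samples):
    """Average T sampled class-probability vectors (one per posterior draw)."""
    n_classes = len(samples[0])
    return [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]


def predictive_entropy(samples):
    """Entropy of the averaged prediction: high when the mean is spread out."""
    return -sum(p * math.log(p) for p in mean_probs(samples) if p > 0)


def predictive_variance(samples):
    """Variance of the sampled probabilities across draws, summed over classes."""
    mean = mean_probs(samples)
    return sum(
        sum((s[c] - mean[c]) ** 2 for s in samples) / len(samples)
        for c in range(len(mean))
    )


def most_uncertain(pool_samples, score=predictive_variance):
    """Index of the pool item with the highest uncertainty score."""
    return max(range(len(pool_samples)), key=lambda i: score(pool_samples[i]))


# Hypothetical posterior draws for two unlabelled images, two classes each.
pool_samples = [
    [[0.90, 0.10], [0.92, 0.08], [0.88, 0.12]],  # draws agree
    [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]],  # draws disagree
]
print(most_uncertain(pool_samples))  # → 1
```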

Using MC Dropout (Gal et al. 2015) and BALD (Houlsby et al. 2011), we present our results on CIFAR100 with a VGG16 model. With active learning, we reach the same performance using only half of the labelled samples.
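To make these two ingredients concrete, here is a toy sketch in plain Python; it is neither BaaL’s API nor the VGG16 setup from the experiment. MC Dropout is mimicked by keeping (inverted) dropout active on the inputs of a tiny linear softmax model and running T stochastic forward passes, and BALD scores each candidate by the estimated mutual information between its prediction and the model parameters. The weights, inputs, dropout rate and function names are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    return -sum(pc * math.log(pc) for pc in p if pc > 0)


def mc_dropout_predict(x, weights, p_drop, T, rng):
    """T stochastic forward passes of a linear softmax model with dropout
    kept active at prediction time (MC Dropout). Requires 0 <= p_drop < 1."""
    samples = []
    for _ in range(T):
        # Inverted dropout: zero each input with probability p_drop and
        # rescale the survivors so the expected input is unchanged.
        mask = [0.0 if rng.random() < p_drop else 1.0 / (1.0 - p_drop) for _ in x]
        dropped = [xi * mi for xi, mi in zip(x, mask)]
        logits = [sum(w * v for w, v in zip(row, dropped)) for row in weights]
        samples.append(softmax(logits))
    return samples


def bald_score(samples):
    """BALD: mutual information between the prediction and the model
    parameters, estimated as H[mean prediction] - mean of H[prediction]."""
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]
    return entropy(mean) - sum(entropy(s) for s in samples) / len(samples)


# Hypothetical 2-class model over 2 input features, and a pool of 2 inputs.
weights = [[1.0, -1.0], [-1.0, 1.0]]
pool = [[0.0, 0.0], [2.0, -2.0]]
rng = random.Random(0)
scores = [bald_score(mc_dropout_predict(x, weights, 0.5, 20, rng)) for x in pool]
query = max(range(len(pool)), key=lambda i: scores[i])
print([round(s, 3) for s in scores], "-> query index", query)
```

The input [0.0, 0.0] gets a BALD score of essentially zero even though its averaged prediction is maximally uncertain, because every dropout draw agrees on [0.5, 0.5]. BALD favours inputs on which the sampled models disagree, rather than inputs that are merely ambiguous.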

Next, we present the t-SNE representation of a custom version of CIFAR10 in which we keep only the classes “airplane”, “cat” and “truck”. This allows us to compare active learning with standard random labelling.



Difference in results using active learning (left) versus standard random labelling (right)

We can clearly see that active learning selects samples in regions where the model’s decision is less certain, which makes it more efficient than random selection.

BaaL at Element AI

While active learning is itself an active area of research, we found that a lack of good libraries was slowing down both our own research and the use of active learning in enterprise software. For this reason, we at Element AI created an opinionated library that tries to solve both of these issues.

Today, we’re making that library free and open to the community. Our BaaL library provides a unified API that is easy to use for AI practitioners and researchers alike. Check it out and let us know what you think!



Authors: Parmida Atighehchian, Frédéric Branchaud-Charron, Jan Freyberg, and Lorne Schell

Sources

Active Learning with Statistical Models (Cohn et al. 1996)

Deep Bayesian Active Learning with Image Data (Gal et al. 2017)

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (Gal et al. 2015)

Bayesian layers (Shridhar et al. 2019)

Deeper Connections between Neural Networks and Gaussian Processes Speed-up Active Learning (Tsymbalov et al. 2019)

A Simple Baseline for Bayesian Uncertainty in Deep Learning (Maddox et al. 2019)

Bayesian Active Learning for Classification and Preference ...