The Secret Behind the New AI Spring: Transfer Learning

Transfer learning has democratized artificial intelligence. A real-world example shows how.

As enterprises strive to find competitive advantages, artificial intelligence stands out as a "new" technology that can bring benefits to their organization. Model building is a big part of AI, but it is a time-consuming chore, so anything an enterprise can do to make faster progress is a plus. That includes finding ways to avoid reinventing the wheel when it comes to building AI models.

Transfer learning allows developers to take a model trained on one problem and retrain it on a new problem, reusing the prior work to improve the new model's precision without the massive data and compute it takes to build a model from scratch. This makes the process of building complex models accessible to teams that otherwise lack those resources.

To understand why this need is so urgent, we need a holistic understanding of the costs related to building models.

ML Is Not New, Just Newly Accessible

The boom in machine learning's popularity may have led to the perception that ML is new. It's not -- businesses have been deriving value from ML-driven insights for 30 years. What is new is that there are now many more problems and opportunities that machine learning can address, and it's becoming clear that early adoption of ML can lead to profit advantages across all sectors.

Interest in machine learning has surged well beyond academia. Machine learning has permeated every part of the data-driven enterprise because two key technical innovations made it accessible to a much broader range of business areas and roles: a drastic reduction in the cost of processing and the rise of transfer learning.

Processing at Scale Is Now Affordable

Innovations in GPUs and distributed compute resources have brought machine learning and AI into the realm of the affordable (from $109 per billion floating-point operations per second in 2003 to $0.03 today). In the past, access to a high-performance or supercomputing center was required to get started. Today, data scientists frequently perform initial work on their local computers, cloud nodes, or even commodity GPU hardware.

Once the hardware costs became manageable, the focus turned to the human labor costs of machine learning and artificial intelligence.

The Expensive and Onerous Labeling Problem

To understand why transfer learning was such a revolutionary development, you first need to understand how painful the previous processes were.

Among the most common machine learning use cases are classification problems: you have a set of data and a set of labels, and you want to teach a model to apply the labels to the data -- as in fraud analytics, spam detection, and object recognition. To train a model to make these connections in practice, you need a very large set of labeled data. Procuring large amounts of high-quality labeled data is very expensive, and domain experts are often required to provide useful labels.

To get a sense of what it costs to label a data set, think about trying to label tumors in CT scans. You'd need thousands of images for each type, and the person qualified to label these images is a specially trained radiologist. Assuming a radiologist charges $150/hour and can fully annotate 4 images an hour, and we need about 10,000 images, we're looking at one pass costing $375,000 (and realistically, you'd make two or three passes):

(10,000 images / 4 images per hour) * $150 per hour = $375,000
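This back-of-the-envelope estimate is easy to turn into a small helper. The function below is just a sketch; the rates and image counts are the assumptions from the scenario above:

```python
def labeling_cost(num_images, images_per_hour, hourly_rate, passes=1):
    """Estimated expert-labeling cost: hours of work times the hourly rate."""
    hours = num_images / images_per_hour
    return hours * hourly_rate * passes

# One pass over 10,000 CT scans at 4 images/hour and $150/hour:
print(labeling_cost(10_000, 4, 150))            # 375000.0
# A more realistic two or three passes multiplies the bill accordingly:
print(labeling_cost(10_000, 4, 150, passes=3))  # 1125000.0
```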

The Solution: Transfer Learning

Transfer learning was a revolutionary breakthrough that made it possible to reuse prior work and democratize machine learning models. The first example of transfer learning appeared in 1998, but the technique gained widespread attention through the yearly ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

In transfer learning, developers take a base model that has been pre-trained to distinguish between common features using a large quantity of data. Retraining this model on your particular data set is a focused effort that requires a much smaller amount of data and compute resources, making the process accessible to teams that lack the massive resources required to train a precise model from scratch.
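In Keras, for example, the freeze-and-retrain pattern looks roughly like this. This is a sketch, not a production pipeline: `dog_images` and `breed_labels` are hypothetical placeholders, and `weights="imagenet"` triggers a one-time download of the pre-trained weights:

```python
import tensorflow as tf

# Base model pre-trained on ImageNet, with its original 1,000-class
# head removed; pooling="avg" yields one feature vector per image.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze: reuse the learned features as-is

# Attach a new head for our own classes (e.g., 120 dog breeds).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(120, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Only the small new head is trained; dog_images and breed_labels
# would come from your own, much smaller data set:
# model.fit(dog_images, breed_labels, epochs=5)
```

Because the base model's weights are frozen, each training step only updates the final layer, which is why the retraining data and compute requirements are so much smaller.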

Diving In

Here's a real-world example that illustrates how transfer learning can cut the time and cost of building a model.

I wanted to build a model that could identify a likely dog breed based on a photograph. First, I'd have to build a model that could identify edges and objects, then dogs, and then features of the individual dog breeds.

The ImageNet community provides massive labeled data sets that can be used to train convolutional neural network (CNN) models to recognize more than 10,000 classes of common objects such as "hamster," "schooner," and "strawberry." The labels were crowdsourced through Mechanical Turk at a cost of roughly $0.02-$0.05 per image, meaning a single pass over the initial batch of 1.2 million images cost around $25,000 in 2010. The full data set now contains 9 million images, so reproducing the labels today would cost more than $250,000.

Labeling is only one of the costs incurred. Based on my own experiments, retraining the TensorFlow Inception V3 model on just 20,000 images took over 7 hours on a moderately sized, CPU-based cloud instance. Training this model on the full data set of 9 million images would be prohibitive for a hobbyist or a moderately funded team.

Luckily, I didn't have to start from scratch, because the initial effort had already been crowdsourced for me. I was able to take the Inception model, which had been trained on the full ImageNet data set, and retrain it on the Stanford Dogs Dataset, which contains 20,000 labeled images of dog breeds. This took about 30 minutes on a GPU-based cloud instance, and the storage required was less than 2 GB.
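One reason this retraining step is so cheap is that the frozen base model only needs to run forward over each image once; its output features can be cached, and a small classification head is then trained on those features. The sketch below illustrates the idea with a toy stand-in for the base model -- the sizes (2,048-dim features, 120 breeds) mirror Inception V3 and Stanford Dogs, but the model and data here are synthetic:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-ins for the real data (Stanford Dogs has ~20,000 images).
rng = np.random.default_rng(0)
images = rng.normal(size=(64, 1000)).astype("float32")  # pretend image inputs
labels = rng.integers(0, 120, size=64)                  # pretend breed labels

# Toy frozen "base" standing in for Inception V3's 2,048-dim bottleneck.
base = tf.keras.Sequential([
    tf.keras.Input(shape=(1000,)),
    tf.keras.layers.Dense(2048),
])
base.trainable = False

# Step 1: run every image through the frozen base once and cache the features.
features = base.predict(images, verbose=0)  # shape: (64, 2048)

# Step 2: train only a small classification head on the cached features --
# this is the fast, cheap part of the whole pipeline.
head = tf.keras.Sequential([
    tf.keras.Input(shape=(2048,)),
    tf.keras.layers.Dense(120, activation="softmax"),
])
head.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
head.fit(features, labels, epochs=1, verbose=0)
```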

Transfer learning, combined with the crowdsourced labeling and training that came before it, made it possible for me to get up and running with a fairly sophisticated computer vision model in a few hours. The only costs I incurred were the minimal charges for a few cloud instances. You can read more about my experiment here.

A Final Word

Transfer learning has democratized machine learning because it allows developers to reuse prior work, meaning that they don't have to reinvent the proverbial wheel in every situation. This lowers the barrier to entry, allowing more teams to explore and experiment, which will lead to innovation.