Machine learning is one of the fastest-growing and most exciting fields out there, and deep learning represents its true bleeding edge. I have been fascinated by this field since my university days; however, back in 2009, deep learning software, hardware and data were not really within reach. Since then, things have changed.

In this short tutorial, I will walk you through some fundamental steps in deep learning and demonstrate how you can start experimenting with it yourself. We will create an app that will recognize traffic lights using live camera data.

Before we go ahead, I want to give an honest disclaimer. This tutorial sometimes skips concepts and cuts corners to make it a bit less intimidating and to provide fast access to real-time testing. In addition, we will only review deep learning in the context of image classification to keep things straightforward.

You can also read this article in Chinese thanks to Kevin Li.

The Traffic Light Detection Application We Will Build Running On My iPhone 7

Introduction

I assume that the term Deep Learning is not new to you. You are probably aware of many uses of machine learning in everyday products, such as Facebook’s face detection, Apple’s Siri, Mobileye’s car collision detection and more. However, with the recent release of open-source deep learning frameworks, the giants are no longer the only ones able to launch these types of products; startups are now able to put up a fight as well. Whether it’s Clarifai providing your application with vision capabilities, or AIDoc challenging the medical radiology industry, it seems that the landscape is changing.

Deep Learning

What is deep learning? What are neural networks? How do they work? I could go on and on here. Instead, I find that the following video by Andrew Ng is a great introduction to deep learning. Although a bit long, Andrew explains it with great examples and insights.

If you don’t have the time to watch it, the main idea is as follows: In the past, in order for our computer to perform various tasks, we developers had to hand-craft algorithms for each and every small problem. The power of machine learning is that the same algorithm can learn to solve many different problems. It finds out by itself what is important about the problem and tries to solve it on its own. As you will see later, apart from providing data to the algorithm, we barely change anything to make it learn a new concept: traffic lights.

The Training Process

Training our deep learning neural network is the actual learning step of the algorithm. We provide a dataset of classified images to the algorithm and expect it to learn how to classify new images that were not a part of the dataset used to train it.

Ideally, we would like to provide all our available data to the training algorithm; the data used for training is called the Training set. However, just like humans, the algorithm needs feedback during the training process to see whether it is doing well. To make this happen, we set aside a separate dataset that is used only to provide this feedback. We call this dataset the Validation set.

After completing the training, we would like to estimate how well we perform on input data that was not used during training. You guessed it right: we will need a third dataset, called the Test set, to help us figure out what our accuracy is.
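As a minimal sketch of what "accuracy" means here, the snippet below compares predicted labels with the true labels of a test set and returns the fraction that match. The function name and the label values ("red", "green", "none") are hypothetical and chosen only for illustration; they are not the classes used later in this tutorial.

```python
def accuracy(predicted_labels, true_labels):
    """Return the fraction of test samples whose predicted label is correct."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("the test set is empty")
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)


# Hypothetical labels for four test images (illustration only).
true_labels = ["red", "green", "red", "none"]
predicted_labels = ["red", "green", "green", "none"]

test_accuracy = accuracy(predicted_labels, true_labels)
print("Test accuracy:", test_accuracy)
```

The important part is that the images behind these labels were never shown to the network during training; otherwise the number would be overly optimistic.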

To sum up, we will need three different datasets: Training, Validation, and Test.
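One simple way to obtain the three datasets is to shuffle all the labeled samples once and cut the list into three parts. The sketch below assumes a 70/15/15 split and made-up file names; both are assumptions for illustration, not values prescribed by this tutorial.

```python
import random


def split_dataset(samples, validation_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle the samples and split them into training, validation and test sets.

    The fractions and the seed are illustrative assumptions.
    """
    shuffled = list(samples)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_validation = round(len(shuffled) * validation_fraction)

    test_set = shuffled[:n_test]
    validation_set = shuffled[n_test:n_test + n_validation]
    training_set = shuffled[n_test + n_validation:]
    return training_set, validation_set, test_set


# Hypothetical file names standing in for a dataset of about 20,000 images.
image_paths = ["image_%05d.jpg" % i for i in range(20000)]
training_set, validation_set, test_set = split_dataset(image_paths)

print(len(training_set), len(validation_set), len(test_set))
```

Shuffling before splitting matters: if the images are stored in the order they were captured, an unshuffled split could put all the night-time images, for example, into a single set.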

Transfer Learning

In practice, we don’t usually train our deep learning networks from scratch. This is because it is relatively rare to have a dataset of the size required for complex tasks such as image classification.

Detecting if an image contains a face with high accuracy requires a dataset of millions of samples

Instead, it is common to pre-train a network on a very large dataset and then use it as an initialization. In our case, we will use a pre-trained network that was trained with a dataset of over 1 million images and then use it to learn a different classification problem for which we only have ~20,000 images.

This trick actually works because part of the learning task, such as detecting edges, colors or even different shapes, is shared between classification problems.
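The idea can be illustrated with a toy example in plain Python. This is a sketch of the concept only, not the Caffe workflow used in this tutorial: a small fixed matrix stands in for the layers of a pre-trained network and is never updated, while a new output layer (a logistic regression) is trained on a tiny made-up dataset. All names, weights and data below are hypothetical.

```python
import math

# Stand-in for layers learned on a large dataset (hypothetical values).
# These weights stay frozen while we learn the new task.
PRETRAINED_WEIGHTS = ((1.0, -1.0), (0.5, 0.5))


def extract_features(x):
    """Apply the frozen layer followed by a ReLU non-linearity."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_new_head(samples, labels, epochs=200, learning_rate=0.5):
    """Train only a new logistic-regression output layer on frozen features."""
    head_weights = [0.0] * len(PRETRAINED_WEIGHTS)
    head_bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            features = extract_features(x)
            p = sigmoid(sum(w * f for w, f in zip(head_weights, features)) + head_bias)
            error = p - y  # gradient of the log loss w.r.t. the pre-activation
            head_weights = [w - learning_rate * error * f
                            for w, f in zip(head_weights, features)]
            head_bias -= learning_rate * error
    return head_weights, head_bias


def predict(x, head_weights, head_bias):
    features = extract_features(x)
    score = sum(w * f for w, f in zip(head_weights, features)) + head_bias
    return 1 if score > 0 else 0


# Tiny made-up dataset for the new task: label 1 when the first value is larger.
samples = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
labels = [1, 0, 1, 0]

head_weights, head_bias = train_new_head(samples, labels)
predictions = [predict(x, head_weights, head_bias) for x in samples]
print(predictions)
```

Because the frozen layer already produces a useful feature (here, how much larger the first value is than the second), four samples are enough to train the new output layer. In a real network, the frozen or lightly fine-tuned layers play the same role for edges, colors and shapes.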

The Mobile Opportunity

If you have been paying attention so far, you understand that data is the major player in the deep learning game. Without enough quality data, our algorithms will not be able to generalize and will produce poor results in the real world.

Here lies a huge opportunity for mobile developers. With over 2 billion (!) devices worldwide that constantly capture data of different kinds, it is possible to build applications that collect high-quality data, labeled or not, that can be used for training and learning. Many very successful startups were built around just this idea.

Deep Learning With Caffe

Caffe

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. It is widely used by Computer Vision researchers around the world. Although it was mainly built for Computer Vision, it can also be used for many other deep learning tasks.

DIGITS

NVIDIA’s DIGITS provides an intuitive web interface that simplifies common deep learning tasks with Caffe, such as:

Managing datasets

Designing and training neural networks on multi-GPU systems

Monitoring performance in real time with advanced visualizations

DIGITS is completely interactive, so developers can focus on designing and training networks rather than programming or debugging.

Required Hardware

Training a deep neural network is usually a compute-intensive operation. Although you can do it on a standard CPU-only machine, realistically you will need a powerful machine with a solid GPU such as NVIDIA’s GTX series (approx. $500-$700).

If you don’t have the resources to acquire such a machine, you can always rent one from Amazon. You can use the g2.2xlarge instance, and with Amazon Spot Instances you can get it for ~$0.14 per running hour. Contact me if you want more help with that.
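To compare renting with buying, here is a rough back-of-the-envelope calculation using only the prices quoted above. It is a simplification: it ignores electricity, the rest of the machine, and the fact that spot prices change over time.

```python
SPOT_PRICE_PER_HOUR = 0.14        # approximate g2.2xlarge spot price quoted above (USD)
GPU_PRICE_RANGE = (500.0, 700.0)  # approximate GPU purchase price range quoted above (USD)


def break_even_hours(purchase_price, hourly_rate):
    """Hours of rented training time that cost as much as buying the GPU."""
    return purchase_price / hourly_rate


low_hours, high_hours = (
    break_even_hours(price, SPOT_PRICE_PER_HOUR) for price in GPU_PRICE_RANGE
)
print("Break-even after roughly %.0f to %.0f hours" % (low_hours, high_hours))
```

Under these assumptions, renting only becomes more expensive than buying after roughly 3,600 to 5,000 hours of training, so renting is a reasonable way to start experimenting.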