Training a deep learning model to steer a car in 99 lines of code

The magical power of deep learning in 2017.

Deep learning in 2017 is magical. We get to apply immensely complex algorithms to equally complex problems without having to spend all our time writing the algorithms ourselves. Instead, thanks to libraries like TensorFlow and Keras, we get to focus on the fun stuff: model architecture, parameter tuning and data augmentation.

Today, we’ll explore one such application of deep learning. We’ll use the Udacity self-driving car nanodegree program simulator to train a generalized steering model in under 100 lines of code. (As a point of reference, WordPress’ get_permalink() function, which retrieves the URL of a WordPress post, is 124 lines!)

Note: Quite a few people have written about their solutions to this Udacity simulator steering problem on Medium. I highly recommend Vivek Yadav’s post, An augmentation based deep neural network approach to learn human driving behavior, which presents a wealth of information on the problem and the power of data augmentation.

Without further ado, here’s our entire solution: 99 lines including comments and white space:

Note: Shoutout to Whoopska on Reddit for the code review!

Setup

The Udacity simulator has two modes, training and autonomous, and two tracks. The challenge is to train only on track 1 and come up with a model that can drive itself around track 2, which has significantly different features.

To do this, we first drive the car around in training mode on track 1 for five laps to generate our data, doing our best to keep the car in the center. From there, the above code takes over. Let’s go through each part:

The Model

The model is a ConvNet that draws inspiration from a slew of sources including NVIDIA, VGG16 and Mr. Yadav’s post, linked above. We stack five convolution layers, which grow in width at each layer, with max pooling between them for spatial reduction (and memory friendliness). Then we top it off with a couple of dense layers, and use a linear activation to output our continuous steering angle. We use “elu” activations across the board and the Adam optimizer, both of which were found to help the model converge quickly.
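To make that architecture concrete, here’s a minimal Keras sketch of a network with that shape. The filter counts, kernel sizes and the 64x64 input shape are my illustrative assumptions, not the exact values from the code above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(64, 64, 3)):
    """Sketch of a five-conv-layer steering model with elu activations,
    a linear output for the steering angle, and the Adam optimizer."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Five convolution layers that grow in width, with max pooling between them
    for filters in (16, 32, 64, 96, 128):
        model.add(layers.Conv2D(filters, (3, 3), activation='elu', padding='same'))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    # A couple of dense layers, then a single linear output: the steering angle
    model.add(layers.Dense(512, activation='elu'))
    model.add(layers.Dense(64, activation='elu'))
    model.add(layers.Dense(1, activation='linear'))
    model.compile(optimizer='adam', loss='mse')
    return model
```

Each 2x2 max pooling halves the spatial dimensions, which is where the memory friendliness comes from.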

Processing the log

The Udacity simulator spits out the data it records into a CSV. It stores the path to three images for each frame: left image, center image and right image. This is where the NVIDIA influence really comes in strong:

We actually throw out the center images entirely because the left and right images hold so much information. For each sample, we add 0.4 to the steering angle for the left image and subtract 0.4 for the right image. The intuition here is that if our center image (as we’d see when actually driving with the model) looks like the right image when we’re going straight, then we probably need to veer to the left a little, and vice versa.

The 0.4 value is pretty aggressive and was found through trial and error. Others who have solved this problem report using 0.2 or 0.25. Although those values worked great for track 1, generalizing for track 2 required us to go big or go home, and 0.4 worked best.
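In code, that log processing might look like the following sketch. The load_samples name and the CSV column order are my assumptions; the 0.4 offset is the value discussed above:

```python
import csv

STEERING_OFFSET = 0.4  # aggressive value that generalized best to track 2

def load_samples(log_path):
    """Build (image_path, steering_angle) pairs from the simulator's CSV log,
    keeping only the left and right camera images. Assumed column order:
    center, left, right, steering, throttle, brake, speed."""
    samples = []
    with open(log_path) as f:
        for row in csv.reader(f):
            left, right, steering = row[1].strip(), row[2].strip(), float(row[3])
            # Left image gets +0.4, right image gets -0.4, per the intuition above
            samples.append((left, steering + STEERING_OFFSET))
            samples.append((right, steering - STEERING_OFFSET))
    return samples
```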

Processing the images

We have to pass our images to our neural net as numpy arrays for training, so the image processing function takes care of that conversion, and also normalizes the pixel values to the range -0.5 to 0.5.
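The normalization step itself is essentially one line. A sketch, assuming 8-bit input images:

```python
import numpy as np

def normalize(image):
    """Scale 8-bit pixel values from [0, 255] into [-0.5, 0.5] as float32."""
    return image.astype(np.float32) / 255.0 - 0.5
```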

We also do some data augmentation here to help generalize our model, which is critical for getting it to work well on track 2. For each image, we randomly shift it vertically between -20 and +20%. This vertical shift helps the model perform significantly better on both tracks. We also randomly apply a darkened area to each image. One of the challenges with only training on track 1 is that track 2 has a lot of harsh shadows, which really threw off the model’s ability to stay on course. Applying a random darkened box to each image solved this issue effectively.

Last, we double the number of training samples we have by randomly performing a horizontal flip on the images, with a corresponding inversion of the steering angle. This has the added effect of neutralizing steering bias from imbalanced training data.
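Here’s a sketch of those three augmentations in plain numpy. The original likely uses OpenCV; the wrap-around vertical shift, the box size and the darkening factor here are my simplifications:

```python
import numpy as np

def augment(image, angle, rng=None):
    """Randomly shift one image vertically, darken a box in it, and maybe
    flip it horizontally (inverting the steering angle to match)."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    out = image.copy()

    # 1. Vertical shift between -20% and +20% of the image height
    shift = int(rng.uniform(-0.2, 0.2) * h)
    out = np.roll(out, shift, axis=0)

    # 2. Random darkened box to mimic track 2's harsh shadows
    y0, x0 = int(rng.integers(0, h // 2)), int(rng.integers(0, w // 2))
    box = out[y0:y0 + h // 2, x0:x0 + w // 2].astype(np.float32) * 0.5
    out[y0:y0 + h // 2, x0:x0 + w // 2] = box.astype(out.dtype)

    # 3. Horizontal flip half the time, inverting the steering angle
    if rng.random() < 0.5:
        out = out[:, ::-1]
        angle = -angle

    return out, angle
```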

Training & Generator

Since we’re dealing with image data, we can’t load it all up into memory at the same time, so we use Keras’s awesome fit_generator function. This function accepts a Python generator as an argument, which yields the data.

Our generator randomly chooses batch_size samples from our X/y pairs, passes the image through our processor, and returns the batch for training. We used a batch size of 64, which performed significantly better than smaller batch sizes (32) and much larger ones (256/512). More exploration should be done here to better understand why.
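A sketch of such a generator, kept independent of Keras so the idea stands on its own. The load_and_process callable is a hypothetical stand-in for the image loading, processing and augmentation steps described above:

```python
import numpy as np

def batch_generator(paths, angles, load_and_process, batch_size=64, rng=None):
    """Endlessly yield (X, y) batches of batch_size randomly chosen samples,
    in the shape Keras's fit_generator expects."""
    rng = rng or np.random.default_rng()
    n = len(paths)
    while True:
        idx = rng.integers(0, n, size=batch_size)
        X = np.stack([load_and_process(paths[i]) for i in idx])
        y = np.array([angles[i] for i in idx], dtype=np.float32)
        yield X, y
```

Because the generator loops forever, Keras controls epoch length via its steps-per-epoch argument rather than by exhausting the generator.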

Steering Performance

The model was able to drive itself around track 1 flawlessly after just a single epoch of training on 20,224 samples, which takes all of 64 seconds on my GeForce 960m. It’s really quite amazing how quickly the model learns to handle this track.

The model breezes through track 1. Note: this used 0.25 for the steering angles, rather than the 0.4 for the angles on track 2. I recorded this one before moving to the more aggressive angles.

Generalizing for track 2 took longer, requiring 8 epochs before the car could make it around the whole track. It took a while for the model to learn how to deal with shadows and other areas that are very dark relative to the training data.

The model solves track 2. Before applying a random darkened area to the images, the model would fail any time it encountered a shadow. You can see it still gets pretty close to the edge at times, but finds its way back.

So there you have it, a generalized steering model in under 100 lines of code. Now, a few caveats and notes:

I’m not generally in favor of attempting to solve problems under some arbitrary code limit. The purpose of this post is to show how current libraries allow us to focus on the application of machine learning, instead of getting caught in the weeds writing sophisticated algorithms.

Although it didn’t take much code, this project did take quite a lot of experimenting and research to find good network parameters and data augmentation techniques. But that’s the beauty of it: I was able to spend all my time there, rather than on the nitty-gritty.

This is only possible because of libraries like TensorFlow, TFLearn and Keras. The biggest hat tip to those who spend their time developing and open sourcing these projects.

I’m not in the Udacity self-driving car program, but my Medium feed is flooded with posts by those who are, so I got inspired. :)

Have suggestions for how I could improve my solution? I’d love to hear it. Thanks for reading!