How to Train your Self-Driving Car to Steer

A step-by-step guide using small and efficient neural networks and a bit of magic.

Neural networks, and deep learning research in particular, have achieved many breakthroughs recently in computer vision and other important fields of computer science. Among the many different applications, one technology currently on the rise is self-driving cars. Everybody has heard of them, and all the major companies seem to be investing heavily in this new-millennium gold rush: AI-powered cars that can take you anywhere while you spend your time, well, not driving. In this post I will show you how to train a neural network to steer autonomously using only images of the road ahead. You can find all the code, explained step by step, in this Jupyter Notebook. You can also find a more detailed paper here.

Deep neural networks, especially in computer vision and object recognition, often have a lot of parameters, millions of them. This means they are heavy both computationally and in terms of memory on the device running them. If you are an academic laboratory or a big company with your own data centers and tons of GPUs, that is not a problem. But if all you have is an embedded system on a car that must drive in real time, it could be. That's why I will focus on particular architectures that are very slim, fast, and efficient. The main model I used is the SqueezeNet architecture, a fairly recent model that achieved remarkable performance on object recognition tasks with very few parameters, weighing just a few megabytes. I suggest reading this story along with the code, which is already quite well documented, to further understand the concepts.

The first thing we need is a dataset, the core of most deep learning projects. Luckily, there are a couple of datasets that work for us: we mostly need images recorded from hours of human driving in different settings (highways, cities). You can find one in the notebook. Once we have the dataset, we need to preprocess the data to make our algorithm work better. For example, we certainly cannot load the entire dataset into RAM, so we need to design a generator, a particularly useful kind of function in Python that lets us dynamically load a small batch of data, preprocess it, and feed it directly to our neural network. To help the network generalize to every possible weather and light condition, we can randomly modify the brightness of our images. Furthermore, we can crop off the top slice of each image, since it contains mostly sky and other information useless for driving. This also makes the whole computation faster.
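
To make this concrete, here is a minimal sketch of such a generator, assuming the dataset is a list of image paths with matching steering angles. The variable names, crop size, brightness range, and target resolution are illustrative placeholders, not the exact values from the notebook:

```python
import numpy as np
import cv2

def batch_generator(image_paths, steering_angles, batch_size=64,
                    crop_top=60, target_size=(128, 128)):
    """Yield (images, angles) batches indefinitely, with light augmentation."""
    n = len(image_paths)
    while True:
        idx = np.random.choice(n, batch_size)
        images, angles = [], []
        for i in idx:
            img = cv2.imread(image_paths[i])
            img = img[crop_top:, :, :]                    # crop off the sky at the top
            img = cv2.resize(img, target_size)
            # random brightness shift in HSV space, for weather/light robustness
            hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
            hsv[:, :, 2] = np.clip(hsv[:, :, 2] * np.random.uniform(0.4, 1.4), 0, 255)
            img = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
            images.append(img / 255.0 - 0.5)              # normalize to [-0.5, 0.5]
            angles.append(steering_angles[i])
        yield np.array(images, dtype=np.float32), np.array(angles, dtype=np.float32)
```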

NVIDIA model.

After the preprocessing, we can start designing our networks. I used Keras, which keeps the code quite readable. The first model is the NVIDIA model, a fairly classical CNN. After some convolutional layers, which extract visual features from our images, we have a flattening layer and then fully connected layers that output a single real-valued number: our steering angle. You can see the details of the network in the code.
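
For reference, here is a sketch of this kind of network in Keras, with layer sizes following NVIDIA's published end-to-end driving paper; the version in the notebook may differ in its exact dimensions:

```python
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

def nvidia_model(input_shape=(66, 200, 3)):
    """NVIDIA-style CNN: conv feature extractor, then dense regression head."""
    return Sequential([
        Conv2D(24, (5, 5), strides=(2, 2), activation='relu', input_shape=input_shape),
        Conv2D(36, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(48, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(64, (3, 3), activation='relu'),
        Conv2D(64, (3, 3), activation='relu'),
        Flatten(),
        Dense(100, activation='relu'),
        Dense(50, activation='relu'),
        Dense(10, activation='relu'),
        Dense(1),  # single real-valued output: the steering angle
    ])
```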

If you're training this network on a laptop, especially without GPU acceleration, you could need a whole day to train it (but it's worth the effort, probably). After this relatively short training, you can see the validation loss decrease remarkably: the network is really learning how to drive.
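
Training itself is a standard Keras regression setup: mean squared error on the steering angle, fed by the generator from before. This reuses the `nvidia_model` and `batch_generator` sketches above; the data variables, optimizer, learning rate, and step counts are placeholders, not the notebook's exact settings:

```python
from keras.optimizers import Adam

# train_paths/train_angles and val_paths/val_angles are assumed to exist.
model = nvidia_model(input_shape=(128, 128, 3))       # match the generator's output size
model.compile(optimizer=Adam(lr=1e-4), loss='mse')    # plain regression on the angle

model.fit_generator(batch_generator(train_paths, train_angles),
                    steps_per_epoch=200, epochs=10,
                    validation_data=batch_generator(val_paths, val_angles),
                    validation_steps=50)
```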

This architecture can work in real time on a laptop, and it has around 500,000 parameters. But we can do better and make an even smaller network. That's where SqueezeNet comes in. The architecture is already slim, and I shrank it further by lowering the number of convolutional features. The core of this architecture is the Fire module, a very ingenious block of filters that can extract semantically important features using very few parameters and a small output. You can see the details of the network implementation in the code. The final layers were modified as well, since our task is a regression in the image space, while the network was originally designed for object recognition.
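
To give an idea of what this looks like, here is a minimal Keras sketch of a Fire module and of a slimmed-down SqueezeNet-style network with a regression head. The channel counts are illustrative of the shrinking, not the exact values used in the code:

```python
from keras.models import Model
from keras.layers import (Input, Conv2D, MaxPooling2D,
                          GlobalAveragePooling2D, Dense, concatenate)

def fire_module(x, squeeze=16, expand=32):
    """Fire module: a 1x1 'squeeze' layer feeding parallel 1x1 and 3x3 'expand' layers."""
    s = Conv2D(squeeze, (1, 1), activation='relu', padding='same')(x)
    e1 = Conv2D(expand, (1, 1), activation='relu', padding='same')(s)
    e3 = Conv2D(expand, (3, 3), activation='relu', padding='same')(s)
    return concatenate([e1, e3])

def squeezenet_steering(input_shape=(128, 128, 3)):
    """Slim SqueezeNet-style trunk with a single-unit regression head."""
    inp = Input(shape=input_shape)
    x = Conv2D(32, (3, 3), strides=(2, 2), activation='relu')(inp)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = fire_module(x, 16, 32)
    x = fire_module(x, 16, 32)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = fire_module(x, 32, 64)
    x = GlobalAveragePooling2D()(x)
    out = Dense(1)(x)  # regression head replacing the classification layers
    return Model(inp, out)
```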

The Fire Module.

Using the same training setting as before, we can see that training is faster, and the network reaches even better performance after around 10 epochs.

You could argue that here we are predicting the steering angle from the current frame alone, while driving is a dynamic task that depends on previous frames too. That's why the last model I show here is a recurrent one. I added a recurrent layer to the output of one of the first densely connected layers of SqueezeNet: the network now takes 5 consecutive frames as input, and the recurrent layer outputs a single real-valued number, the steering angle. Surprisingly, the performance of this new architecture, even though it more closely resembles the human way of deciding how to steer, is not better than that of the previous architectures. Memoryless, stateless architectures can thus drive quite well, computing the steering angle from a single frame, independently of the others.
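
Here is a sketch of that idea in Keras: a per-frame convolutional encoder (a simple stand-in for the SqueezeNet trunk) is applied to each of the 5 frames with TimeDistributed, and an LSTM aggregates the sequence into a single steering angle. All layer sizes here are illustrative:

```python
from keras.models import Model
from keras.layers import (Input, Conv2D, GlobalAveragePooling2D,
                          Dense, TimeDistributed, LSTM)

def recurrent_steering_model(seq_len=5, frame_shape=(128, 128, 3)):
    """Per-frame CNN features fed to an LSTM over a short frame sequence."""
    # Per-frame feature extractor (stand-in for the SqueezeNet trunk).
    frame_in = Input(shape=frame_shape)
    x = Conv2D(32, (3, 3), strides=(2, 2), activation='relu')(frame_in)
    x = Conv2D(64, (3, 3), strides=(2, 2), activation='relu')(x)
    x = GlobalAveragePooling2D()(x)
    encoder = Model(frame_in, Dense(64, activation='relu')(x))

    # Apply the encoder to each frame, then aggregate the sequence.
    seq_in = Input(shape=(seq_len,) + frame_shape)
    feats = TimeDistributed(encoder)(seq_in)   # (batch, seq_len, 64)
    h = LSTM(64)(feats)                        # last hidden state summarizes the sequence
    out = Dense(1)(h)                          # single steering angle per sequence
    return Model(seq_in, out)
```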

And now, finally, a short video of our network in action. The driving script was taken from this really cool repository. It shows the car driving in real time, with the steering completely controlled by the network based on the road it sees. Pretty good, right?

Our self-driving car in action.

We've trained our self-driving car to steer with quite simple architectures and techniques, obtaining remarkable results. I hope you've learned a trick or two from this post, the code, and the paper behind the work. Feel free to comment or get in touch!