In this post, we will see how to train an autonomous racing car in minutes and how to make its control smooth. The method, based on Reinforcement Learning (RL) and presented here in simulation (Donkey Car simulator), was designed to be applicable in the real world. It builds on the work of Wayve.ai, a startup that focuses on autonomous driving.

The code and simulator used in this article are open source and public. Please check the associated GitHub repository to reproduce the results and for more information ;) (pre-trained controllers are also available for download)

Introduction: Racing Car Competition

Since the creation of DIY Robocars a few years ago, numerous autonomous racing car competitions have appeared (e.g. Toulouse Robot Race, Iron Car, …). In these competitions, the goal is simple: you have a racing car and it must go as fast as possible while staying on the track, given only the image from its on-board camera as input.

The Warehouse level, inspired by DIY Robocars

Self-driving challenges are a good way to get into robotics. To facilitate learning, the Donkey Car, an open-source self-driving platform, was developed. Its ecosystem now includes a Unity simulator featuring that small robot. We will be testing the proposed approach on this Donkey Car.

Outline

After briefly reviewing the different methods used in small autonomous car competitions, we will present what reinforcement learning is and then go into the details of our approach.

Methods Used in the Self-Driving Competitions: Line Following and Behavior Cloning

Before presenting RL, let us first quickly review the different solutions currently used in RC car competitions.

In a previous blog post, I described a first approach to driving autonomously that combines computer vision and a PID controller. Although the idea is simple and applicable to many settings, it requires manual labeling of the data (to tell the car where the center of the track is), which is costly and exhausting (trust me, manual labeling is not fun!).
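For illustration, here is a minimal sketch of that kind of vision + PID pipeline. Everything in it is illustrative: `detect_line_center` is a hypothetical stand-in for whatever vision routine estimates the offset of the track center in the image.

```python
# Minimal sketch of the vision + PID approach (illustrative only).

class PIDController:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, error, dt):
        # Classic PID: proportional + integral + derivative terms.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PIDController(kp=0.01, ki=0.0, kd=0.002)
# In the control loop (pseudo-usage, detect_line_center is hypothetical):
# error = detect_line_center(image)        # offset in pixels, 0 = centered
# steering = pid.control(error, dt=0.05)   # steering command in [-1, 1]
```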

Another approach, used by lots of competitors, is behavior cloning: supervised learning to reproduce a human driver's behavior. A human drives the car manually for several laps while the camera images and the associated joystick commands are recorded, and a model is then trained to reproduce the human driving. However, this technique is not really robust: it requires consistent driving, generalizes quite badly, and must be retrained for each new track.
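To make this concrete, here is a minimal sketch of the supervised training step in PyTorch. The architecture is illustrative, and images are assumed to be resized to 64x64.

```python
import torch
import torch.nn as nn

# Behavior cloning = supervised regression from camera images to the
# commands recorded while a human was driving (illustrative sketch).

model = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
    nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, 2),  # outputs: [steering, throttle]
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(images, actions):
    # images: (batch, 3, 64, 64) tensor; actions: (batch, 2) tensor of
    # the joystick commands logged during the manual demonstration laps.
    optimizer.zero_grad()
    loss = loss_fn(model(images), actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```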

What Is Reinforcement Learning (RL) and Why Should We Use It?

In view of the above issues, reinforcement learning (RL) appears to be an interesting alternative.

In a reinforcement learning setting, an agent (or robot) acts in its environment and receives a reward as feedback. The reward can be positive (the robot did something good) or negative (the robot should be penalized).

The goal of the robot is to maximize the cumulative reward. To do so, it learns, through interaction with the world, what is called a policy (or behavior/controller) that maps its sensory input to actions.

In our case, the input is the camera image and the actions are the throttle and the steering angle. So, if we design the reward such that the car stays on the track and maximizes its velocity, we're done!
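As an illustration, here is a sketch of what such a reward could look like, together with the usual Gym-style interaction loop. The shaping below is a guess for illustration, not the exact reward used in this project.

```python
# Illustrative reward: encourage staying on the track and going fast.
# `on_track` would come from the simulator (or a human supervisor) and
# `throttle` is the action in [0, 1]. Not the exact reward of this post.

def reward_fn(on_track: bool, throttle: float) -> float:
    if not on_track:
        return -10.0              # leaving the track is heavily penalized
    return 1.0 + 0.1 * throttle   # small alive bonus + speed incentive

# Typical Gym-style interaction loop (pseudo-usage):
# obs = env.reset()
# done = False
# while not done:
#     steering, throttle = policy(obs)
#     obs, reward, done, info = env.step([steering, throttle])
```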

That is the beauty of reinforcement learning: you need very few assumptions (here, only a reward function to design) and it directly optimizes what you want (going fast on the track to win the race!).

Note: This is not the first blog post about reinforcement learning on a small self-driving car, but compared to previous approaches, the presented technique takes only minutes (and not hours) to learn a good and smooth control policy (~5 to 10 minutes for a smooth controller, ~20 minutes for a very smooth one).

Now that we have briefly presented what RL is, we will go into the details, starting by dissecting the Wayve.ai approach, which is the base of our method.

Learning to Drive in a Day — Key Elements of Wayve.ai Approach

Wayve.ai describes a method to train a self-driving car in the real world on a simple road. This method is composed of several key elements.

Wayve.ai approach: learning to drive in a day

First, they train a feature extractor (here a Variational Auto-Encoder, or VAE) to compress the image into a lower-dimensional space. The model is trained to reconstruct the input image but contains a bottleneck that forces it to compress the information.
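Here is a minimal VAE sketch in PyTorch to illustrate that bottleneck idea. The architecture, latent size, and 64x64 input resolution are all illustrative, not the exact model used by Wayve.ai or in this project.

```python
import torch
import torch.nn as nn

# Minimal VAE sketch: images (assumed resized to 3x64x64) are compressed
# into a small latent vector z (the bottleneck), then decoded back.

class VAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_decode = nn.Linear(latent_dim, 64 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decoder(self.fc_decode(z)), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error + KL divergence to a unit Gaussian prior.
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
```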

This step of extracting relevant information from raw data is called State Representation Learning (SRL) and was my main research topic. It notably reduces the search space and therefore accelerates training. Below is a diagram that shows the connection between SRL and end-to-end reinforcement learning, that is, learning the control policy directly from pixels.

Note: training an auto-encoder is not the only way to extract useful features; you can also, for instance, train an inverse dynamics model.
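For reference, an inverse dynamics model learns features by predicting which action caused the transition between two consecutive observations. A minimal sketch, purely illustrative and not the model used here:

```python
import torch
import torch.nn as nn

# Inverse dynamics sketch: encode two consecutive observations and
# predict the action taken between them. Features that help with this
# prediction tend to capture the controllable aspects of the scene.

latent_dim, action_dim = 32, 2

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
    nn.Flatten(), nn.Linear(64 * 16 * 16, latent_dim),
)
inverse_model = nn.Sequential(
    nn.Linear(2 * latent_dim, 64), nn.ReLU(),
    nn.Linear(64, action_dim),  # predicted [steering, throttle]
)

def inverse_loss(obs_t, obs_next, action):
    z_t, z_next = encoder(obs_t), encoder(obs_next)
    pred_action = inverse_model(torch.cat([z_t, z_next], dim=1))
    return nn.functional.mse_loss(pred_action, action)
```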

The second key element is the use of an RL algorithm named Deep Deterministic Policy Gradient (DDPG), which learns a control policy using the VAE features as input. This policy is updated after each episode. One important aspect of the algorithm is that it has a memory, called a replay buffer, where its interactions with the environment are recorded and can be "replayed" afterward. So, even when the car is not interacting with the world, it can sample experiences from this buffer to update its policy.
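A replay buffer is essentially a bounded memory of (state, action, reward, next state, done) transitions from which random batches are drawn for the policy updates. A minimal sketch:

```python
import random
from collections import deque

# Minimal replay buffer sketch: store transitions and sample random
# batches to update the policy, even while the car is not driving.

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # Oldest transitions are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        # Transpose the list of tuples into a tuple of lists.
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```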

The car is trained to maximize the number of meters traveled before human intervention. And that is the final key ingredient: the human operator ends the episode as soon as the car starts going off the road. This early termination is really important (as shown by DeepMimic) and prevents the car from exploring regions that are not relevant to solving the task.
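In code, early termination simply means ending the episode, with a penalty, as soon as the car is flagged as off the road. A sketch, assuming a hypothetical `is_off_road` signal set by the human operator or the simulator:

```python
# Early termination sketch: the episode ends as soon as the car leaves
# the road, so the agent never wastes interactions exploring off-track
# states. `is_off_road` is a hypothetical flag, not a real API.

def run_episode(env, policy, replay_buffer, max_steps=1000):
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        if info.get("is_off_road", False):
            reward = -10.0  # penalize the human intervention
            done = True     # terminate the episode early
        replay_buffer.add(obs, action, reward, next_obs, done)
        obs = next_obs
        if done:
            break
```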