Learning to Simulate

How learning to simulate better synthetic data can improve deep learning

The paper presented at ICLR 2019 can be found here. I also have slides as well as a poster explaining the work in detail.

Photo by David Clode on Unsplash

Deep neural networks are an amazing piece of technology. With enough labelled data they can learn to produce very accurate classifiers for high-dimensional inputs such as images and sound. In recent years the machine learning community has successfully tackled problems such as classifying objects, detecting objects in images and segmenting images.

The caveat in the above statement is “with enough labelled data”. When real labelled data is scarce, simulations of real phenomena and of the real world can sometimes help. There are cases where synthetic data has improved the performance of deep learning systems in computer vision or robotic control applications.

Simulation can give us accurate scenes with free labels. But let’s take Grand Theft Auto V (GTA V) for example. Researchers have collected a dataset by free-roaming the GTA V world and have used it to bootstrap deep learning systems, among other things. Many game designers and map creators worked on creating the intricate world of GTA V. They painstakingly designed it, street by street, and then went over the streets with a fine-tooth comb, adding pedestrians, cars, objects, etc.

An example image from GTA V (Grand Theft Auto V)

This is expensive, both in time and in money. Using randomly simulated scenes instead, we might not do much better: with random sampling, important edge cases might be severely undersampled and our classifier might not learn how to detect them correctly. Let us imagine we are trying to train a classifier which detects dangerous scenes. In the real world we run into dangerous scenes like the one below with very low frequency, yet they are very important. If we generate a large number of random scenes, very few of them will be dangerous scenes either. A dataset which undersamples these important cases might yield a classifier which fails on them.

Example of a dangerous traffic scene. These important cases can be undersampled when randomly sampling synthetic data. Can we do better?

Learning to simulate is the idea that we can potentially learn how to optimally generate scenes such that a deep network can either learn a very good representation or can perform well in a downstream task.

To test our work, we create a parameterized procedural traffic scene simulator using Unreal Engine 4 and the Carla plugin. Our simulator creates a road of variable length with different types of intersections (X, T, or L). We can populate the road with buildings on the side and cars of 5 different types on the road. The number of buildings and cars is controlled by tunable parameters, as is the type of cars. We can also switch between 4 different weather types, which control lighting and rain effects. The main idea is to learn the optimal parameters controlling these scene characteristics for different tasks (for example semantic segmentation or object detection).
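
To make "parameterized simulator" a bit more concrete, here is a minimal Python sketch of what such a parameter set could look like. It is only an illustration: the real simulator lives in Unreal Engine, and every name below (`SceneParams`, `sample_scene`, the field names) is hypothetical and not taken from our code. Only the counts (3 intersection types, 5 car types, 4 weather types) come from the description above.

```python
import random
from dataclasses import dataclass

# Counts taken from the description above; all names are hypothetical.
INTERSECTIONS = ("X", "T", "L")
NUM_CAR_TYPES = 5
NUM_WEATHER_TYPES = 4


@dataclass
class SceneParams:
    """Hypothetical container for the tunable scene parameters."""
    road_length: float      # units are an assumption, not specified here
    intersection: str       # one of "X", "T", "L"
    num_buildings: int
    num_cars: int
    car_type_probs: tuple   # one probability per car type
    weather_probs: tuple    # one probability per weather type

    def __post_init__(self):
        if self.intersection not in INTERSECTIONS:
            raise ValueError("intersection must be X, T or L")
        if len(self.car_type_probs) != NUM_CAR_TYPES:
            raise ValueError("need one probability per car type")
        if len(self.weather_probs) != NUM_WEATHER_TYPES:
            raise ValueError("need one probability per weather type")


def sample_scene(params, rng):
    """Draw one concrete scene description from the parameters."""
    car_types = rng.choices(
        range(NUM_CAR_TYPES), weights=params.car_type_probs, k=params.num_cars
    )
    weather = rng.choices(
        range(NUM_WEATHER_TYPES), weights=params.weather_probs, k=1
    )[0]
    return {
        "road_length": params.road_length,
        "intersection": params.intersection,
        "num_buildings": params.num_buildings,
        "car_types": car_types,
        "weather": weather,
    }
```

Calling `sample_scene(params, random.Random(0))` with a valid `SceneParams` returns a plain dictionary describing one scene. Learning to simulate then amounts to adjusting fields such as `car_type_probs` and `weather_probs` rather than fixing them by hand.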

A demo of our procedural scene simulator. We vary the length of the road, the intersections, the number of cars, the type of cars and the number of houses. All of these are controlled by a set of parameters.

To get sensor data, we place a car on the road of each generated scene. It captures RGB images of the scene, which automatically come with semantic segmentation labels and depth annotations (for free!).

An inside view of the generated scenes from our simulator with a fixed set of parameters

However, the learning to simulate algorithm is more general than this. We don’t have to use it exclusively for traffic scenes; it can apply to any type of parameterized simulator. By this we mean that, for any simulator that takes parameters as input, we present a way to search for the best parameters such that the generated data is optimal for a deep network to learn the downstream task. Our work, to the best of our knowledge, is the first to do simulation optimization to maximize performance on a main task, as well as the first to apply it to traffic scenes.

Now for the crux of our algorithm. In a traditional machine learning setup, data is sampled from a distribution P(x,y) (x is the data and y is the label). Usually this happens by collecting data in the real world and manually labeling the samples. This dataset is fixed, and we use it to train our model.

Traditional machine learning setup

By using a simulator to train a main task network, we can generate data from a new distribution Q defined by the simulator. This dataset is not fixed, and we can generate as much data as our computation and time constraints allow. Still, the data generated in this domain randomization setup is randomly sampled from Q. The amount of data needed to obtain a good model could be large, and performance can be suboptimal. Can we do better?

We introduce learning to simulate, which optimizes a metric of our choice on a main task. The pipeline is trained by defining a reward function R which is directly related to this metric (usually it is identical to the metric itself). At every iteration of the algorithm, we sample data from a parameterized simulator Q(x,y|Θ) and train the main task model on it. The reward R is obtained by testing the trained network on a validation set, and is then used to inform the update of the policy which controls the parameter Θ. In our case, we use vanilla policy gradient to optimize our policy.
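
The loop described above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not our actual implementation: the "simulator" just draws numbers centred at Θ, the "main task model" is a single fitted mean, the policy is a Gaussian over a scalar Θ with fixed standard deviation, and all function names and hyperparameters are made up for the example. What it does share with the description is the structure: sample Θ from the policy, generate data, train, measure the reward on a validation set, and update the policy with a vanilla policy gradient (REINFORCE) step.

```python
import random
import statistics

REAL_MEAN = 3.0  # hypothetical property of the "real" validation data


def simulate(theta, n, rng):
    """Toy stand-in for sampling from Q(x, y | theta): n draws centred at theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def train_main_task(train_data):
    """Toy 'main task model': a single number fitted to the simulated data."""
    return statistics.fmean(train_data)


def reward(model, val_data):
    """Reward R: negative mean squared error of the model on the validation set."""
    return -statistics.fmean([(x - model) ** 2 for x in val_data])


def learn_to_simulate(iterations=300, batch=16, lr=0.05, sigma=0.5, seed=0):
    rng = random.Random(seed)
    # Fixed validation set standing in for real labelled data.
    val_data = [rng.gauss(REAL_MEAN, 1.0) for _ in range(200)]
    mu = 0.0  # mean of the Gaussian policy over theta (sigma stays fixed)
    for _ in range(iterations):
        # 1. Sample simulator parameters from the policy.
        thetas = [rng.gauss(mu, sigma) for _ in range(batch)]
        # 2. For each theta: generate data, train the model, compute the reward.
        rewards = [
            reward(train_main_task(simulate(t, 100, rng)), val_data)
            for t in thetas
        ]
        # 3. Vanilla policy gradient with a mean-reward baseline:
        #    grad of log N(theta | mu, sigma^2) w.r.t. mu is (theta - mu) / sigma^2.
        baseline = statistics.fmean(rewards)
        grad = statistics.fmean(
            [(r - baseline) * (t - mu) / sigma ** 2 for r, t in zip(rewards, thetas)]
        )
        mu += lr * grad  # gradient ascent on the expected reward
    return mu
```

Running `learn_to_simulate()` moves the policy mean from 0 towards the value that makes simulated data most useful on the validation set, which in this toy case is close to `REAL_MEAN`. The policy never sees a gradient through the simulator or the training procedure; it only sees the scalar reward, which is why this approach works even when the simulator is a non-differentiable black box.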

Informally, we are trying to find the best parameter Θ which gives us the distribution Q(x,y|Θ) which maximizes accuracy (or whichever metric) for the main task.
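
Written a little more formally, this is a two-level optimization problem. The notation below is an assumption made for this post: w denotes the weights of the main task model h_w, 𝓛 is its training loss, and D_val is the validation set on which the reward R is measured.

```latex
\Theta^{*} = \arg\max_{\Theta} \; R\big(h_{w^{*}(\Theta)},\, D_{\mathrm{val}}\big)
\quad \text{s.t.} \quad
w^{*}(\Theta) = \arg\min_{w} \;
\mathbb{E}_{(x,y) \sim Q(x,y \mid \Theta)}
\big[ \mathcal{L}\big(y,\, h_{w}(x)\big) \big]
```

The inner problem is ordinary training on simulated data; the outer problem is the search over simulator parameters, which is the part handled by the policy gradient update.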