The CARLA Simulator. Image courtesy of the CARLA Team.

Simulations can’t solve autonomous driving because they lack important knowledge about the real world

Large-scale real world data is the only way

Why use a simulation to train neural networks, rather than collect data from robots in the real world? Three reasons:

1. To avoid real world danger.
2. To simulate hypothetical and counterfactual situations that can’t be contrived in the real world.
3. To collect more data than would be feasible with real world robots.

All companies working on self-driving cars use simulation. Simulation is a useful tool for testing software, and in some cases for training neural networks. However, simulation can’t substitute for large-scale data collection in the real world.

The reason is this: a simulation doesn’t contain the same empirical knowledge that the real world does, and some of that empirical knowledge is necessary for driving. In particular, a simulation lacks empirical knowledge about the behaviour of other road users, namely vehicles, pedestrians, and cyclists. The problem can be split into two parts:

Behaviour prediction: Knowing what a road user will do before they do it.

Communication and interaction: Knowing how a road user will react to the self-driving car’s actions (such as signalling or nudging into a lane) and knowing how to react to their reactions to produce the desired outcome.

How humans behave is an empirical question that requires empirical data to answer. Simplistic models of human driving, like those found in video games, might be sufficient for some limited driving scenarios. However, as scenarios get denser, more urban, more crowded, messier, more complex, more interactive, and more anarchic, the difficulty of modelling human behaviour increases, and the predictive power of simplistic models decreases.

A self-driving car needs to accurately model empirical phenomena. For example, if a self-driving car begins to make an unprotected left turn into oncoming traffic, it needs to anticipate whether oncoming vehicles at various distances from the intersection will slow down, and if so by how much. This knowledge can’t be derived from armchair analysis. It is empirical.

Collecting real world data

Data for training behaviour prediction can be collected passively from the real world, with a set of cameras, a heavy-duty computer, and an occasional Internet connection. Observe, predict, flag errors, and upload. (This is what Tesla does with roughly 500,000 cars.)
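The observe–predict–flag–upload loop can be sketched in a few lines. This is a minimal illustration of the idea, not Tesla's actual pipeline; the class name, the threshold, and the position-error metric are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ShadowModeLogger:
    """Passively compare a prediction against what actually happened.

    Illustrative sketch only: the real systems described in the text
    compare rich trajectory predictions, not a single scalar position.
    """
    error_threshold: float = 1.0  # metres of error that triggers a flag (assumed)
    flagged: list = field(default_factory=list)

    def step(self, clip_id: str, predicted_position: float,
             observed_position: float) -> float:
        # Compare the earlier prediction with the later observation.
        error = abs(predicted_position - observed_position)
        if error > self.error_threshold:
            # Flag the clip for upload next time the car is online.
            self.flagged.append((clip_id, error))
        return error

logger = ShadowModeLogger()
logger.step("clip-001", predicted_position=12.0, observed_position=12.3)  # small error
logger.step("clip-002", predicted_position=12.0, observed_position=15.0)  # flagged
```

Only the surprising clips get uploaded, which is what makes the occasional Internet connection sufficient: the fleet filters for prediction failures rather than streaming everything.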

Data for training communication and interaction can also be collected passively, as long as driver input (steering, acceleration, braking, and signalling actions taken by the human driver) is also recorded. This is called imitation learning. Human drivers demonstrate how to communicate and interact with other road users in the real world. The neural network learns to take the same actions when presented with the same situational variables.

In theory, this data can also be collected actively. For example, a self-driving car or a driver assistance system (like Tesla’s Autopilot) can take an action, and if that action is incorrect, the human occupant can take over. What the human does next could be treated as an important demonstration for imitation learning. Alternatively, under a reinforcement learning approach, the system could be rewarded for minimizing occasions where a human has to take over.
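One possible shaping of that reinforcement-learning signal is a small positive reward for each autonomously driven step and a large penalty whenever the human takes over. The specific constants below are assumptions for illustration; the post only states the objective (minimize takeovers), not the reward design.

```python
def takeover_reward(human_took_over: bool,
                    step_reward: float = 0.01,
                    takeover_penalty: float = 1.0) -> float:
    """Small reward per autonomous step, large penalty on disengagement.

    Illustrative shaping only: real reward design would also need to
    penalise unsafe behaviour the human happened not to catch.
    """
    return -takeover_penalty if human_took_over else step_reward

# One takeover in a four-step episode dominates the return.
episode = [False, False, True, False]
total = sum(takeover_reward(t) for t in episode)
```

Note the caveat in the docstring: a takeover-only reward is a proxy, since it scores the system by what the safety driver noticed rather than by what was actually safe.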

If an autonomous vehicle company trains neural networks to accurately model how human drivers behave, it can then simulate driving more accurately. Maybe then it can use simulation to train neural networks to drive using reinforcement learning. But to get to this point, large-scale collection of real world data is required. (Moreover, the more accurate a company’s model of human driving is, the closer the company gets to just being able to use that model to drive the car, instead of worrying about simulation.)

What about self-play?

A possible counterexample is OpenAI Five, which learned to play the complex video game Dota 2 with no empirical knowledge of how humans play. OpenAI Five trained purely through reinforcement learning, playing against itself over innumerable iterations.

Here’s where Dota is disanalogous to driving: in Dota, the weaker the human agents, the easier the task. In driving, the weaker the human agents, the harder the task. What is the end-point of a driving agent trained purely through self-play? Does it make unprotected left turns with reckless abandon, knowing the other computer agents will avoid it with superhuman agility? Clearly that expectation would not transfer well to the real world.

Perhaps the driving agent learns to be exceedingly polite and cautious, knowing the other agents will be equally polite and cautious. Again, that expectation would not transfer well to reality. (Also, trying to hand-tune the aggressiveness of the agent to find a happy medium would rest on armchair analysis, not empirical data.)

Conclusion

To know how to drive, a self-driving car needs to be able to predict what other road users will do, and it needs to be able to communicate and interact cooperatively with other road users, particularly with human drivers. Predicting what road users will do requires observing them empirically, and knowing how road users communicate and successfully interact requires empirically observing how humans do those things. That’s why the knowledge about how to drive is not contained in any simulation – unless that simulation is running neural networks trained on large datasets of such empirical observations.

To learn how to drive using imitation learning and reinforcement learning, a self-driving car may need to draw upon tens of thousands of years of continuous driving. As with deep learning and reinforcement learning generally, the amount of empirical observation and experience needed may be massive. Almost all autonomous vehicle companies are trying to learn with a fleet of just a few hundred cars. This may not be enough to get the empirical data required. Companies should work on putting sensors, computing hardware, Internet connectivity, and driver assistance software into the millions of cars that are produced every year. This is not guaranteed to work, but it is practically guaranteed to work better than relying on a few hundred vehicles to get the data you need.