
Why Tesla’s Fleet Miles Matter for Autonomous Driving

Automatic labelling and automatic flagging of interesting training examples obviate the need for human labour and make data valuable

Lately I’ve been watching talks from the International Conference on Machine Learning (ICML) on the topic of autonomous driving. I’m going to jot down some thoughts that these talks have stimulated.

Tesla has approximately 650,000 HW2/3 cars on the road. These vehicles drive approximately 1,000 miles per month each, or about 650 million miles per month as a fleet. Waymo, by comparison, drives about 1 million miles per month. So, Tesla’s fleet is driving roughly 650x more.

Does this matter? It doesn’t if Tesla needs a human to label every frame of video for these miles to be useful. But, in fact, Tesla can make use of these miles with no human labelling.

The ICML talk from Zoox explained how with prediction — predicting the trajectories of cars and pedestrians — you have “free ground truth”. Your perception system tracks the trajectories of objects around the vehicle, so you have a source of real trajectories that you can use to train prediction. At Tesla Autonomy Day, Andrej Karpathy referred to this as “automatic labelling”. I believe this would fall under the umbrella of self-supervised learning, a form of deep learning in which a neural network tries to use a portion of the data to predict the rest of the data (e.g. show it half of an image and it generates the other half; show it the first 5 seconds of a 10-second recording of a driving scene and it predicts the next 5 seconds). Rather than a human providing the supervisory signal by labelling a prediction as right or wrong, the data itself provides the supervisory signal by showing the neural network whether its prediction is right or wrong.
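To make the idea concrete, here is a minimal sketch of self-supervised trajectory prediction, assuming perception tracks are logged as sequences of (x, y) positions: the first half of each track is the input and the second half serves as the automatic label. The model, shapes, and training loop are illustrative stand-ins, not Tesla’s actual pipeline.

```python
# A minimal sketch of self-supervised trajectory prediction. Perception tracks
# are assumed to be (x, y) positions at fixed time intervals; the second half
# of each track is the "free ground truth" that supervises the predictor.
import torch
import torch.nn as nn

PAST, FUTURE = 10, 10  # timesteps of history and of prediction (assumed)

class TrajectoryPredictor(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PAST * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, FUTURE * 2),
        )

    def forward(self, past):                 # past: (batch, PAST, 2)
        out = self.net(past.flatten(1))      # -> (batch, FUTURE * 2)
        return out.view(-1, FUTURE, 2)       # -> (batch, FUTURE, 2)

# Random walks stand in for logged object trajectories from perception.
tracks = torch.cumsum(torch.randn(256, PAST + FUTURE, 2) * 0.1, dim=1)
past, future = tracks[:, :PAST], tracks[:, PAST:]   # future = automatic label

model = TrajectoryPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    loss = nn.functional.mse_loss(model(past), future)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

No human ever labels anything here: the supervisory signal is simply the part of the logged trajectory that the network was not shown.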

Prediction is an important area of autonomous driving where Tesla can use data from those 650 million miles per month for self-supervised learning with no requirement for human labour. Moreover, Tesla can flag and upload only the instances where Tesla’s predictor fails (as shown by its perception system in the seconds following the prediction). In theory, this should allow Tesla to significantly improve on state-of-the-art academic results in prediction.
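A hedged sketch of what that flagging logic might look like: once the future has actually unfolded, the trajectory the perception system observed is compared with what the on-board predictor said would happen, and only large errors trigger an upload. The error metric and threshold below are my own assumptions for illustration.

```python
# Flag a clip for upload only when the on-board prediction turned out to be
# badly wrong, as judged against what perception subsequently observed.
import numpy as np

ERROR_THRESHOLD_METERS = 2.0  # assumed trigger level, not a Tesla figure

def should_upload(predicted_xy: np.ndarray, observed_xy: np.ndarray) -> bool:
    """Flag when the mean displacement error exceeds the threshold."""
    displacement = np.linalg.norm(predicted_xy - observed_xy, axis=-1)
    return float(displacement.mean()) > ERROR_THRESHOLD_METERS

predicted = np.zeros((10, 2))                         # predictor said: stays put
observed = np.cumsum(np.full((10, 2), 0.5), axis=0)   # object actually moved away
print(should_upload(predicted, observed))             # True -> flag clip for upload
```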

Encouragingly, Yann LeCun believes prediction is a tractable problem.

The ICML talk from Aurora pointed out how human driving is a valuable source of learning when it comes to planning and decision making. Aurora particularly emphasized the importance of human interventions for imitation learning. Additionally, the Aurora speaker talked about flagging interesting human demonstrations without interventions. When a human driver takes a trajectory, Aurora’s software can determine how likely it is that this trajectory would be produced by Aurora’s planner. If the probability is low, this suggests a disagreement between the human driver and the planner. Aurora discussed this in the context of recorded data stored on Aurora’s servers (“offline data”), but I don’t see why you couldn’t run this live (“online”) in a car as well.
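I don’t know how Aurora implements this, but a toy version of the idea might look like the following: the planner is stood in for by a handful of candidate trajectories with scores, the human’s driven trajectory is matched to the nearest candidate, and the clip is flagged when that candidate’s probability is low. The softmax, the distance metric, and the threshold are all assumptions.

```python
# Toy "planner disagreement" detector: flag clips where the planner assigns
# low probability to the trajectory the human driver actually took.
import numpy as np

def disagreement(human_traj, candidate_trajs, candidate_scores, prob_floor=0.05):
    """Return True when the human's trajectory is unlikely under the planner."""
    probs = np.exp(candidate_scores) / np.exp(candidate_scores).sum()  # softmax
    # Assign the human trajectory the probability of the nearest candidate.
    dists = [np.linalg.norm(human_traj - c) for c in candidate_trajs]
    return probs[int(np.argmin(dists))] < prob_floor

straight = np.outer(np.arange(10), [1.0, 0.0])           # candidate: go straight
swerve = straight + np.outer(np.arange(10), [0.0, 0.3])  # candidate: drift left
human = swerve.copy()                                     # the human drifted left
print(disagreement(human, [straight, swerve], np.array([3.0, -1.0])))  # True
```

Nothing in this check requires the data to sit on a server; the same comparison could in principle run live in the car, as suggested above.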

Human interventions for Autopilot, Summon, and Full Self-Driving are one way to flag or “mine” useful data. Techniques to detect disagreement between Tesla’s planner and human driving behaviour are another potential way. With every software update, the planner should learn to do the correct thing in more situations. When it does the correct thing, human interventions won’t occur and there will be no more disagreement between human driving and the planner. So, with each subsequent update, Tesla will winnow down the failure cases. The remaining failure cases will trigger an upload whenever an intervention or disagreement occurs.

Prediction and planning both involve a largely automated cycle of drawing error examples from the fleet, using those examples to train neural networks, deploying new software, and then uploading more error examples. In theory, this process can scale to billions of examples.

The Aurora speaker and another ICML speaker, Sergey Levine, talked about hybrid planners that combine neural networks and hand-engineered systems. Mobileye’s RSS and Nvidia’s SFF take the same approach. This alleviates the concern that neural networks are wonky and might output silly, dangerous behaviours. Hybrid planners can enforce a set of common sense rules like “don’t hit other vehicles”, “don’t drive on the sidewalk”, and “don’t cross a double yellow line”. Sergey Levine talked about a hybrid planner wherein the vehicle will fall back on an explicit, hand-programmed planner whenever the neural network doesn’t have enough training data appropriate to the situation to make a confident decision.
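Here is a toy sketch of a hybrid planner in that spirit: a learned planner proposes a trajectory and a confidence, a few hand-written rules veto anything unsafe, and a conservative fallback plan is used whenever the network is unsure or the rules are violated. All of the checks, thresholds, and geometry are invented for illustration.

```python
# Toy hybrid planner: learned proposal + hand-engineered safety rules + fallback.
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Proposal:
    trajectory: List[Point]
    confidence: float  # assumed to come from the network, e.g. ensemble agreement

def violates_rules(traj: List[Point], obstacles: List[Point], lane_width=3.5) -> bool:
    """Common-sense constraints: keep clear of obstacles, stay inside the lane."""
    for x, y in traj:
        if abs(y) > lane_width / 2:
            return True
        if any(abs(x - ox) < 1.0 and abs(y - oy) < 1.0 for ox, oy in obstacles):
            return True
    return False

def fallback_plan() -> List[Point]:
    """Hand-engineered behaviour: slow, straight, in-lane."""
    return [(0.5 * t, 0.0) for t in range(10)]

def plan(proposal: Proposal, obstacles: List[Point], min_conf=0.8) -> List[Point]:
    if proposal.confidence < min_conf or violates_rules(proposal.trajectory, obstacles):
        return fallback_plan()
    return proposal.trajectory

# The network confidently wants to swerve near an obstacle; the rules reject it.
risky = Proposal([(float(t), 0.2 * t) for t in range(10)], confidence=0.95)
print(plan(risky, obstacles=[(5.0, 1.0)]))  # falls back to the safe trajectory
```

The design point is that the neural network never gets the last word: the hand-engineered layer bounds how badly a wonky output can hurt you.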

To me, the strongest sign that imitation learning is a viable approach to planning is the fact that DeepMind used it as the foundation for AlphaStar, its StarCraft agent: trained on human replays alone, before any reinforcement learning, the agent already played at roughly Diamond level. A Diamond StarCraft player is better than 70% of ranked StarCraft players.

Computer vision (a subset of perception) is the most labour-intensive area of autonomous driving. The core of any training data set is going to be a large set of images and video with high-quality labels carefully applied by humans. Driving 650 million miles per month won’t help Tesla get more labelled examples of cars than driving 65 million miles per month would, because cars are ubiquitous and the limit Tesla will run into first is the cost of labelling. But scale can still help in two ways.

First, not all important objects are ubiquitous. Bears and moose are extremely rare. Cows and horses on the road are also rare (but less so). Suppose Tesla has a neural network that detects 5% of the bears that Teslas encounter. Also, for every bear it detects, it falsely identifies ten objects as bears. This would be a terrible bear detector for safety purposes. But for building up a data set of bear images it could be supremely useful. 5% of all the bears that 650,000 Teslas encounter could be a lot of bears. Humans can easily discard the false positives (the non-bear images). These images can be used to re-train the bear detector. When the next version of the bear detector is deployed, it will be better at capturing true positives and avoiding false positives, so it will be better at building the data set of bear images. In theory, this virtuous cycle could propel Tesla to large data sets of rare objects.
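As a rough illustration, here is a simulated version of that loop for a single month, using the hypothetical 5% recall and ten false positives per true detection from the paragraph above. The fleet encounter count, the random seed, and the filtering step are all made up to show the mechanics, not to model Tesla.

```python
# Simulated rare-object mining: even a poor detector harvests rare examples.
import random

random.seed(0)
RECALL, FALSE_POSITIVES_PER_HIT = 0.05, 10   # the article's hypothetical figures

def fleet_month(bears_encountered: int) -> list:
    """Return the candidate clips the weak detector flags in one month."""
    hits = sum(random.random() < RECALL for _ in range(bears_encountered))
    return ["bear"] * hits + ["not_bear"] * hits * FALSE_POSITIVES_PER_HIT

candidates = fleet_month(bears_encountered=20_000)        # assumed encounter count
dataset = [c for c in candidates if c == "bear"]          # humans discard false positives
print(len(candidates), "clips uploaded,", len(dataset), "real bears kept")
```

Even this bad detector turns a month of (simulated) driving into roughly a thousand genuine bear images, and each retraining cycle should raise the recall, and lower the false-positive rate, for the month after.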

Second, emerging techniques may be able to use large quantities of noisily labelled data to supplement the core data set of well-labelled images. “Weakly supervised learning” means learning from low-quality labels. For example, suppose Tesla wanted to train a neural network to identify drivable space (as opposed to space occupied by obstacles like cars, pedestrians, or guardrails or unsafe areas like water). A weakly supervised approach would be to automatically label any areas where human-steered Teslas drive (without a collision, which can be identified by abrupt deceleration) as drivable space. Perhaps supplementing manually labelled examples of drivable space with, say, 1,000x as many automatically labelled examples will yield better accuracy than the manually labelled examples alone.
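A minimal sketch of how those automatic labels could be generated, assuming a bird’s-eye-view grid and a logged ego path from a collision-free, human-driven clip: cells the car actually passed through are marked drivable and everything else is left unlabelled. The grid size, resolution, and path below are arbitrary placeholders.

```python
# Weak labels for drivable space from a logged, collision-free ego path.
import numpy as np

GRID, RES = 200, 0.5                                 # 200x200 cells, 0.5 m per cell
weak_label = np.zeros((GRID, GRID), dtype=np.int8)   # 0 = unknown, 1 = drivable

ego_path = np.stack([np.linspace(0, 80, 400),        # logged (x, y) in metres
                     np.linspace(0, 10, 400)], axis=1)

for x, y in ego_path:
    i, j = int(x / RES), int(y / RES) + GRID // 2    # centre the lateral axis
    if 0 <= i < GRID and 0 <= j < GRID:
        weak_label[i, j] = 1          # the car drove here without incident

print(weak_label.sum(), "cells weakly labelled as drivable")
```

These labels are noisy and sparse compared with a human-drawn segmentation mask, but they cost nothing, and the fleet can produce them in essentially unlimited volume.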

A hacked Tesla computes drivable space, shown in green.

Facebook’s research lab has demonstrated that weakly supervised learning works well for image recognition. Facebook used Instagram hashtags, which often tenuously correspond to an image’s contents, to predict image labels assigned by paid human labellers. With 1 billion Instagram photos with hashtags, Facebook obtained better results than with 1 million well-labelled images.

Automatic labels in prediction, planning, and computer vision, together with automatic mining of rare objects, mean that Tesla’s performance on some tasks will improve (probably sub-linearly) with its fleet miles.

Cruise projects that, by the end of this year, its vehicles will be 5–11% as safe as the average human driver. Maybe with three orders of magnitude more fleet miles Tesla will be able to develop fully autonomous vehicles that are an order of magnitude safer than Cruise’s. This would be roughly consistent with the scaling trends observed in deep learning. If Cruise’s projection is accurate, then an order of magnitude improvement would put Tesla at 50–110% of average human safety.

If Waymo succeeds in its plan to remove safety drivers from its vehicles, and if Waymo has the safety metrics to justify this plan, then that will suggest that full autonomy is a tractable problem with current technology. Unlike Waymo, Tesla doesn’t have the sensor redundancy of lidar, but it does have the ability to compile larger and better training data sets for the core problems of prediction, planning, and computer vision. On balance, I think Tesla is better off. If Waymo succeeds, I suspect Tesla won’t be far behind.