Evolution at Waymo

The first experiments that DeepMind and Waymo collaborated on involved training a network, called a “region proposal network,” that generates boxes around pedestrians, bicyclists, and motorcyclists detected by our sensors. The aim was to investigate whether population-based training (PBT) could improve the network's ability to detect pedestrians along two measures: recall (the number of pedestrians the neural net identifies divided by the total number of pedestrians in the scene) and precision (the fraction of detected pedestrians that are actually pedestrians rather than spurious “false positives”). Waymo’s vehicles detect these road users using multiple neural nets and other methods, but the goal of this experiment was to use PBT to train this single neural net to maintain recall above 99% while reducing false positives.
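Both measures are simple ratios over counts. The minimal sketch below assumes detections have already been matched to ground-truth pedestrians (that matching step, typically based on box overlap, is omitted); the `DetectionCounts` type and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Hypothetical tally of one evaluation run over labelled scenes."""
    true_positives: int   # detections that match a real pedestrian
    false_positives: int  # detections with no matching pedestrian
    false_negatives: int  # real pedestrians the net missed


def recall(c: DetectionCounts) -> float:
    """Pedestrians found / all pedestrians in the scene (0.0 if there are none)."""
    total_pedestrians = c.true_positives + c.false_negatives
    return c.true_positives / total_pedestrians if total_pedestrians else 0.0


def precision(c: DetectionCounts) -> float:
    """Detections that are real pedestrians / all detections (0.0 if there are none)."""
    total_detections = c.true_positives + c.false_positives
    return c.true_positives / total_detections if total_detections else 0.0


counts = DetectionCounts(true_positives=990, false_positives=110, false_negatives=10)
print(f"recall={recall(counts):.3f}, precision={precision(counts):.3f}")
# recall=0.990, precision=0.900
print("meets 99% recall target:", recall(counts) >= 0.99)
```

In these terms, the experiment's goal was to push `false_positives` down (raising precision) while keeping `recall(...)` at or above 0.99.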

We learned a lot from this experiment. Firstly, we discovered that we needed a realistic and robust evaluation for the networks, so that we would know whether a neural net would truly perform better when deployed across a variety of real-world situations. This evaluation formed the basis of the competition that PBT uses to pick one winning neural net over another. To ensure that neural nets perform well generally, and don’t simply memorise answers to examples they've seen during training, our PBT competition uses a set of examples (the “validation set”) that is different from those used in training (the “training set”). To verify final performance, we also use a third set of examples (the “evaluation set”) that the neural nets have never seen in training or competition.
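A minimal sketch of such a three-way split follows. It assumes each example carries a group key, such as a hypothetical drive or log ID, and splits by group rather than by individual example, so that near-identical frames from the same drive cannot land in both the training and validation sets. The function name, fractions, and seed are illustrative, not Waymo's pipeline.

```python
import random
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def three_way_split(
    examples: Sequence[T],
    group_of: Callable[[T], Hashable],
    val_frac: float = 0.1,
    eval_frac: float = 0.1,
    seed: int = 0,
) -> tuple[list[T], list[T], list[T]]:
    """Split examples into disjoint training/validation/evaluation sets by group.

    Group keys must be sortable so the seeded shuffle is reproducible across runs.
    """
    if val_frac < 0 or eval_frac < 0 or val_frac + eval_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    groups = sorted({group_of(e) for e in examples})
    random.Random(seed).shuffle(groups)
    n_val = int(len(groups) * val_frac)
    n_eval = int(len(groups) * eval_frac)
    val_groups = set(groups[:n_val])                 # scored in PBT's competition
    eval_groups = set(groups[n_val:n_val + n_eval])  # untouched until the final check
    training, validation, evaluation = [], [], []
    for e in examples:
        g = group_of(e)
        if g in val_groups:
            validation.append(e)
        elif g in eval_groups:
            evaluation.append(e)
        else:
            training.append(e)                       # used for gradient updates
    return training, validation, evaluation


# 20 hypothetical drives with 5 frames each, keyed by drive ID.
examples = [(drive, frame) for drive in range(20) for frame in range(5)]
train, val, ev = three_way_split(examples, group_of=lambda ex: ex[0])
print(len(train), len(val), len(ev))  # 80 10 10
```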

Secondly, we learned that we needed fast evaluation to support frequent evolutionary competition. Researchers typically evaluate their models only occasionally during training, but PBT required every model to be evaluated every 15 minutes. To achieve this, we took advantage of Google’s data centres to parallelise the evaluation across hundreds of distributed machines.
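Because precision and recall are ratios of counts, this kind of evaluation parallelises cleanly: each worker tallies true positives, false positives, and false negatives on its own shard, and the tallies are summed. The single-machine sketch below uses Python threads as a stand-in for sending shards to separate machines; the toy threshold “model,” the data, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Example = tuple[float, bool]  # (hypothetical input feature, is it really a pedestrian?)


def shard(items: Sequence[Example], num_shards: int) -> list[list[Example]]:
    """Round-robin split so every shard gets roughly the same number of examples."""
    return [list(items[i::num_shards]) for i in range(num_shards)]


def evaluate_shard(model: Callable[[float], bool], examples: list[Example]) -> tuple[int, int, int]:
    """Tally (true positives, false positives, false negatives) on one shard."""
    tp = fp = fn = 0
    for features, is_pedestrian in examples:
        predicted = model(features)
        if predicted and is_pedestrian:
            tp += 1
        elif predicted:
            fp += 1
        elif is_pedestrian:
            fn += 1
    return tp, fp, fn


def parallel_evaluate(model: Callable[[float], bool], validation: Sequence[Example],
                      num_workers: int = 8) -> dict[str, int]:
    """Evaluate shards concurrently and sum the per-shard counts."""
    totals = {"tp": 0, "fp": 0, "fn": 0}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        shards = shard(validation, num_workers)
        for tp, fp, fn in pool.map(lambda s: evaluate_shard(model, s), shards):
            totals["tp"] += tp
            totals["fp"] += fp
            totals["fn"] += fn
    return totals


toy_model = lambda x: x > 0.5  # hypothetical detector: fires above a fixed threshold
validation = [(i / 100, i >= 60) for i in range(100)]
print(parallel_evaluate(toy_model, validation))  # {'tp': 40, 'fp': 9, 'fn': 0}
```

Since summing counts does not depend on how examples are divided among workers, the result matches a serial pass over the whole validation set; only the wall-clock time changes.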

The power of diversity in evolutionary competition

During these experiments, we noticed that one of PBT’s strengths, allocating more resources to the progeny of better-performing networks, can also be a weakness: PBT optimises for the present and fails to consider long-term outcomes. This disadvantages late bloomers, so neural nets whose hyperparameters perform better over the long term don’t get the chance to mature and succeed. One way to combat this is to increase population diversity, which can be achieved simply by training a larger population. If the population is large enough, there is a greater chance that networks with late-blooming hyperparameters survive and catch up in later generations.
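The mechanism behind this strength and weakness is PBT's periodic exploit/explore step. Below is a minimal, generic sketch of that step using truncation selection on a toy problem. The 25% cutoff, the 0.8/1.2 perturbation factors, the toy `train_step` and `evaluate` functions, and all names are illustrative assumptions, not the settings used at Waymo or DeepMind. Ranking purely on the current score is what puts late bloomers at risk: a member that would overtake the others later is overwritten as soon as it falls into the bottom quarter.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Member:
    hparams: dict[str, float]
    weights: float = 0.0          # toy stand-in for a network's parameters
    score: float = float("-inf")  # latest validation score (higher is better)
    schedule: list[dict[str, float]] = field(default_factory=list)  # hparams used at each step


def train_step(m: Member) -> None:
    """Toy training: move the 'weights' toward an optimum of 1.0 at a rate set by lr."""
    m.weights += m.hparams["lr"] * (1.0 - m.weights)
    m.schedule.append(dict(m.hparams))


def evaluate(m: Member) -> float:
    """Toy validation score: 0.0 is perfect, more negative is worse."""
    return -abs(1.0 - m.weights)


def exploit_and_explore(population: list[Member], rng: random.Random, frac: float = 0.25) -> None:
    """Replace the bottom `frac` of members with perturbed copies of the top `frac`."""
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    n = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: inherit the winner's weights, hyperparameters and schedule so far.
        loser.weights = winner.weights
        loser.hparams = dict(winner.hparams)
        loser.schedule = list(winner.schedule)
        # Explore: perturb each hyperparameter, clipped to [1e-4, 1.0] (only lr in this toy).
        for key in loser.hparams:
            perturbed = loser.hparams[key] * rng.choice([0.8, 1.2])
            loser.hparams[key] = min(1.0, max(1e-4, perturbed))


def run_pbt(pop_size: int = 8, generations: int = 10,
            steps_per_generation: int = 5, seed: int = 0) -> Member:
    rng = random.Random(seed)
    population = [Member(hparams={"lr": rng.uniform(0.01, 0.5)}) for _ in range(pop_size)]
    for _ in range(generations):
        for member in population:
            for _ in range(steps_per_generation):
                train_step(member)
            member.score = evaluate(member)
        exploit_and_explore(population, rng)
    for member in population:  # re-score: exploited members now hold copied weights
        member.score = evaluate(member)
    return max(population, key=lambda m: m.score)


best = run_pbt()
print(f"best score {best.score:.4f}; lr went from "
      f"{best.schedule[0]['lr']:.3f} to {best.schedule[-1]['lr']:.3f}")
```

Because each exploited member inherits its winner's history, the `schedule` of the best member at the end is the kind of hyperparameter schedule PBT discovers: a sequence of values over training rather than a single fixed setting.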

In these experiments, we were able to increase diversity by creating sub-populations called “niches,” where neural nets were only allowed to compete within their own sub-groups, similar to how species evolve when isolated on islands. We also tried to reward diversity directly through a technique called “fitness sharing,” where we measure the difference between members of the population and give more unique neural nets an edge in the competition. Greater diversity allows PBT to explore a larger hyperparameter space.
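Both ideas can be sketched in a few lines. Niching restricts exploit/explore pairing to members that share a niche ID. Fitness sharing is shown here in its classic form from the evolutionary-computation literature: each member's (positive) fitness is divided by a crowding count, so members in crowded regions are penalised. The distance measure (here, the gap between members' log10 learning rates), the radius `sigma`, and all function names are assumptions for illustration; the text above does not specify how the difference between members was measured.

```python
from collections import defaultdict


def sharing_kernel(distance: float, sigma: float, alpha: float = 1.0) -> float:
    """1 for identical members, falling to 0 at distance sigma and beyond."""
    return 1.0 - (distance / sigma) ** alpha if distance < sigma else 0.0


def shared_fitness(fitness: list[float], position: list[float], sigma: float) -> list[float]:
    """Divide each positive fitness by how crowded its neighbourhood is."""
    shared = []
    for i, f in enumerate(fitness):
        # Includes j == i (distance 0, kernel 1), so the crowding count is at least 1.
        crowding = sum(sharing_kernel(abs(position[i] - p), sigma) for p in position)
        shared.append(f / crowding)
    return shared


def pairs_within_niches(niche: list[str], score: list[float],
                        frac: float = 0.25) -> list[tuple[int, int]]:
    """(loser, winner) index pairs for exploit/explore; pairs never cross niches."""
    members_by_niche: dict[str, list[int]] = defaultdict(list)
    for idx, niche_id in enumerate(niche):
        members_by_niche[niche_id].append(idx)
    pairs = []
    for members in members_by_niche.values():
        ranked = sorted(members, key=lambda i: score[i], reverse=True)
        n = max(1, int(len(ranked) * frac))
        if len(ranked) >= 2 * n:  # skip niches too small to have distinct winners and losers
            pairs.extend(zip(ranked[-n:], ranked[:n]))
    return pairs


# Three members crowd around lr ~ 1e-3; one explores lr ~ 1e-1 (log10 positions).
print(shared_fitness([0.9, 0.9, 0.9, 0.8], [-3.0, -3.05, -3.1, -1.0], sigma=0.5))
print(pairs_within_niches(["a"] * 4 + ["b"] * 4, [1, 2, 3, 4, 10, 20, 30, 40]))  # [(0, 3), (4, 7)]
```

The lone explorer keeps a shared fitness of 0.8, while the three crowded members drop to roughly 0.32 to 0.33 despite higher raw scores. In the niche example, global truncation over all eight members would replace members 0 and 1 (both in niche “a”) with copies from niche “b”; restricting pairs to niches keeps niche “a”'s lineage alive.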

Results

PBT enabled dramatic improvements in model performance. In the experiment above, our PBT models achieved higher precision, reducing false positives by 24% compared to their hand-tuned equivalent, while maintaining a high recall rate. A chief advantage of evolutionary methods such as PBT is that they can optimise arbitrarily complex metrics. Traditionally, neural nets can only be trained using simple, smooth loss functions, which act as a proxy for what we really care about. PBT enabled us to go beyond the update rule used for training neural nets and optimise directly for the more complex metrics we care about, such as maximising precision at high recall rates.
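To make the contrast concrete, here is one plausible way to compute a fitness such as “precision at 99% recall” on a validation set. It lowers a decision threshold down the ranked detection scores until the recall target is met, then reports precision at that point. Sorting and thresholding make the metric piecewise constant in the scores, so it provides no useful gradient for ordinary training, but PBT only needs to compare its value between members. As before, the sketch assumes each candidate detection is already labelled as a true pedestrian or not; the function name and data are hypothetical.

```python
from itertools import groupby


def precision_at_recall(scores: list[float], labels: list[bool], min_recall: float = 0.99) -> float:
    """Precision at the highest score threshold whose recall reaches min_recall."""
    if not 0.0 < min_recall <= 1.0:
        raise ValueError("min_recall must be in (0, 1]")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    # Lower the threshold one distinct score at a time; tied scores enter together.
    for _, tied in groupby(ranked, key=lambda pair: pair[0]):
        for _, is_positive in tied:
            if is_positive:
                tp += 1
            else:
                fp += 1
        if tp / total_positives >= min_recall:
            return tp / (tp + fp)
    raise AssertionError("unreachable: recall is 1.0 once every example is included")


scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [True, True, False, True, True, False, True, True, False, True]
print(precision_at_recall(scores, labels, min_recall=0.8))   # 0.75
print(precision_at_recall(scores, labels, min_recall=0.99))  # 0.7
```

In PBT, a value like this would serve directly as each member's competition score, with no need for a differentiable surrogate.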

PBT also saves time and resources. PBT-trained nets using the discovered hyperparameter schedule outperformed Waymo’s previous net with half the training time and resources. Overall, PBT used half the computational resources of random parallel search to discover better hyperparameter schedules. It also saves time for researchers: by incorporating PBT directly into Waymo’s technical infrastructure, researchers from across the company can apply the method with the click of a button and spend less time tuning their learning rates. Since the completion of these experiments, PBT has been applied to many different Waymo models, and it holds a lot of promise for helping to create more capable vehicles for the road.

Contributors: The work described here was a research collaboration between Yu-hsin Chen and Matthieu Devin of Waymo, and Ali Razavi, Ang Li, Sibon Li, Ola Spyra, Pramod Gupta and Oriol Vinyals of DeepMind. Advisors to the project include Max Jaderberg, Valentin Dalibard, Meire Fortunato and Jackson Broshear from DeepMind.