Stanley’s realization led to what he calls the steppingstone principle — and, with it, a way of designing algorithms that more fully embraces the endlessly creative potential of biological evolution.

Evolutionary algorithms have been around for a long time. Traditionally, they’ve been used to solve specific problems. In each generation, the solutions that perform best on some metric — the ability to control a two-legged robot, say — are selected and produce offspring. While these algorithms have seen some successes, they can be more computationally intensive than other approaches such as “deep learning,” which has exploded in popularity in recent years.

The steppingstone principle goes beyond traditional evolutionary approaches. Instead of optimizing for a specific goal, it embraces creative exploration of all possible solutions. By doing so, it has paid off with groundbreaking results. Earlier this year, one system based on the steppingstone principle mastered two video games that had stumped popular machine learning methods. And in a paper published last week in Nature, DeepMind — the artificial intelligence company that pioneered the use of deep learning for problems such as the game of Go — reported success in combining deep learning with the evolution of a diverse population of solutions.

The steppingstone principle’s potential can be seen by analogy with biological evolution. In nature, the tree of life has no overarching goal, and features used for one function might find themselves enlisted for something completely different. Feathers, for example, likely evolved for insulation and only later became handy for flight.

Biological evolution is also the only system to produce human intelligence, which is the ultimate dream of many AI researchers. Because of biology’s track record, Stanley and others have come to believe that if we want algorithms that can navigate the physical and social world as easily as we can — or better! — we need to imitate nature’s tactics. Instead of hard-coding the rules of reasoning, or having computers learn to score highly on specific performance metrics, they argue, we must let a population of solutions blossom. Make them prioritize novelty or interestingness instead of the ability to walk or talk. They may discover an indirect path, a set of steppingstones, and wind up walking and talking better than if they’d sought those skills directly.

New, Interesting, Diverse

After Picbreeder, Stanley set out to demonstrate that neuroevolution could overcome the most obvious argument against it: “If I run an algorithm that’s creative to such an extent that I’m not sure what it will produce,” he said, “it’s very interesting from a research perspective, but it’s a harder sell commercially.”

He hoped to show that by simply following ideas in interesting directions, algorithms could not only produce a diversity of results, but solve problems. More audaciously, he aimed to show that completely ignoring an objective can get you there faster than pursuing it. He did this through an approach called novelty search.

The system started with a neural network, which is an arrangement of small computing elements called neurons connected in layers. The output of one layer of neurons gets passed to the next layer via connections that have various “weights.” In a simple example, input data such as an image might be fed into the neural network. As the information from the image gets passed from layer to layer, the network extracts increasingly abstract information about its contents. Eventually, a final layer calculates the highest-level information: a label for the image.
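That layer-by-layer flow can be sketched in a few lines. This is a generic forward pass with an assumed tanh nonlinearity and example weight matrices, not the specific architecture of any system described here:

```python
import numpy as np

def forward(x, weights):
    """Pass an input through successive weighted layers.

    `weights` is a list of matrices, one per layer; a tanh
    nonlinearity (an illustrative choice) is applied after each.
    """
    activation = x
    for w in weights:
        activation = np.tanh(w @ activation)
    return activation

# Example: a 3-input network with a 4-neuron hidden layer
# and 2 outputs, using random weights.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
label_scores = forward(np.ones(3), weights)
```

The final activation plays the role of the "highest-level information" in the description above: for an image classifier, it would be the scores from which a label is read off.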

In neuroevolution, you start by assigning random values to the weights between layers. This randomness means the network won’t be very good at its job. But from this sorry state, you then create a set of random mutations — offspring neural networks with slightly different weights — and evaluate their abilities. You keep the best ones, produce more offspring, and repeat. (More advanced neuroevolution strategies will also introduce mutations in the number and arrangement of neurons and connections.) Neuroevolution is a meta-algorithm, an algorithm for designing algorithms. And eventually, the algorithms get pretty good at their job.
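A minimal version of that mutate-evaluate-keep loop looks like the following. Every detail (population size, mutation scale, greedy selection of a single survivor) is chosen for illustration, not taken from any published system:

```python
import numpy as np

def neuroevolve(fitness, init_weights, generations=50, pop=20,
                sigma=0.1, seed=0):
    """Weight-mutation neuroevolution sketch: mutate, evaluate, keep the best.

    `fitness` scores a flat weight vector; higher is better.
    """
    rng = np.random.default_rng(seed)
    best = init_weights
    best_fit = fitness(best)
    for _ in range(generations):
        # Offspring: copies of the current best with small random mutations.
        offspring = best + sigma * rng.standard_normal((pop, best.size))
        fits = np.array([fitness(o) for o in offspring])
        # Keep the fittest offspring if it beats the current champion.
        if fits.max() > best_fit:
            best_fit = fits.max()
            best = offspring[fits.argmax()]
    return best, best_fit
```

Handing it a toy fitness function that rewards closeness to a target vector, for instance, steers the weights toward that target over the generations — the "pretty good at their job" endpoint of the loop.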

To test the steppingstone principle, Stanley and his student Joel Lehman tweaked the selection process. Instead of selecting the networks that performed best on a task, novelty search selected them for how different they were from the ones with behaviors most similar to theirs. (In Picbreeder, people rewarded interestingness. Here, as a proxy for interestingness, novelty search rewarded novelty.)
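That selection rule — reward distance from the individuals with the most similar behaviors — is commonly scored as the mean distance to the k nearest neighbors in behavior space. A sketch of the scoring step, omitting the archive of past behaviors that the full method also maintains:

```python
import numpy as np

def novelty_scores(behaviors, k=3):
    """Score each individual by mean distance to its k nearest
    neighbors in behavior space (e.g. a robot's final position
    in the maze). Higher means more novel.
    """
    behaviors = np.asarray(behaviors, dtype=float)
    # Pairwise Euclidean distances between behavior descriptors.
    diffs = behaviors[:, None, :] - behaviors[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    np.fill_diagonal(dists, np.inf)          # ignore self-distance
    nearest = np.sort(dists, axis=1)[:, :k]  # k closest others
    return nearest.mean(axis=1)
```

An individual that ends up far from the cluster of its peers scores highest, so selection keeps pushing the population into unexplored regions of behavior space.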

In one test, they placed virtual wheeled robots in a maze and evolved the algorithms controlling them, hoping one would find a path to the exit. They ran the evolution from scratch 40 times. A comparison program, in which robots were selected for how close (as the crow flies) they came to the exit, evolved a winning robot only 3 out of 40 times. Novelty search, which completely ignored how close each bot was to the exit, succeeded 39 times. It worked because the bots managed to avoid dead ends. Rather than facing the exit and beating their heads against the wall, they explored unfamiliar territory, found workarounds, and won by accident. “Novelty search is important because it turned everything on its head,” said Julian Togelius, a computer scientist at New York University, “and basically asked what happens when we don’t have an objective.”

Once Stanley had made his point that the pursuit of objectives can be a hindrance to reaching those objectives, he looked for clever ways to combine novelty search and specific goals. That led him and Lehman to create a system that mirrors nature’s evolutionary niches. In this approach, algorithms compete only against others that are similar to them. Just as worms don’t compete with whales, the system maintains separate algorithmic niches from which a variety of promising approaches can emerge.

Such evolutionary algorithms with localized competition have shown proficiency at processing pixels, controlling a robot arm, and (as depicted on the cover of Nature) helping a six-legged robot quickly adapt its gait after losing a limb, the way an animal would. A key element of these algorithms is that they foster steppingstones. Instead of constantly prioritizing one overall best solution, they maintain a diverse set of vibrant niches, any one of which could contribute a winner. And the best solution might descend from a lineage that has hopped between niches.
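A stripped-down sketch of local competition in this spirit (in the style of the MAP-Elites family of algorithms): each solution is mapped to a niche, and it competes only against that niche's current champion. All the function arguments here are hypothetical stand-ins:

```python
import random

def niched_evolution(evaluate, describe, mutate, init,
                     iterations=200, seed=0):
    """Evolution with local competition: `describe` maps a solution
    to a niche key, and a new solution replaces only its own
    niche's champion, never champions of other niches.
    """
    rng = random.Random(seed)
    archive = {}                 # niche key -> (fitness, solution)
    sol = init
    for _ in range(iterations):
        niche = describe(sol)
        fit = evaluate(sol)
        # Local competition: compare only within this niche.
        if niche not in archive or fit > archive[niche][0]:
            archive[niche] = (fit, sol)
        # Breed from a champion drawn from a random niche, so
        # lineages can hop between niches.
        _, parent = archive[rng.choice(list(archive))]
        sol = mutate(parent, rng)
    return archive
```

Because every niche keeps its own best solution alive, a lineage that looks mediocre globally can persist long enough to become the steppingstone to an eventual winner.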

Evolved to Win

For Stanley, who is now at Uber AI Labs, the steppingstone principle explains innovation: If you went back in time with a modern computer and told people developing vacuum tubes to abandon them and focus on laptops, we’d have neither. It also explains evolution: We evolved from flatworms, which were not especially intelligent but did have bilateral symmetry. “It’s totally unclear that the discovery of bilateral symmetry had anything to do with intelligence, let alone with Shakespeare,” Stanley said, “but it does.”

Neuroevolution itself has followed an unexpectedly circuitous path over the past decade. For a long time, it lived in the shadows of other forms of AI.

One of its biggest drawbacks, according to Risto Miikkulainen, a computer scientist at the University of Texas, Austin (and Stanley’s former doctoral adviser), is the amount of computation it requires. In traditional machine learning, as you train a neural network, it gradually gets better and better. With neuroevolution, the weights change randomly, so the network’s performance might degrade before it improves.

Another drawback is the basic fact that most people have a particular problem that they’d like to solve. A search strategy that optimizes for interestingness might get you to a creative solution for that particular problem. But it could lead you astray before it puts you on the right path.

Then again, no strategy is perfect. In the past five years or so, research has exploded in areas of AI such as deep learning and reinforcement learning. In reinforcement learning, an algorithm interacts with the environment — a robot navigates the real world, or a player competes in a game — and learns through trial and error which behaviors lead to desired outcomes. DeepMind used deep reinforcement learning to create a program that could beat the world’s best players at Go, a feat that many thought was still years or decades away.

But reinforcement learning can get stuck in a rut. Sparse or infrequent rewards don’t give algorithms enough feedback to enable them to proceed toward their goal. Deceptive rewards — awarded for short-term gains that hinder long-term progress — trap algorithms in dead ends. So while reinforcement-learning agents can whip humans at Space Invaders or Pong — games with frequent points and clear goals — they’ve fallen flat in other classic games that lack those features.

Within the past year, AI based on the steppingstone principle finally managed to crack a number of long-standing challenges in the field.

In the game Montezuma’s Revenge, Panama Joe navigates from room to room in an underground labyrinth, collecting keys to open doors while avoiding enemies and obstacles like snakes and fire pits. To beat the game, Stanley, Lehman, Jeff Clune, Joost Huizinga and Adrien Ecoffet, all five working at Uber AI Labs, developed a system where Panama Joe essentially wanders around and randomly attempts various actions. Each time he reaches a new game state — a new location with a new set of possessions — he files it away in his memory, along with the set of actions he took to get there. If he later finds a faster path to that state, it replaces the old memory. During training, Panama Joe repeatedly picks one of those stored states, explores randomly for a bit, and adds to his memory any new states he finds.

Eventually one of those states is the state of winning the game. And Panama Joe has in his memory all the actions he took to get there. He’s done it with no neural network or reinforcement learning — no rewards for collecting keys or nearing the labyrinth’s end — just random exploration and a clever way to collect and connect steppingstones. This approach managed to beat not only the best algorithms but also the human world record for the game.
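That wander, archive and return loop can be sketched with a dictionary mapping each reached state to the shortest known action sequence. The deterministic `env_step` function and the two-action space are simplifying assumptions for illustration, not details of the actual system:

```python
import random

def explore(env_step, initial_state, iterations=200, rollout=10, seed=0):
    """Archive-based exploration sketch: remember the shortest action
    sequence reaching each state, restart from a stored state, and
    explore randomly from there. `env_step` is a hypothetical
    deterministic function (state, action) -> state.
    """
    rng = random.Random(seed)
    archive = {initial_state: []}   # state -> shortest known action path
    for _ in range(iterations):
        state = rng.choice(list(archive))   # pick a stored steppingstone
        path = list(archive[state])
        for _ in range(rollout):
            action = rng.choice([0, 1])     # random action
            state = env_step(state, action)
            path = path + [action]
            # Record the state if new, or keep only the shorter path.
            if state not in archive or len(path) < len(archive[state]):
                archive[state] = path
    return archive
```

Once the winning state appears in the archive, its stored path is a complete recipe for replaying the win from the start — no reward signal required.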

The same technique, which the researchers call Go-Explore, was used to beat human experts at Pitfall!, a game where Pitfall Harry navigates a jungle in search of treasure while avoiding crocodiles and quicksand. No other machine learning AI had scored above zero.

Now even DeepMind, that powerhouse of reinforcement learning, has revealed its growing interest in neuroevolution. In January, the team showed off AlphaStar, software that can beat top professionals at the complex video game StarCraft II, in which two opponents control armies and build colonies to dominate a digital landscape. AlphaStar evolved a population of players that competed against and learned from each other. In last week’s Nature paper, DeepMind researchers announced that an updated version of AlphaStar has been ranked among the top 0.2% of active StarCraft II players on a popular gaming platform, becoming the first AI to reach the top tier of a popular esport without any restrictions.

“For a long time with the AlphaStar agents, they were getting better, but they were always exploitable,” said Max Jaderberg, a computer scientist at DeepMind who worked on the project. “You would train an agent, and it would have very, very good performance on average, but you could always train something against this agent and find holes in this agent.”

As in the children’s game rock-paper-scissors, there is no single best game strategy in StarCraft II. So DeepMind encouraged its population of agents to evolve a diversity of strategies — not as steppingstones but as an end in itself. When AlphaStar beat two pros each five games to none, it combined the strategies from five different agents in its population. The five agents had been chosen so that not all of them would be vulnerable to any one opponent strategy. Their strength was in their diversity.

AlphaStar demonstrates one of the main uses of evolutionary algorithms: maintaining a population of different solutions. Another recent DeepMind project shows the other use: optimizing a single solution. Working with Waymo, Alphabet’s autonomous car project, the team evolved algorithms for identifying pedestrians. To avoid getting stuck with an approach that works fairly well but isn’t the best possible strategy, they maintained “niches,” or subpopulations, so that novel solutions would have time to develop before being crushed by the established top performers.

Population-based algorithms have become more popular in recent years, in part because “they’re a good match to the type of compute that we have now,” said Raia Hadsell, a research scientist and head of robotics at DeepMind, using an industry standard term for computing resources. Hadsell invited Clune, Lehman and Stanley to give a two-hour presentation of their work at the International Conference on Machine Learning in June. “I think it’s an important area of research for AI,” she said, “because it is complementary to the deep learning approaches that have driven the field.”

AI That Designs AI

All of the algorithms discussed so far are limited in their creativity. AlphaStar can only ever come up with new StarCraft II strategies. Novelty search can find novelty within only one domain at a time — solving a maze or walking a robot.

Biological evolution, on the other hand, produces endless novelty. We have bacteria and kelp and birds and people. That’s because solutions evolve, but so do problems. The giraffe is a response to the problem of the tree. Human innovation proceeds likewise. We create problems for ourselves — could we put a person on the moon? — and then solve them.

To mirror this open-ended conversation between problems and solutions, earlier this year Stanley, Clune, Lehman and another Uber colleague, Rui Wang, released an algorithm called POET, for Paired Open-Ended Trailblazer. To test the algorithm, they evolved a population of virtual two-legged bots. They also evolved a population of obstacle courses for the bots, complete with hills, trenches and tree stumps. The bots sometimes traded places with each other, attempting new terrain. For example, one bot learned to cross flat terrain while dragging its knee. It was then randomly switched to a landscape with short stumps, where it had to learn to walk upright. When it was switched back to its first obstacle course, it completed it much faster. An indirect path allowed it to improve by taking skills learned from one puzzle and applying them to another.
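The paired loop can be caricatured in a few lines: environments occasionally mutate into new variants, each agent improves on its paired environment, and agents transfer between pairs when they score better elsewhere. Everything here — the mutation rate, the transfer rule, the function arguments — is an illustrative stand-in, not POET's actual machinery:

```python
import random

def paired_loop(pairs, mutate_env, improve_agent, score, steps=10, rng=None):
    """Toy sketch of a POET-style loop over (environment, agent) pairs."""
    rng = rng or random.Random(0)
    for _ in range(steps):
        # Occasionally spawn a mutated environment, seeded with a copy
        # of an existing agent.
        if rng.random() < 0.3:
            env, agent = rng.choice(pairs)
            pairs.append((mutate_env(env, rng), agent))
        # Each agent takes an improvement step on its paired environment.
        pairs[:] = [(env, improve_agent(agent, env)) for env, agent in pairs]
        # Transfer: give each environment the best-scoring agent overall.
        agents = [a for _, a in pairs]
        pairs[:] = [(env, max(agents, key=lambda a: score(a, env)))
                    for env, _ in pairs]
    return pairs
```

The transfer step is what produces the knee-dragger's lucky break in the anecdote above: a skill honed on one obstacle course gets carried over to another, where it turns out to be a head start.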

POET could potentially design new forms of art or make scientific discoveries by inventing new challenges for itself and then solving them. It could even go much further, depending on its world-building ability. Stanley said he hopes to build algorithms that could still be doing something interesting after a billion years.

Evolution “invented sight, it invented photosynthesis, it invented human-level intelligence, it invented all of it, and all of it in one run of an algorithm,” Stanley said. “To capture one tiny iota of that process I think could be incredibly powerful.”

In a recent paper, Clune argues that open-ended discovery is likely the fastest path toward artificial general intelligence — machines with nearly all the capabilities of humans. Most of the AI field is focused on manually designing all the building blocks of an intelligent machine, such as different types of neural network architectures and learning processes. But it’s unclear how these might eventually get bundled together into a general intelligence.

Instead, Clune thinks more attention should be paid to AI that designs AI. Algorithms will design or evolve both the neural networks and the environments in which they learn, using an approach like POET’s. Such open-ended exploration might lead to human-level intelligence via paths we never would have anticipated — or a variety of alien intelligences that could teach us a lot about intelligence in general. “Decades of research have taught us that these algorithms constantly surprise us and outwit us,” he said. “So it’s completely hubristic to think that we will know the outcome of these processes, especially as they become more powerful and open-ended.”

It may also be hubristic to exert too much control over researchers. A sweet irony to Stanley’s story is that he originally pitched Picbreeder to the National Science Foundation, which rejected his grant application, saying its objective wasn’t clear. But the project has led to papers, talks, a book and a startup — Geometric Intelligence, purchased by Uber to form the core of Uber AI Labs. “To me,” Stanley said, “one of the things that’s really kind of striking and maybe crazy is that the story of how I got here is basically the same as the algorithmic insight that got me here. The thing that led me to the insight is actually explained by the insight itself.”

This article was reprinted on Wired.com and Wired.jp.