Left: An unmodified walker learning how to navigate challenging terrain. Right: A self-modified walker, using new limbs and a new walking strategy, working through the same obstacle course. Gif: David Ha/Google Brain/Gizmodo

Using a technique called reinforcement learning, a researcher at Google Brain has shown that virtual robots can redesign their body parts to help them navigate challenging obstacle courses—even if the solutions they come up with are completely bizarre.

Embodied cognition is the idea that an animal’s cognitive abilities are influenced and constrained by its body plan. This means a squirrel’s thought processes and problem-solving strategies will differ somewhat from the cogitations of octopuses, elephants, and seagulls. Each animal has to navigate its world in its own special way using the body it’s been given, which naturally leads to different ways of thinking and learning.

“Evolution plays a vital role in shaping an organism’s body to adapt to its environment,” David Ha, a computer scientist and AI expert at Google Brain, explained in his new study. “The brain and its ability to learn is only one of many body components that is co-evolved together.”


This phenomenon has been observed in the real world, but Ha wanted to know if similar processes might also apply to the digital realm. To that end, Ha conducted a series of experiments to see if reinforcement learning could coax virtual robots, called walkers, into designing their body plans to better accommodate their environment and the challenges confronting them. Reinforcement learning is a tool used in artificial intelligence to steer agents toward a desired goal or direction, by awarding them points for “good” behavior.
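The reward-driven loop at the heart of reinforcement learning can be sketched in a few lines. The toy example below is purely illustrative (it is not Ha's code, and his walkers used far more sophisticated methods): an agent on a five-position track earns a point only for reaching the goal, and the value estimates learned from that reward steer it toward always stepping in the right direction.

```python
import random

N_STATES = 5          # positions 0..4; reaching position 4 ends an episode
ACTIONS = [-1, +1]    # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the action with the highest
        # estimate, but explore at random 30% of the time.
        if random.random() < 0.3:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0   # "points" only at the goal
        # Q-learning update: nudge the estimate toward reward + future value.
        q[(s, a)] += 0.5 * (r + 0.9 * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2

# The greedy policy after training: step right (+1) from every position.
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)}
```

The agent is never told "go right"; it discovers that strategy because rightward steps are the ones that eventually pay off, which is the same reward-driven pressure that shapes the walkers' gaits and bodies.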

Using the OpenAI Gym framework, Ha was able to provide an environment for his walkers. This framework looks a lot like an old-school, 2D video game, but it uses sophisticated virtual physics to simulate natural conditions, and it’s capable of randomly generating terrain and other in-game elements.

As for the walker, it was endowed with a pair of legs, each consisting of an upper and lower section. The bipedal bot had to learn how to navigate through its virtual environment and improve its performance over time. Researchers at DeepMind conducted a similar experiment last year, in which virtual bots had to learn how to walk from scratch and navigate through complex parkour courses. The difference here is that Ha’s walkers had the added benefit of being able to redesign their body plan, or at least parts of it. The bots could alter the lengths and widths of their four leg sections by up to 75 percent of the size of the default leg design. The walkers’ pentagon-shaped head could not be altered; it served as cargo. Each walker used a digital version of LIDAR to assess the terrain immediately in front of it, which is why (in the videos) they appear to shoot a thin laser beam at regular intervals.
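A rough sketch of how such a constraint might be encoded (the part names, default sizes, and the exact interpretation of the 75 percent rule here are assumptions for illustration, not values from the study): each leg section's proposed length and width gets clamped to within 75 percent of its default.

```python
# Hypothetical body-plan constraint: four leg sections, each with a length
# and width the learner may change, but only within ±75% of the default.
DEFAULT_LEGS = {
    "upper_left":  {"length": 34.0, "width": 8.0},
    "lower_left":  {"length": 34.0, "width": 6.4},
    "upper_right": {"length": 34.0, "width": 8.0},
    "lower_right": {"length": 34.0, "width": 6.4},
}
MAX_SCALE = 0.75  # learned sizes may deviate at most 75% from the default

def clamp_design(proposed):
    """Clamp a proposed body plan to within ±75% of the default sizes."""
    clamped = {}
    for part, dims in DEFAULT_LEGS.items():
        clamped[part] = {}
        for dim, default in dims.items():
            lo = default * (1 - MAX_SCALE)
            hi = default * (1 + MAX_SCALE)
            clamped[part][dim] = min(max(proposed[part][dim], lo), hi)
    return clamped
```

However wild the learner's proposal, the design that actually gets simulated stays within the allowed band around the default body.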

Using reinforcement-learning algorithms, the bots were given around a day or two to devise their new body parts and come up with effective locomotion strategies, which together formed a walker’s “policy,” in the parlance of AI researchers. The learning process is similar to trial-and-error, except the bots, via reinforcement learning, are rewarded when they come up with good strategies, which then leads them toward even better solutions. This is why reinforcement learning is so powerful—it speeds up the learning process as the bots experiment with various solutions, many of which are unconventional and unpredictable by human standards.
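One way to picture a "policy" that spans both body and behavior: put the design parameters and the control parameters into a single vector and improve them together against the reward. The sketch below uses simple hill climbing on a made-up objective purely for illustration; it is not the algorithm Ha used, and the parameter names are invented.

```python
import random

random.seed(1)

def reward(params):
    # Stand-in objective (made up for this sketch): reward peaks when the
    # "leg length" (params[0]) and "gait frequency" (params[1]) suit the
    # terrain. A real walker would be scored by simulating its locomotion.
    leg, gait = params
    return -(leg - 1.2) ** 2 - (gait - 0.8) ** 2

params = [0.5, 0.5]          # one vector holding [body design, behavior]
best = reward(params)
for _ in range(2000):
    # Propose a small random tweak to body and behavior simultaneously...
    candidate = [p + random.gauss(0, 0.05) for p in params]
    r = reward(candidate)
    if r > best:             # ...and keep only changes that earn more reward
        params, best = candidate, r
```

Because both parameter groups live in the same vector, a change to the body that enables a better gait is rewarded just like a change to the gait itself, which is the sense in which design and locomotion form one policy.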


Left: An unmodified walker joyfully skips through easy terrain. Right: With training, a self-modified walker chose to hop instead. Gif : David Ha/Google Brain/Gizmodo

For the first test (above), Ha placed a walker in a basic environment with no obstacles and gently rolling terrain. Using its default body plan, the bot adopted a rather cheerful-looking skipping locomotion strategy. After the learning stage, however, it modified its legs such that they were thinner and longer. With these modified limbs, the walker used its legs as springs, quickly hopping across the terrain.

The walker chose a strange body plan and an unorthodox locomotion strategy for traversing challenging terrain. Gif: David Ha/Google Brain/Gizmodo

The introduction of more challenging terrain (above), with obstacles to walk over, hills to travel up and down, and pits to jump across, prompted some radical new policies, namely the invention of an elongated rear “tail” with a dramatically thickened end. Armed with this configuration, the walkers hopped successfully around the obstacle course.

By this point in the experiment, Ha could see that reinforcement learning was clearly working. Allowing a walker “to learn a better version of its body obviously enables it to achieve better performance,” he wrote in the study.

Not content to stop there, Ha played around with the idea of motivating the walkers to adopt some design decisions that weren’t necessarily beneficial to their performance. The reason for this, he said, is that “we may want our agent to learn a design that utilizes the least amount of materials while still achieving satisfactory performance on the task.”


The tiny walker adopted a very familiar gait when faced with easy terrain. Gif: David Ha/Google Brain/Gizmodo

So for the next test, Ha rewarded an agent for developing legs that were smaller in area (above). With the bot motivated to move efficiently across the terrain, and using the tiniest legs possible (it no longer had to adhere to the 75 percent rule), the walker adopted a rather conventional bipedal style while navigating the easy terrain (it needed just 8 percent of the leg area used in the original design).
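A minimal sketch of this kind of reward shaping (the function, constants, and weight below are hypothetical, not taken from the paper): augment the task reward with a bonus that grows as the walker's leg area shrinks relative to the original design, so smaller limbs earn more points for the same distance walked.

```python
def shaped_reward(distance_walked, leg_area, baseline_area, weight=0.5):
    """Task reward plus a bonus for using less leg area than the baseline."""
    # size_bonus is 0 at the original design and approaches `weight`
    # as the legs shrink toward zero area.
    size_bonus = weight * (1.0 - leg_area / baseline_area)
    return distance_walked + size_bonus
```

With this shaping, two walkers that travel equally far no longer tie: the one with the smaller legs scores higher, which is the pressure that pushed the agent toward the tiny-legged design.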


The walker struggled to come up with an effective body plan and locomotion style when it was rewarded for inventing small leg sizes. Gif: David Ha/Google Brain/Gizmodo

But the walker really struggled to come up with a sensible policy when it had to navigate the challenging terrain. In the example shown above, which was the best strategy it could muster, the walker used 27 percent of the area of its original design. Reinforcement learning is good, but it’s no guarantee that a bot will come up with something brilliant. In some cases, a good solution simply doesn’t exist.


“By allowing the agent’s body to adapt to its task within some constraints, it can learn policies that are not only better for its task, but also learn them more quickly,” wrote Ha in the paper. His experiment showed that embodied cognition can apply to the virtual realm, and that agents can be motivated to devise body structures more suitable for a given task.

More practically, this application of reinforcement learning could be used for machine learning-assisted design, in which computers are tasked with designing aerodynamic shapes, testing materials under stressful conditions, or building super-agile robots (the corporeal kind). It could also help with computer graphics and improved video gameplay: imagine having to face off against an AI-enabled adversary that can continually redesign itself as it learns from its mistakes and your strengths.


Best of all, reinforcement learning requires minimal human intervention. Sure, many of the solutions conceived by these virtual bots are weird and even absurd, but that’s kind of the point. As the abilities of these self-learning systems increase in power and scope, they’ll come up with things humans never would have thought of. Which is actually kind of scary.

[Google Brain via New Scientist]