Doing that will require DeepMind’s software to explore beyond Go’s ordered world of black and white stones. It needs to get to grips with the messy real world—or to begin with a gloomy, pixelated approximation of it. DeepMind’s simulated world is called Labyrinth, and the company is using it to confront its software with increasingly complex tasks, such as navigating mazes. That should push DeepMind’s researchers to learn how to build even smarter software, and push the software to learn how to tackle more difficult decisions and problems. They’re doing this by using the techniques shown off in AlphaGo and earlier DeepMind software that learned to play 1980s-vintage Atari games such as Space Invaders better than a human could. But to succeed, Hassabis will also have to invent his way around some long-standing challenges in artificial intelligence.

Self-improvement

Hassabis, 39, has been working on the question of how to create intelligence for much of his life. A chess prodigy who completed high school early to establish a successful career in the video-game industry, he later got a PhD in neuroscience and published high-profile research on memory and imagination.

Hassabis cofounded DeepMind in 2010 to transfer some of what he learned about biological intelligence to machines. The company revealed software that learned to master Atari games in December 2013, and early in 2014 it was purchased by Google for an amount reported to be 400 million pounds, more than $600 million at the time (see “Google’s Intelligence Designer”). DeepMind quickly expanded, hiring dozens more researchers and publishing scores of papers in leading machine-learning and artificial-intelligence conferences. This January it revealed the existence of AlphaGo, and that it had defeated Europe’s best Go player in October 2015. AlphaGo beat an 18-time world champion, Lee Sedol, earlier this month (see “Five Lessons from AlphaGo’s Historic Victory”).

Demis Hassabis leads a group inside Google aiming to "solve intelligence."

Atari games and Go are very different, but DeepMind tackled them both using the same approach, loosely inspired by the way animals can be taught new tricks using rewards and punishments from a trainer. In reinforcement learning, as it is called, software is programmed to explore a new environment and adjust its behavior to increase some kind of virtual reward.

DeepMind’s Atari software, for example, was programmed only with the ability to control and see the game screen, and an urge to increase the score. For dozens of titles, a few hours of practice is enough for the software to pull itself up by its own bootstraps and beat a human expert.
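To make the reward-driven loop described above concrete, here is a minimal sketch of reinforcement learning on a toy problem. It is not DeepMind's Atari system, which pairs the idea with deep neural networks reading the game screen; this version uses a plain lookup table (tabular Q-learning) and a hypothetical six-cell corridor in which the only reward comes from reaching the far end. The environment, the function names, and the parameter values are all assumptions chosen for illustration.

```python
import random

N_STATES = 6          # corridor cells 0..5; reaching cell 5 ends the episode
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Hypothetical toy environment: reward 1 only on reaching the last cell."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: explore, observe the reward, nudge the value estimates."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)            # explore: try a random action
            else:
                best = max(q[state])            # exploit: best known action,
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

def greedy_policy(q):
    """The action the trained table prefers in each non-terminal cell."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns the preferred step for each cell. Nothing in the code tells the agent that moving right is correct; any such preference emerges only from the reward signal, which is the point of the approach.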

AlphaGo combines reinforcement learning with other components, such as a system that learned to evaluate possible moves by analyzing tens of millions of board positions from games by expert Go players, and a search mechanism that selects the most promising moves. But it was reinforcement learning that enabled AlphaGo to whip itself into world-champion-beating shape by playing against itself millions of times.

Hassabis believes the reinforcement learning approach is the key to getting machine-learning software to do much more complex things than the tricks it performs for us today, such as transcribing our words, or understanding the content of photos. “We don't think just observing is enough for intelligence, you also have to act,” he says. “Ultimately that’s the only way you can really understand the world.”

DeepMind’s 3-D environment Labyrinth, built on an open-source clone of the first-person-shooter Quake, is designed to provide the next steps in proving that idea. The company has already used it to challenge agents with a game in which they must explore randomly generated mazes for 60 seconds, winning points for collecting apples or finding an exit (which leads to another randomly generated maze). Future challenges might require more complex planning—for example, learning that keys can be used to open doors. The company will also test software in other ways, and is considering taking on the video game Starcraft and even poker. But posing harder and harder challenges inside Labyrinth will be a major thread of research for some time, says Hassabis. “It should be good for the next couple of years,” he says.
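The scoring rules of that maze game can be sketched in a few lines. This is a rough stand-in under heavy assumptions, not Labyrinth itself: a flat 5x5 grid with no walls replaces the 3-D maze, a budget of 60 steps replaces the 60 seconds, the point values are invented because the article gives none, and the agent simply wanders at random rather than learning.

```python
import random

APPLE_POINTS = 1    # assumed value; the article gives no point values
EXIT_POINTS = 10    # assumed value
SIZE = 5            # assumed grid size

def new_maze(rng, n_apples=3):
    """Scatter apples and one exit on an open SIZE x SIZE grid (no walls, for brevity)."""
    cells = [(r, c) for r in range(SIZE) for c in range(SIZE) if (r, c) != (0, 0)]
    picks = rng.sample(cells, n_apples + 1)
    return set(picks[:-1]), picks[-1]

def run_episode(step_budget=60, seed=0):
    """Random-walk agent: collect apples, and on reaching the exit start a fresh maze."""
    rng = random.Random(seed)
    apples, exit_cell = new_maze(rng)
    pos, score = (0, 0), 0
    for _ in range(step_budget):
        dr, dc = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        pos = (min(max(pos[0] + dr, 0), SIZE - 1),
               min(max(pos[1] + dc, 0), SIZE - 1))
        if pos in apples:
            apples.remove(pos)
            score += APPLE_POINTS
        elif pos == exit_cell:
            score += EXIT_POINTS
            apples, exit_cell = new_maze(rng)   # the exit leads to another random maze
            pos = (0, 0)
    return score
```

`run_episode(seed=...)` gives a baseline score for aimless wandering; a learning agent would be judged by how far it beats that baseline within the same budget.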

Other companies and researchers working on artificial intelligence will be watching closely. The success of DeepMind’s reinforcement learning has surprised many machine-learning researchers. The technique was established in the 1980s, and has not proved as widely useful or as powerful as other ways of training software, says Pedro Domingos, a professor who works on machine learning at the University of Washington. DeepMind strengthened the venerable technique by combining it with a method called deep learning, which has recently produced big advances in how well computers can decode information such as images and triggered a boom in machine-learning technology (see “10 Breakthrough Technologies 2013: Deep Learning”).

“What DeepMind has done is impressive,” says Domingos. But he also says it is too early to tell whether what Hassabis sees as a rocket engine that can fly far beyond today’s results will turn out to be a backyard firework: the recent string of impressive results may not last. “Demis’s optimism about reinforcement learning is not justified by its track record so far,” says Domingos. “Progress is not linear in machine learning and artificial intelligence; we have spurts of progress and then long periods of slow progress.”

Hassabis acknowledges that “a lot” of people in his field doubt reinforcement learning’s potential, but says they will be won over. “The further we go with this, the more we feel our thesis is correct, and I think we’re changing the entire field,” he says. “In our view reinforcement learning is going to be as big as deep learning in the next two or three years.”

Safety first

DeepMind’s results so far may justify Hassabis’s claim that reinforcement learning will soon find many useful applications. AlphaGo’s victory surprised professional Go players and computer scientists because the game is too complex to be tackled by software that primarily relies on calculating the possible outcomes of different moves, the method that IBM’s Deep Blue used to defeat world chess champion Garry Kasparov in 1997. On average a chess player has about 35 possible moves every turn; in Go there are about 250. There are more possible Go positions than there are atoms in the universe. “Chess is a calculation game,” says Hassabis. “Go is too complex, so players are using their intuition. It’s totally different in class. You can think of AlphaGo as superhuman intuition instead of superhuman calculation.”
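A back-of-the-envelope calculation shows why brute-force calculation breaks down. If a game offers roughly b choices per turn and lasts roughly d turns, the number of possible move sequences is about b to the power d. The branching factors below come from the figures quoted above; the game lengths (about 80 moves for chess, about 150 for Go) are commonly cited estimates and are an assumption here, not something stated in this article. Note that this counts move sequences, which is a different quantity from the number of board positions.

```python
import math

def log10_tree_size(branching, depth):
    """Order of magnitude of branching**depth, computed without huge integers."""
    return depth * math.log10(branching)

# Branching factors from the article; depths are assumed typical game lengths.
chess_digits = log10_tree_size(35, 80)
go_digits = log10_tree_size(250, 150)

print(f"chess: roughly 10^{chess_digits:.0f} move sequences")
print(f"go:    roughly 10^{go_digits:.0f} move sequences")
```

Under these assumptions the Go figure exceeds the chess figure by well over 200 orders of magnitude, which is why faster hardware alone does not close the gap.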

World Go champion Lee Sedol reviews a game during his 4-1 series defeat to DeepMind’s AlphaGo software.

Whether or not you’d agree that AlphaGo exhibits intuition, enabling software to master more complex tasks could clearly be useful. DeepMind is working with the U.K.’s National Health Service on a project aimed at training software to help medical staff to spot signs of kidney problems that are commonly missed and cause large numbers of avoidable deaths. The group is also working with business divisions of Google, where, Hassabis says, his technology could surface in virtual assistants or improve recommendation systems, which are crucial to products such as YouTube (similar systems also power some of Google’s advertising products).

Looking farther ahead, DeepMind will need many breakthroughs to keep moving toward Hassabis’s goal of solving intelligence, even in the next couple of years of experimenting inside Labyrinth. One of the most critical missing pieces is a trick called chunking that human and animal brains use to handle the world’s complexities. Hassabis explains it using the example of needing to go to the airport. You can conceive of how you’ll get there and carry out that plan without having to consider exactly where to place your feet as you walk to the door, how to turn its handle or control every twitch of your muscle fibers. We can plan and take actions by working with high-level concepts that hide many details, and adapt to new situations by recombining the “chunks,” or concepts, we already know. “It's probably one of the most core problems left in AI,” says Hassabis.

It's a problem being worked on by many research groups, including others inside Google. But one unusual way DeepMind hopes to solve it is by studying real brains. The company has a team of neuroscientists led by a prominent researcher, Matthew Botvinick, who until late last year was a Princeton professor. Unlike most neuroscience research, the team's experiments are aimed as much at informing how DeepMind designs software as at revealing how the brain works.

One recent experiment tested a theory of Hassabis’s about the way human brains organize concepts, using a standard procedure that creates false memories. It involves presenting test subjects with a list of related words, for example “cold,” “snow,” and “ice.” People often falsely remember hearing other related words, too, such as “winter.”

DeepMind employees during the match with Sedol in Seoul earlier this month.

“With my machine-learning hat on I thought that has to be a huge clue as to how that kind of conceptual information is organized in the brain,” says Hassabis. The DeepMind team worked out a theory of how the brain’s anterior temporal lobe works with concepts, and confirmed its predictions by watching the brains of people doing the memory task inside a scanner. The results might help change how DeepMind designs its artificial neural networks to represent information.

Other things on DeepMind’s “to discover” list include a way to combine research it has done on software to grasp the meaning of text with its work on agents that roam inside Labyrinth—one possibility is to start putting up signs inside the virtual space. Hassabis says he’s also planning an “ambitious” way to test agents when they are ready for a more realistic world than Labyrinth. At some point he wants to see DeepMind software take control of robots, which he says are held back by the inability of software to understand the world. “There are amazing robots around that cannot be used to their full capabilities because the algorithms aren't there,” he says.

Success could raise some tough philosophical and ethical questions about what it means to be human and about the acceptable uses of artificial intelligence. Hassabis says he encourages discussion of the possible risks of the technology. (He also notes with satisfaction that physicist Stephen Hawking has stopped warning that artificial intelligence could wipe out humans since meeting with Hassabis; Tesla founder Elon Musk, who has likened artificial intelligence research to “summoning the demon,” has also received an anti-pep talk.) DeepMind has an internal ethics board of philosophers, lawyers, and businesspeople. Hassabis says their names will probably be disclosed “shortly,” and that he's also working to convene a similar external board shared across multiple computing companies.

DeepMind’s engineers don’t yet need ethics advice when planning new experiments, though, says Hassabis. “We're nowhere near anything we would be worried about,” he says. “It's more about getting everyone up to speed.” If everything works out as Hassabis hopes, his ethics board will eventually have real work to do.