“The most striking thing is we don’t need any human data anymore,” says Demis Hassabis, CEO and cofounder of DeepMind. Hassabis says the techniques used to build AlphaGo Zero are powerful enough to be applied in real-world situations where it’s necessary to explore a vast landscape of possibilities, including drug discovery and materials science. The research behind AlphaGo Zero is published today in the journal Nature.

Remarkably, during this self-teaching process AlphaGo Zero discovered many of the tricks and techniques that human Go players have developed over the past several thousand years. “A few days in, it rediscovers known best plays, and in the final days goes beyond those plays to find something even better,” Hassabis says. “It’s quite cool to see.”

DeepMind, based in London, was acquired by Google in 2014. The company is focused on making big strides in AI using game play, simulation, and machine learning; it has hired hundreds of AI researchers in pursuit of this goal. Developing AlphaGo Zero involved around 15 people and probably millions of dollars’ worth of computing resources, Hassabis says.

Both AlphaGo and AlphaGo Zero use a machine-learning approach known as reinforcement learning (see “10 Breakthrough Technologies 2017: Reinforcement Learning”) as well as deep neural networks. Reinforcement learning is inspired by the way animals seem to learn through experimentation and feedback, and DeepMind has used the technique to achieve superhuman performance in simpler Atari games.

The number of possible configurations on the Go board is greater than the number of atoms in the universe. (Image credit: www.alphagomovie.com)

Mastering the board game Go was especially significant, however, because the game is so complex and because the best players make their moves so instinctively. The rules of good play, in other words, cannot easily be explained or written in code.

Reinforcement learning also shows promise for automating the programming of machines in many other contexts, including those where it would be impractical to program them by hand. It is already being tested as a way to teach robots to grasp awkward objects, for example, and as a means of conserving energy in data centers by reconfiguring hardware on the fly. In many real-world situations, however, there may not be a large number of examples to learn from, meaning machines will have to learn for themselves. That’s what makes AlphaGo Zero interesting.

“By not using human data or human expertise, we’ve actually removed the constraints of human knowledge,” says David Silver, the lead researcher at DeepMind and a professor at University College London. “It’s able to create knowledge for itself from first principles.”

To achieve Go supremacy, AlphaGo Zero simply played against itself, randomly at first. Like the original, it used a deep neural network and a powerful search algorithm to pick the next move. But where the original relied on separate neural networks to suggest promising moves and to evaluate board positions, in AlphaGo Zero a single neural network took care of both functions.
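The self-play idea can be made concrete with a toy sketch. The Python below is not DeepMind's code and rests on several simplifying assumptions: it swaps Go for a tiny stone-taking game (players alternately remove one to three stones, and whoever takes the last stone wins), replaces the deep neural network with a lookup table, and leaves out the search algorithm entirely. The names `legal_moves`, `self_play_train`, and `best_move` are invented for illustration. What it keeps is the core loop: the program starts out moving at random, plays only against itself, and improves using nothing but the results of its own games.

```python
import random
from collections import defaultdict


def legal_moves(stones):
    """A player may take 1, 2, or 3 stones, but no more than remain."""
    return [m for m in (1, 2, 3) if m <= stones]


def self_play_train(max_stones=10, episodes=20000, epsilon=0.2, seed=0):
    """Learn move values for a toy stone-taking game purely from self-play.

    Hypothetical stand-in: a lookup table replaces the neural network,
    and there is no tree search.
    """
    rng = random.Random(seed)
    q = defaultdict(float)     # q[(stones, move)]: average outcome for the mover
    visits = defaultdict(int)  # how often each (stones, move) pair was tried

    for _ in range(episodes):
        stones = rng.randint(1, max_stones)  # random starting pile
        history = []
        while stones > 0:
            moves = legal_moves(stones)
            rng.shuffle(moves)  # break ties at random, so early play is random
            if rng.random() < epsilon:
                move = moves[0]  # occasional exploratory move
            else:
                move = max(moves, key=lambda m: q[(stones, m)])
            history.append((stones, move))
            stones -= move

        # Whoever took the last stone won. Walk back through the game,
        # flipping the sign each step because the players alternate.
        outcome = 1.0
        for key in reversed(history):
            visits[key] += 1
            q[key] += (outcome - q[key]) / visits[key]
            outcome = -outcome
    return q


def best_move(q, stones):
    """Greedy move under the learned table."""
    return max(legal_moves(stones), key=lambda m: q[(stones, m)])


if __name__ == "__main__":
    table = self_play_train()
    for pile in (5, 6, 7, 9, 10):
        print(pile, "stones -> take", best_move(table, pile))
```

In this toy game the known winning strategy is to leave the opponent a multiple of four stones. With the settings shown, the learned table should arrive at the same rule without ever being told it, a very small-scale echo of a program rediscovering established play on its own.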

Martin Mueller, a professor at the University of Alberta in Canada who has done important work on Go-playing software, is impressed by the design of AlphaGo Zero and says it advances reinforcement learning. “The architecture is simpler, yet more powerful, than previous versions,” he says.

DeepMind is already the darling of the AI industry, and its latest achievement is sure to grab headlines and spark debate about progress toward much more powerful forms of AI.

There are reasons to take the announcement cautiously, though. Pedro Domingos, a professor at the University of Washington, points out that the program still needs to play many millions of games in order to master Go, far more than an expert human player does. This suggests that the intelligence the program employs is fundamentally different from a human's.