Our winning days are numbered (Image: urbancow/Getty)

NOT so long ago, mastering the ancient Chinese game of Go was beyond the reach of artificial intelligence. But then AlphaGo, Google DeepMind’s AI player, started to leave even the best human opponents in the dust. Yet even this world-beating AI needed humans to learn from. Then, on Wednesday, DeepMind’s new version ditched people altogether.

AlphaGo Zero has surpassed its predecessor’s abilities, bypassing AI’s traditional method of learning games, which involves watching thousands of hours of human play. Instead, it simply starts playing at random, honing its skills by repeatedly playing against itself. Three days and 4.9 million such games later, the result is the world’s best Go-playing AI.
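The recipe described above — start from random play, keep track of which moves win, and gradually favour them — can be illustrated with a toy example. To be clear, this is not DeepMind's method: AlphaGo Zero pairs a deep neural network with Monte Carlo tree search. The sketch below is a minimal tabular analogue applied to the much simpler game of Nim (players alternately take 1 to 3 stones from a pile; whoever takes the last stone wins), with a win-rate table standing in for the network.

```python
import random
from collections import defaultdict

# (pile size, move) -> [wins, times played], learned purely from self-play
WINS = defaultdict(lambda: [0, 0])

def choose_move(pile, explore=0.3):
    """Mostly greedy on observed win rates, with some random exploration."""
    moves = [m for m in (1, 2, 3) if m <= pile]
    if random.random() < explore:
        return random.choice(moves)  # early on, this is effectively random play
    def rate(m):
        w, n = WINS[(pile, m)]
        return w / n if n else 0.5   # unseen moves get a neutral prior
    return max(moves, key=rate)

def self_play_game(start=10):
    """Play one game against itself and update the table from the outcome."""
    pile, history, player = start, [], 0
    while pile > 0:
        move = choose_move(pile)
        history.append((player, pile, move))
        pile -= move
        player ^= 1
    winner = player ^ 1              # the player who took the last stone
    for p, s, m in history:
        stats = WINS[(s, m)]
        stats[0] += int(p == winner)
        stats[1] += 1

random.seed(0)
for _ in range(20000):
    self_play_game()

# With no human examples at all, the agent converges on the known optimal
# opening for a pile of 10: take 2, leaving the opponent a multiple of 4.
best = max((1, 2, 3), key=lambda m: WINS[(10, m)][0] / WINS[(10, m)][1])
print(best)
```

The same feedback loop — play yourself, score the outcome, shift toward winning moves — is what AlphaGo Zero runs at vastly greater scale, with the neural network generalising across the roughly 10^170 legal Go positions that no table could ever hold.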

“It’s more powerful than previous approaches because we’ve removed the constraints of human knowledge,” says David Silver, the lead researcher for AlphaGo.


“Humankind has accumulated Go knowledge from millions of games played over thousands of years,” the authors write in their paper. “In the space of a few days… AlphaGo Zero was able to rediscover much of this Go knowledge, as well as novel strategies that provide new insights into the oldest of games.”

AlphaGo Zero’s alternative approach has allowed it to discover strategies humans have never found. For example, it learned many different josekis – sequences of moves that result in no net loss for either side. Plenty of josekis have been written down during the thousands of years Go has been played, and initially AlphaGo Zero learned many of the familiar ones. But as its self-training continued, it started to favour previously unknown sequences.

To test these new moves, DeepMind pitted AlphaGo Zero against the version that beat 18-time world champion Lee Sedol. In a 100-game grudge match, it won 100-0, despite training for only three days, compared with several months for its predecessor. After 40 days of training, it also won 89-11 against a stronger version of AlphaGo that had defeated world number one Ke Jie (Nature, DOI: 10.1038/nature24270).

DeepMind hopes this method will have applications beyond Go. “The team are already working to apply this to scientific problems like protein-folding,” said CEO Demis Hassabis at a press conference on Monday. Climate science, drug discovery and quantum chemistry could also benefit, he said.

This approach might also solve one of the thorniest issues facing AI: the need for copious training data. “With this approach you no longer have to rely on getting expert-quality human data,” says David Churchill at Memorial University, Canada.

Yet there are drawbacks too. For an AI to learn by itself, it needs to be programmed with the rules of the world it inhabits. That works for worlds with clear and simple rules, but would quickly become impossible for more complicated tasks like driving.

Even in cases for which the rules are clear, AlphaGo Zero’s abilities may not transfer. Although Go is a hugely challenging game, it has attributes that make it unusually well suited to AI: the rules never change, both players can see the entire board, and every game ends in an unambiguous win or loss.

So although DeepMind has now created the world’s best Go player twice, it will have a tougher task proving that the same approach can be useful beyond board games. “In 10 years, I hope that these kinds of algorithms will be routinely advancing the frontiers of scientific research,” says Hassabis.

This article appeared in print under the headline “Go-playing super AI transcends humanity”