Computer Learns To Play Go At Superhuman Levels 'Without Human Knowledge'

A year after a computer beat a human world champion in the ancient strategy game of Go, researchers say they have constructed an even stronger version of the program — one that can teach itself without the benefit of human knowledge.

The program, known as AlphaGo Zero, became a Go master in just three days by playing 4.9 million games against itself in quick succession.

"In a short space of time, AlphaGo Zero has understood all of the Go knowledge that has been accumulated by humans over thousands of years of playing," lead researcher David Silver of Google's DeepMind lab said in remarks on YouTube. "Sometimes it's actually chosen to go beyond that and discovered something that the humans hadn't even discovered in this time period."

The work, published this week in the journal Nature, could provide a foundation for machines teaching themselves to solve other complex problems in ways that could be applied to health, for example, or the environment. But some researchers question whether the program actually has such broad applications.

Go is a complex ancient East Asian strategy game, played on a 19-by-19 grid. The open-ended game has more possible configurations than there are known atoms in the universe, according to the DeepMind researchers.
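The scale of that comparison can be checked with rough arithmetic. The sketch below is not from the DeepMind paper. It assumes each of the 361 points on the board can be empty, black, or white, which gives a simple upper bound that also counts arrangements the rules would not allow. It also assumes the commonly cited rough estimate of about 10^80 atoms in the observable universe. The names `POINTS`, `upper_bound`, and `ATOMS_ESTIMATE` are illustrative.

```python
# Rough upper bound on Go board arrangements versus a rough atom count.
# Assumption: every point is empty, black, or white (legality is ignored).
POINTS = 19 * 19                 # 361 intersections on a 19-by-19 grid
upper_bound = 3 ** POINTS        # three possible states per point

# Assumption: a commonly cited order-of-magnitude estimate, not an exact figure.
ATOMS_ESTIMATE = 10 ** 80

digits = len(str(upper_bound))   # number of decimal digits in the bound
print(f"Board arrangements (upper bound): a {digits}-digit number")
print(f"Exceeds the atom estimate: {upper_bound > ATOMS_ESTIMATE}")
```

The count of arrangements that are actually legal under the rules is smaller than this bound, but the bound is enough to show why brute-force enumeration is out of reach.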

The open-ended nature of the game has made Go a "grand challenge for artificial intelligence," the researchers say. It's far more complicated than chess, a game in which a computer beat world champion Garry Kasparov two decades ago.

DeepMind trained previous versions of the program by giving them a database of thousands of human-played games of Go. It was one of those versions that went on to beat top player Lee Sedol last year, grabbing international headlines.

AlphaGo Zero takes a different approach.

Instead of learning from human-played games, the program was given only the rules of Go and asked to play against itself, Silver says. AlphaGo Zero "figures out only for itself, only from self-play, and without any human knowledge, without any human data, without any human examples or features or intervention from humans. It discovers how to play the game of Go completely from first principles."
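The idea of learning from the rules alone can be illustrated on a far smaller game. The sketch below is not AlphaGo Zero's algorithm, and nothing in it comes from the DeepMind paper. It swaps in plain tabular Q-learning on a toy version of Nim, in which players alternate removing one to three stones and whoever takes the last stone wins. The function names, the game, and every parameter value are assumptions chosen for illustration. The program starts with a table of zeros, plays both sides of each game, and updates its estimates using only the win condition.

```python
import random

MAX_TAKE = 3  # assumption: a player may remove 1, 2, or 3 stones per turn


def legal_moves(pile):
    """Moves allowed by the rules for a pile of the given size."""
    return range(1, min(MAX_TAKE, pile) + 1)


def train_by_self_play(episodes=20000, max_pile=12, alpha=0.5,
                       epsilon=0.3, seed=0):
    """Learn move values from scratch by playing both sides of each game."""
    rng = random.Random(seed)
    # Blank slate: every (pile, move) pair starts with a value of zero.
    q = {(p, m): 0.0 for p in range(1, max_pile + 1) for m in legal_moves(p)}
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            moves = list(legal_moves(pile))
            if rng.random() < epsilon:
                move = rng.choice(moves)                        # explore
            else:
                move = max(moves, key=lambda m: q[(pile, m)])   # exploit
            remaining = pile - move
            if remaining == 0:
                target = 1.0  # the mover took the last stone and wins
            else:
                # The opponent moves next, so a good position for the
                # opponent is a bad one for the current mover.
                target = -max(q[(remaining, m)] for m in legal_moves(remaining))
            q[(pile, move)] += alpha * (target - q[(pile, move)])
            pile = remaining
    return q


def greedy_move(q, pile):
    """Best move for this pile according to the learned values."""
    return max(legal_moves(pile), key=lambda m: q[(pile, m)])


if __name__ == "__main__":
    q = train_by_self_play()
    for pile in (5, 6, 7, 9, 10, 11):
        print(pile, "stones ->", "take", greedy_move(q, pile))
```

No example games are supplied, yet with these settings the learned moves should match the known winning strategy for this variant, which is to leave the opponent a multiple of four stones. Go is vastly larger, so a lookup table like this one would not scale to it.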

In their study, the researchers describe the program using a term that is well known to students of philosophy: tabula rasa, which is Latin for "blank slate."

They argue that starting with a blank slate is optimal because human data sets can be "expensive, unreliable or simply unavailable." Data sets of human knowledge could also potentially "impose a ceiling on the performance of systems trained in this manner."

As it trained, "what we started to see was that AlphaGo Zero not only started to rediscover the common patterns and openings that humans tend to play," Silver said, "it also learned them, discovered them, and ultimately discarded them in preference for its own variants that humans don't even know about or play at the moment."

When matched with the version that defeated the world champion, AlphaGo Zero beat it 100 games to 0.

The researchers say that the benefit of tabula rasa learning is simple: It means that a program can "learn for itself what knowledge is." This means it could be applied to other fields, they say, such as protein folding or reducing energy consumption.

But other researchers such as Gary Marcus, an entrepreneur and psychology professor at New York University who specializes in artificial intelligence, think that the paper overstates its findings.

The program hasn't mastered Go without human knowledge, he says, because "actually prior knowledge has gone into the construction of the algorithm itself."

He adds: "They're not putting explicit declarative knowledge of things other than the rules of Go in there, but there's a lot of implicit knowledge that the programmers have about how to construct machines to play problems like Go."

Showing that the algorithm can build knowledge from scratch on other kinds of problems would be needed to prove the claim, he says — or else all they've proven is that it's an algorithm that is really good at Go.

In a written statement, DeepMind said that "nothing in the AlphaGo Zero algorithm is specific to the game of Go" and added that the team is "currently applying the same algorithm to other sequential problems and are confident that this approach is generalisable to a large number of domains."

They provided no information about how the algorithm has fared in solving other problems.

Marcus is critical of what he sees as a general bias in the AI field toward tabula rasa programming. He argues that "in biology, actual human brains are not tabula rasa ... I don't see the principal theoretical reason why you should do that, why you should abandon lots of knowledge that we have about the world."