As a new Nature paper points out, “There are an astonishing 10 to the power of 170 possible board configurations in Go—more than the number of atoms in the known universe.” (Image: DeepMind)

Remember AlphaGo, the first artificial intelligence to defeat a grandmaster at Go? Well, the program just got a major upgrade, and it can now teach itself how to dominate the game without any human intervention. But get this: In a tournament that pitted AI against AI, this juiced-up version, called AlphaGo Zero, defeated the regular AlphaGo by a whopping 100 games to 0, signifying a major advance in the field. Hear that? It’s the technological singularity inching ever closer.




A new paper published in Nature today describes how the artificially intelligent system that defeated Go grandmaster Lee Sedol in 2016 got its digital ass kicked by a new-and-improved version of itself. And it didn’t just lose by a little; it couldn’t even muster a single win after playing a hundred games. Incredibly, it took AlphaGo Zero (AGZ) just three days to train itself from scratch and acquire literally thousands of years of human Go knowledge simply by playing itself. The only input it had was the positions of the black and white pieces on the board. In addition to devising completely new strategies, the new system is also considerably leaner and meaner than the original AlphaGo.

Lee Sedol getting crushed by AlphaGo in 2016. (Image: AP)


Now, every once in a while the field of AI experiences a “holy shit” moment, and this would appear to be one of those moments. Looking back, other “holy shit” moments include Deep Blue defeating Garry Kasparov at chess in 1997, IBM’s Watson defeating two of the world’s best Jeopardy! champions in 2011, the aforementioned defeat of Lee Sedol in 2016, and most recently, the defeat of four professional no-limit Texas hold’em poker players at the hands of Libratus, an AI developed by computer scientists at Carnegie Mellon University.



This latest achievement qualifies as a “holy shit” moment for a number of reasons.

First of all, the original AlphaGo had the benefit of learning from literally thousands of previously played Go games, including those played by human amateurs and professionals. AGZ, on the other hand, received no help from its human handlers, and had access to absolutely nothing aside from the rules of the game. Using “reinforcement learning,” AGZ played itself over and over again, “starting from random play, and without any supervision or use of human data,” according to the Google-owned DeepMind researchers in their study. This allowed the system to improve and refine its digital brain, known as a neural network, as it continually learned from experience. This basically means that AlphaGo Zero was its own teacher.

“This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge,” notes the DeepMind team in a release. “Instead, it is able to learn tabula rasa [from a clean slate] from the strongest player in the world: AlphaGo itself.”
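To make the idea of a system that is “its own teacher” concrete, here is a minimal sketch of self-play learning on a toy game. It is not DeepMind’s method: the game (a single-pile version of Nim in which whoever takes the last stone wins), the lookup table, and every name and parameter below are assumptions chosen for illustration. The only thing the program is given is the rules.

```python
import random


def legal_moves(pile):
    """Rules of the toy game: take 1, 2, or 3 stones; taking the last one wins."""
    return [take for take in (1, 2, 3) if take <= pile]


def train_self_play(max_pile=10, episodes=20000, alpha=0.5, epsilon=0.3, seed=0):
    """Learn move values purely by playing against itself (hypothetical helper)."""
    rng = random.Random(seed)
    # q[(pile, take)] estimates the outcome for the player about to move:
    # +1 means a sure win, -1 a sure loss. Everything starts at 0 (no knowledge).
    q = {(pile, take): 0.0
         for pile in range(1, max_pile + 1)
         for take in legal_moves(pile)}
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < epsilon:
                take = rng.choice(moves)  # explore a random move
            else:
                take = max(moves, key=lambda m: q[(pile, m)])  # best move so far
            remaining = pile - take
            if remaining == 0:
                target = 1.0  # we took the last stone and won
            else:
                # The same table plays the opponent: its best outcome is our worst.
                target = -max(q[(remaining, m)] for m in legal_moves(remaining))
            q[(pile, take)] += alpha * (target - q[(pile, take)])
            pile = remaining  # the other side moves next
    return q


def best_move(q, pile):
    """Pick the move the learned table currently rates highest."""
    return max(legal_moves(pile), key=lambda m: q[(pile, m)])


q_table = train_self_play()
print({pile: best_move(q_table, pile) for pile in range(1, 11)})
```

With no human examples, the table should settle on the known winning strategy for this game, which is to leave the opponent a multiple of four stones. AGZ pairs the same basic loop with a neural network and a search procedure rather than a lookup table, which is what lets it cope with a game as large as Go.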



When playing Go, the system considers the most probable next moves (a “policy network”), and then estimates the probability of winning based on those moves (its “value network”). AGZ requires about 0.4 seconds to make these two assessments. The original AlphaGo was equipped with a pair of neural networks to make similar evaluations, but for AGZ, the DeepMind developers merged the policy and value networks into one, allowing the system to learn more efficiently. What’s more, the new system is powered by four tensor processing units (TPUs), specialized chips for neural network training. Old AlphaGo needed 48 TPUs.
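One way to picture the merged design is a single network whose shared layer feeds two outputs: a probability for each possible move and a single estimate of who is winning. The sketch below is only an illustration of that shape. The class name, layer sizes, and random untrained weights are all assumptions, and the real system is vastly larger.

```python
import math
import random


class TwoHeadedNet:
    """Toy network with one shared layer and two heads (hypothetical example)."""

    def __init__(self, n_inputs, n_hidden, n_moves, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            # Small random weights; a real network would learn these by training.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_shared = matrix(n_hidden, n_inputs)  # used by both heads
        self.w_policy = matrix(n_moves, n_hidden)   # policy head
        self.w_value = matrix(1, n_hidden)          # value head

    @staticmethod
    def _matvec(weights, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in weights]

    def forward(self, board):
        # Shared representation of the position, computed once.
        hidden = [math.tanh(h) for h in self._matvec(self.w_shared, board)]
        # Policy head: softmax turns scores into move probabilities.
        logits = self._matvec(self.w_policy, hidden)
        peak = max(logits)
        exps = [math.exp(score - peak) for score in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        # Value head: tanh squashes the estimate into the range -1 to +1.
        value = math.tanh(self._matvec(self.w_value, hidden)[0])
        return policy, value


# A made-up 3x3 board: 1 is one player's stone, -1 the other's, 0 is empty.
net = TwoHeadedNet(n_inputs=9, n_hidden=16, n_moves=9)
policy, value = net.forward([0, 1, 0, -1, 0, 0, 0, 0, 0])
print(len(policy), round(sum(policy), 6), -1.0 <= value <= 1.0)
```

Because both heads read from the same hidden layer, anything the network learns about a position can serve move selection and position evaluation at once, which is one plausible reading of why merging the two networks helps the system learn more efficiently.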


After just three days of self-play training and a total of 4.9 million games played against itself, AGZ acquired the expertise needed to trounce AlphaGo (by comparison, the original AlphaGo had 30 million games for inspiration). After 40 days of self-training, AGZ defeated another, more sophisticated version of AlphaGo called AlphaGo “Master,” which had defeated the world’s best Go players, including the top-ranked Ke Jie. Earlier this year, the original AlphaGo and AlphaGo Master won a combined 60 games against top professionals. The rise of AGZ, it would now appear, has made these previous versions obsolete.
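For a sense of scale, the figures quoted in this article can be turned into rough rates with a few lines of arithmetic. This is a back-of-the-envelope sketch only: it assumes the three days means a full 72 hours of wall-clock time, and it says nothing about how the games were actually scheduled or run in parallel.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def games_per_second(total_games, days):
    """Average rate if the games were spread evenly over the whole period."""
    return total_games / (days * SECONDS_PER_DAY)


self_play_rate = games_per_second(4_900_000, 3)  # figures quoted in the article
tpu_ratio = 48 / 4                               # old AlphaGo TPUs vs. AGZ TPUs

print(f"{self_play_rate:.1f} self-play games per second on average")
print(f"{tpu_ratio:.0f}x fewer TPUs than the original AlphaGo")
```

That works out to roughly 19 completed games every second, on one-twelfth as many chips.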





This is a major achievement for AI, and the subfield of reinforcement learning in particular. By teaching itself, the system matched and exceeded human knowledge by an order of magnitude in just a few days, while also developing unconventional strategies and creative new moves. For Go players, the breakthrough is as sobering as it is exciting; they’re learning things from AI that they could have never learned on their own, or would have needed an inordinate amount of time to figure out.

“[AlphaGo Zero’s] games against AlphaGo Master will surely contain gems, especially because its victories seem effortless,” wrote Andy Okun and Andrew Jackson, members of the American Go Association, in a Nature News and Views article. “At each stage of the game, it seems to gain a bit here and lose a bit there, but somehow it ends up slightly ahead, as if by magic... The time when humans can have a meaningful conversation with an AI has always seemed far off and the stuff of science fiction. But for Go players, that day is here.”


No doubt, AGZ represents a disruptive advance in the world of Go, but what about its potential impact on the rest of the world? According to Nick Hynes, a grad student at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), it’ll be a while before a specialized tool like this will have an impact on our daily lives.

“So far, the algorithm described only works for problems where there are a countable number of actions you can take, so it would need modification before it could be used for continuous control problems like locomotion [for instance],” Hynes told Gizmodo. “Also, it requires that you have a really good model of the environment. In this case, it literally knows all of the rules. That would be as if you had a robot for which you could exactly predict the outcomes of actions—which is impossible for real, imperfect physical systems.”


The nice part, he says, is that there are several other lines of AI research that address both of these issues (e.g. machine learning, evolutionary algorithms, etc.), so it’s really just a matter of integration. “The real key here is the technique,” says Hynes.



“As expected, and desired, we’re moving farther away from the classic pattern of getting a bunch of human-labeled data and training a model to imitate it,” he said. “What we’re seeing here is a model free from human bias and presuppositions: It can learn whatever it determines is optimal, which may indeed be more nuanced than our own conceptions of the same. It’s like an alien civilization inventing its own mathematics which allows it to do things like time travel,” to which he added: “Although we’re still far from ‘The Singularity,’ we’re definitely heading in that direction.”

Noam Brown, a Carnegie Mellon University computer scientist who helped to develop the first AI to defeat top humans in no-limit poker, says the DeepMind researchers have achieved an impressive result, and that it could lead to bigger, better things in AI.


“While the original AlphaGo managed to defeat top humans, it did so partly by relying on expert human knowledge of the game and human training data,” Brown told Gizmodo. “That led to questions of whether the techniques could extend beyond Go. AlphaGo Zero achieves even better performance without using any expert human knowledge. It seems likely that the same approach could extend to all perfect-information games [such as chess and checkers]. This is a major step toward developing general-purpose AIs.”

As both Hynes and Brown admit, this latest breakthrough doesn’t mean the technological singularity (that hypothesized time in the future when greater-than-human machine intelligence achieves explosive growth) is imminent. But it should give us pause. Once we teach a system the rules of a game or the constraints of a real-world problem, the power of reinforcement learning makes it possible to simply press the start button and let the system do the rest. It will then figure out the best ways to succeed at the task, devising solutions and strategies that are beyond human capacities, and possibly even human comprehension.


As noted, AGZ and the game of Go represent an oversimplified, constrained, and highly predictable picture of the world, but in the future, AI will be tasked with more complex challenges. Eventually, self-teaching systems will be used to solve more pressing problems, such as modeling protein folding to conjure up new medicines and biotechnologies, figuring out ways to reduce energy consumption, or designing new materials. A highly generalized self-learning system could also be tasked with improving itself, leading to artificial general intelligence (i.e. a very human-like intelligence) and even artificial superintelligence.

As the DeepMind researchers conclude in their study, “Our results comprehensively demonstrate that a pure reinforcement learning approach is fully feasible, even in the most challenging of domains: it is possible to train to superhuman level, without human examples or guidance, given no knowledge of the domain beyond basic rules.”


And indeed, now that human players are no longer dominant in games like chess and Go, it can be said that we’ve already entered into the era of superintelligence. This latest breakthrough is the tiniest hint of what’s still to come.

[Nature]