At the end of January, Carnegie Mellon computer scientists achieved a major milestone: their algorithm, Libratus, beat a set of professional poker players in a 120,000-hand tournament. While humans have fallen to computers in a variety of games, notably chess and go, poker is fundamentally different, in that each player has information that's not available to the rest. A fundamentally different sort of AI is required to deal with this sort of imperfect information.

This week in Science, a different team described its human-beating poker algorithm, DeepStack. Both teams say their approach isn't specific to poker, so 2017 may mark the end of human dominance at all imperfect-information games.

Imperfect strategies

A perfect information game is relatively simple: all players can know the full state of the game, often just by looking at the board, and they know the complete set of rules. That makes it relatively trivial to calculate all the moves available from any specific board. With enough computing power, it's also possible to calculate all possibilities many moves out—enough to effectively bring any game to a conclusion. In the case of a simple game like checkers, this means all possible future moves. For something more complicated like chess, calculations may effectively be limited to about 10 moves ahead.

If a computer can assign values to each possible future board, then it becomes trivial to make an optimal move in any situation. At that point, the best a human player can hope for is a draw. Poker is fundamentally different. In every variant of the game I'm aware of, some cards aren't visible to all players: those held in opponents' hands and those waiting unplayed in the deck. This creates a far larger computational challenge.
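The perfect-information logic above can be sketched in a few lines of Python. This is a toy negamax search over simple Nim (take one or two stones; whoever takes the last stone wins), my own illustrative example rather than anything from either team's code:

```python
from functools import lru_cache

# Toy perfect-information game: a pile of stones, players alternate
# taking 1 or 2, and whoever takes the last stone wins. With the full
# state visible, exhaustive search assigns a value to every position.

@lru_cache(maxsize=None)
def value(pile):
    """Negamax value for the player to move: +1 = forced win, -1 = forced loss."""
    if pile == 0:
        return -1  # no stones left: the previous player took the last one and won
    return max(-value(pile - take) for take in (1, 2) if take <= pile)

def best_move(pile):
    """Pick the move that leaves the opponent in the worst position."""
    return max((t for t in (1, 2) if t <= pile), key=lambda t: -value(pile - t))
```

Against this kind of exhaustive valuation, an opponent can only steer toward positions the table already scores in their favor; in this toy game, any pile that's a multiple of three is a forced loss for the player to move.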

To handle imperfect information games, past work has focused on approaches derived from game theory. Here, computers choose a "strategy" and calculate how likely they are to regret using it as the game progresses. An ideal approach means that other players won't be able to consistently exploit weaknesses in that strategy to win money. That doesn't mean that the computer will win every hand—some deals are just impossible to work with—just that it becomes hard to find ways to consistently come out ahead in the long run.
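The regret idea can be made concrete with a toy sketch. Below is regret matching, the building block of the counterfactual-regret methods this line of work builds on, applied to rock-paper-scissors against a hypothetical rock-heavy opponent; the opponent's mix and the iteration count are my own illustrative choices, not anything from either system:

```python
# indices: 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]        # PAYOFF[mine][theirs]: +1 = win, -1 = lose
OPPONENT = [0.5, 0.3, 0.2]   # hypothetical rock-heavy opponent mix

def strategy(regrets):
    """Regret matching: mix over actions in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

def train(iterations=1000):
    regrets = [0.0, 0.0, 0.0]
    strat_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strat = strategy(regrets)
        # expected payoff of each action against the opponent's mix
        ev = [sum(OPPONENT[b] * PAYOFF[a][b] for b in range(3)) for a in range(3)]
        played = sum(strat[a] * ev[a] for a in range(3))
        for a in range(3):
            regrets[a] += ev[a] - played  # regret for not having played a instead
            strat_sum[a] += strat[a]
    return [s / iterations for s in strat_sum]

avg = train()  # the averaged strategy concentrates on paper, the best response
```

Because only the regret for paper stays positive here, the averaged strategy converges on the best response to this exploitable opponent. When both sides run the same update against each other, the same machinery pushes the average strategies toward an equilibrium that's hard to exploit, which is the property described above.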

So a poker-playing AI needs to do two things: calculate how all of its strategies perform given a specific game situation (cards and betting history), and pick the appropriate one for each game it finds itself in.

For Libratus, this involved a lot of pre-computation, followed by daily updates as the poker tournament continued. While the human players discussed any strategic weaknesses they'd found during the day's games, the Libratus team had a petaflop of computing hardware it used to patch those weaknesses. “After play ended each day, a meta-algorithm analyzed what holes the pros had identified and exploited in Libratus’ strategy,” said Carnegie Mellon's Tuomas Sandholm. “It then prioritized the holes and algorithmically patched the top three using the supercomputer each night.”

The pros could tell. "Every time we find a weakness, it learns from us and the weakness disappears the next day,” said human opponent Jimmy Chou. The end result was a sizable stomping, with Libratus coming out over $1.75 million ahead.

Stacking it deep

DeepStack comes from a collaboration between some Czech researchers and the team that first figured out an algorithmic approach to limit Texas hold’em. As with Libratus, it's a general approach to solving imperfect information games. But here, details of the computational approach are very different: it plays effectively by treating each turn of a card as a completely new game.

The paper on DeepStack describes why it's hard to try to use the whole history of the game effectively:

The correct decision at a particular moment depends upon the probability distribution over private information that the opponent holds, which is revealed through their past actions. However, how our opponent’s actions reveal that information depends upon their knowledge of our private information and how our actions reveal it. This kind of recursive reasoning is why one cannot easily reason about game situations in isolation.

To avoid getting stuck in an infinite recursion, DeepStack simply forgets the past. "Our goal is to avoid ever maintaining a strategy for the entire game," its developers write. Instead, each time DeepStack needs to act, it performs a quick search to pick out a strategy based on the current state of the game. That search relies on two major simplifications.

The first is that DeepStack only considers a limited number of options: it can fold, call, go all-in, or make one of just two or three differently sized bets. That cuts the number of future states that have to be considered dramatically—by about 140 orders of magnitude. It also doesn't search forward through all possible positions. As a result, computing which action to take runs in about five seconds on a single Nvidia GeForce GTX 1080.
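The effect of pruning the action set compounds with depth. Here's a back-of-the-envelope sketch; the bet counts and lookahead depth are illustrative stand-ins rather than the paper's figures, and the point is only the exponential shrinkage:

```python
import math

def sequences(actions_per_decision, decisions):
    """Rough count of action sequences in a lookahead tree."""
    return actions_per_decision ** decisions

# Hypothetical numbers: a no-limit player could bet any of ~20,000 chip
# amounts at each decision; a DeepStack-style abstraction keeps ~5 options
# (fold, call, a couple of bet sizes, all-in).
full = sequences(20000, 10)
abstracted = sequences(5, 10)
saved_orders = math.log10(full) - math.log10(abstracted)  # about 36 orders of magnitude
```

Even this cartoon version shrinks the tree by dozens of orders of magnitude; the article's 140-orders figure reflects the real game's much larger action space and depth, but the mechanism is the same.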

All of this work still requires looking up the values of possible future hands. Those lookups were handled by a deep-learning neural network, or rather by two copies of the same network: one for the first three shared cards, the second for the final two. The networks were trained on 10 million randomly drawn poker games.
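DeepStack's counterfactual-value networks are far more involved, but the underlying pattern, regressing from a description of the situation to an estimated value, fits in a short numpy sketch. Everything here (feature size, architecture, synthetic data) is an illustrative stand-in, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for training data: feature vectors (imagine encoded
# cards and pot sizes) mapped to a "true" value by a hidden function.
X = rng.normal(size=(2000, 10))
y = np.tanh(X @ rng.normal(size=10))

# A tiny one-hidden-layer MLP trained by gradient descent on squared error.
W1 = rng.normal(scale=0.3, size=(10, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.3, size=(32, 1));  b2 = np.zeros(1)

def predict(x):
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return h, (h @ W2 + b2).ravel()

def mse(pred):
    return float(np.mean((pred - y) ** 2))

lr = 0.05
loss_before = mse(predict(X)[1])
for _ in range(500):
    h, pred = predict(X)
    g = (pred - y)[:, None] / len(y)       # gradient of 0.5*MSE w.r.t. predictions
    dh = (g @ W2.T) * (h > 0)              # backprop through the ReLU
    W2 -= lr * (h.T @ g);  b2 -= lr * g.sum(axis=0)
    W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(axis=0)
loss_after = mse(predict(X)[1])  # training error drops as the net fits the values
```

Per the paper, the real networks take the public state plus both players' probability distributions over hidden hands as input and output a value for every possible hand, but the training loop is conceptually this same regression, just at vastly larger scale.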

To test this out, the team recruited 33 players via the International Federation of Poker to play head-to-head. The monetary prizes weren't enough to draw in the best players out there, and some of them only completed a handful of games. Still, only two of the players ended up ahead of DeepStack, and both of those played a limited number of games, where the chance draw of the cards could have an inordinate effect. Of the 11 players who played a full 3,000-game match, all ended up down to DeepStack, 10 of them by a statistically significant margin.

Because the approaches are so different, there's a chance that some of this work could be merged if the two teams decide to join forces. Still, the DeepStack approach appears to be more general, since it doesn't rely on having a supercomputer on hand to update the system during breaks.

But the key thing will be to see if this software can be extended beyond games. Both teams claim to have produced a general approach to imperfect-information situations; for DeepStack, the poker-specific portions of the code appear to be the neural networks that compute the value of future game states and the logic that decides which action to take. If those could be swapped out, it might be possible to use the software on real-world problems. Its authors specifically mention medical and defense decisions as being amenable to this sort of evaluation.

Still, the next step may be simply getting this software to play when there's more than one opponent. Both systems were designed to face a single player head-to-head. Adding a full table of players would multiply the complexity and renew the computational challenge.

Science, 2017. DOI: 10.1126/science.aam6960 (About DOIs).