exceeds expert human players in many games, e.g., backgammon ( 1 ), checkers ( 2 ), chess ( 3 ),

Jeopardy! ( 4 ), Atari video games ( 5 ), and go ( 6 ). These successes all in volv e games with

information symmetry , where all players ha ve identical information about the current state of

the game. This property of perfect information is also at the heart of the algorithms that enabled

these successes, e.g., local search during play ( 7, 8 ).

The founder of modern game theory and computing pioneer , v on Neumann, en visioned

reasoning in games without perfect information. “Real life is not like that. Real life consists of

bluf ﬁng, of little tactics of deception, of asking yourself what is the other man going to think

I mean to do. And that is what games are about in my theory . ” ( 9 ) One game that fascinated

von Neumann was poker , where players are dealt pri vate cards and take turns making bets or

bluf ﬁng on holding the strongest hand, calling opponents’ bets, or folding and gi ving up on the

hand and the bets already added to the pot. Pok er is a game of imperfect information, where

players’ pri vate cards gi ve them asymmetric information about the state of game.

Heads-up no-limit T exas hold’em (HUNL) is a two-player version of poker in which two

cards are initially dealt face-do wn to each player , and additional cards are dealt face-up in

three subsequent rounds. No limit is placed on the size of the bets although there is an overall

limit to the total amount wagered in each game ( 10 ). AI techniques ha ve previously shown

success in the simpler game of heads-up limit T exas hold’em, where all bets are of a ﬁxed size

resulting in just under 10

14

decision points ( 11 ). By comparison, computers ha ve e xceeded

expert human performance in go ( 6 ), a perfect information game with approximately 10

170

decision points ( 12 ). The imperfect information game HUNL is comparable in size to go, with

the number of decision points exceeding 10

160

( 13 ).

Imperfect information games require more complex reasoning than similarly sized perfect

information games. The correct decision at a particular moment depends upon the probability

distribution ov er pri vate information that the opponent holds, which is re vealed through their

past actions. Howe ver , ho w our opponent’ s actions re veal that information depends upon their

kno wledge of our priv ate information and how our actions re veal it. This kind of recursiv e

reasoning is why one cannot easily reason about game situations in isolation, which is at the

heart of heuristic search methods for perfect information games. Competiti ve AI approaches

in imperfect information games typically reason about the entire game and produce a complete

strategy prior to play ( 14–16 ). Counterfactual regret minimization (CFR) ( 14, 17, 18 ) is one such

technique that uses self-play to do recursiv e reasoning through adapting its strategy against itself

ov er successiv e iterations. If the game is too large to be solved directly , the common response

is to solve a smaller , abstracted game. T o play the original game, one translates situations and

actions from the original game to the abstract game.

Although this approach makes it feasible for programs to reason in a game like HUNL, it

does so by squeezing HUNL’ s 10

160

situations do wn to the order of 10

14

abstract situations.

Likely as a result of this loss of information, such programs are behind expert human play . In

2015, the computer program Claudico lost to a team of professional poker players by a mar gin

of 91 mbb/g ( 19 ), which is a “huge mar gin of victory” ( 20 ). Furthermore, it has been recently

sho wn that abstraction-based programs from the Annual Computer Poker Competition hav e

2