What next for Google’s DeepMind, now that the company has mastered the ancient board game of Go, beating the Korean champion Lee Se-Dol 4–1 this month?

A paper from two UCL researchers suggests one future project: playing poker. And unlike Go, victory in that field could probably fund itself – at least until humans stopped playing against the robot.

The paper’s authors are Johannes Heinrich, a research student at UCL, and David Silver, a UCL lecturer who is working at DeepMind. Silver, who was AlphaGo’s main programmer, has been called the “unsung hero at Google DeepMind”, although this paper relates to his work at UCL.

In the pair’s research, titled “Deep Reinforcement Learning from Self-Play in Imperfect-Information Games”, the authors detail their attempts to teach a computer how to play two types of poker: Leduc, an ultra-simplified version of poker using a deck of just six cards; and Texas Hold’em, the most popular variant of the game in the world.

Applying methods similar to those which enabled AlphaGo to beat Lee, the machine successfully taught itself a strategy for Texas Hold’em which “approached the performance of human experts and state-of-the-art methods”. For Leduc, which has been all but solved, it learned a strategy which “approached” the Nash equilibrium – the mathematically optimal style of play for the game.

As with AlphaGo, the pair taught the machine using a technique called “Deep Reinforcement Learning”. It merges two distinct methods of machine learning: neural networks, and reinforcement learning. The former technique is commonly used in big data applications, where a network of simple decision points can be trained on a vast amount of information to solve complex problems.
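To make the idea concrete, here is a minimal, hypothetical sketch of such a "decision point": a single logistic neuron trained by gradient descent on a toy labelled dataset (the AND function). This is purely illustrative and is not the paper's code; real networks stack thousands of these units.

```python
import math

# Toy dataset: learn the AND function from labelled examples
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w, b, lr = [0.0, 0.0], 0.0, 0.5  # weights, bias, learning rate

def predict(x):
    """Output of a single logistic 'decision point' (between 0 and 1)."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 / (1 + math.exp(-z))

# Gradient-descent training: nudge the weights after every example
for _ in range(2000):
    for x, y in data:
        g = predict(x) - y  # gradient of the log loss w.r.t. z
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g
```

After training, `predict((1, 1))` is close to 1 and the other three inputs are pushed towards 0; given enough data, the same nudging procedure scales to far larger networks.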

But when there isn’t enough data to train the network accurately, or the available data can’t train it to a high enough standard, reinforcement learning can help. This involves the machine carrying out its task and learning from its mistakes, improving through trial and error until its performance stops getting better. Unlike a human player, an algorithm learning a game such as poker can even play against itself, in what Heinrich and Silver call “neural fictitious self-play”.
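The paper's method combines fictitious self-play with neural networks; as a much simpler illustration (not the paper's algorithm), here is classical tabular fictitious play on rock-paper-scissors. Each player repeatedly plays a best response to the other's average past play, and the average strategies approach the game's Nash equilibrium of one third each:

```python
# 0 = rock, 1 = paper, 2 = scissors
def util(a, b):
    """Payoff for playing a against b: win +1, lose -1, draw 0."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

# counts[p][b]: how often player p has seen the opponent play b
# (initialised to 1 each, so round one is played against a uniform prior)
counts = [[1, 1, 1], [1, 1, 1]]

for _ in range(20000):
    actions = []
    for p in (0, 1):
        opp, total = counts[p], sum(counts[p])
        # Expected utility of each action against the opponent's empirical mix
        ev = [sum(opp[b] * util(a, b) for b in range(3)) / total
              for a in range(3)]
        actions.append(max(range(3), key=lambda a: ev[a]))
    # Both best responses are chosen before either count is updated
    counts[0][actions[1]] += 1
    counts[1][actions[0]] += 1

# Player 0's average strategy, as recorded by player 1
freqs = [c / sum(counts[1]) for c in counts[1]]
```

After many rounds, each entry of `freqs` hovers near 1/3, the optimal mixed strategy. Swapping the lookup tables for neural networks that approximate the best response and the average strategy is, roughly, the step the paper takes.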

In doing so, the poker system managed to independently approach the mathematically optimal way of playing, despite not being previously programmed with any knowledge of poker. In some ways, poker is even harder than Go for a computer to play, thanks to the lack of knowledge of what’s happening on the table and in players’ hands. While computers can relatively easily play the game probabilistically, accurately calculating the likelihoods that any given hand is held by their opponents and betting accordingly, they are much worse at taking into account their opponents’ behaviour.
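That probabilistic calculation is straightforward in a small game. Using the six-card Leduc deck mentioned above, the chance that an opponent's hidden card outranks yours can be worked out exactly (this toy sketch ignores Leduc's community card and pair rule, so it is an illustration of the counting, not a full strategy):

```python
from fractions import Fraction

RANKS = ("J", "Q", "K")
# The six-card Leduc deck: jack, queen and king in two suits
DECK = [(r, s) for r in RANKS for s in ("a", "b")]

def p_opponent_higher(my_card):
    """Exact probability that the opponent's private card outranks ours,
    assuming it is uniform over the five cards we cannot see."""
    remaining = [c for c in DECK if c != my_card]
    my_rank = RANKS.index(my_card[0])
    higher = sum(1 for r, _ in remaining if RANKS.index(r) > my_rank)
    return Fraction(higher, len(remaining))
```

Holding a jack, four of the five unseen cards beat you (4/5); a queen faces only the two kings (2/5); a king can never be outranked. What the calculation cannot tell you is whether the opponent's betting means they actually hold one of those cards.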

While this approach still cannot take into account the psychology of an opponent, Heinrich and Silver point out that it has a great advantage in not relying on expert knowledge in its creation.

Heinrich told the Guardian: “The key aspect of our result is that the algorithm is very general and learned a game of poker from scratch without having any prior knowledge about the game. This makes it conceivable that it is also applicable to other real-world problems that are strategic in nature.

“A major hurdle was that common reinforcement learning methods focus on domains with a single agent interacting with a stationary world. Strategic domains usually have multiple agents interacting with each other, resulting in a more dynamic and thus challenging problem.”

Heinrich added: “Games of imperfect information do pose a challenge to deep reinforcement learning, such as used in Go. I think it is an important problem to address as most real-world applications do require decision making with imperfect information.”

Mathematicians love poker because it can stand in for a number of real-world situations; the hidden information, skewed payoffs and psychology at play were famously used to model politics in the cold war, for instance. The field of game theory, which originated with the study of games like poker, has now grown to include problems like climate change and sex ratios in biology.