Aces High: Numerical Techniques in Poker AI

By Simon Tomlinson

[Game programmer Simon Tomlinson (Need For Speed: Shift) analyzes techniques for working with poker AI in this advanced technical article, offering up tricks for generating AI behavior in the well-studied game.]

This article looks at the use of numerical techniques with application to poker AI, both in off-line balancing and in-game decision making. If you are not familiar with No Limit Texas Hold'Em Poker, the game rules and basic AI structure are discussed in a companion article -- Poker AI: A Starting Point, which is live now on Gamasutra's sister site, GameCareerGuide.

However, two significant areas were not described there in full: how to calculate hand win probabilities, and how to balance the AI characters which are essential to realistic game play. Both problems can be attacked using numerical optimization techniques. Indeed these techniques can be used in a wide range of AI balancing and live decision making applications.

Numerical Techniques Overview

There are three broad classes of numerical computing techniques: integration, root finding and optimization. Integration is usually associated with physics engines -- primarily integrating equations of motion over time. Root finding is the process of discovering the zero values in a function of one or more dependent parameters.

Optimization is finding the minimum (or sometimes maximum) of a value which is a function of one or more parameters. It is important to realize that the numerical problem need not be a real life physical equation -- it could be a parametric model that has been developed to represent some situation. Here we are concerned mainly with optimization.

There are many well-known solution algorithms for numerical problems. Root finding and optimization problems generally proceed by calculating an approximation to the solution which is an improvement of a previous guess, and repeating the cycle (or iteration) until some acceptable accuracy level is reached (convergence).

These methods are further sub-divided in terms of the "order" of the solution. Higher order solutions use derivatives of the problem formula in order to calculate the next solution approximation more accurately -- thus higher order algorithms converge in fewer iterations.

However there is a trade-off: the cost of calculating the derivatives could be higher overall than using a lower order algorithm over more iterations. This can be mitigated by using hybrid techniques where higher order derivatives are approximated using finite differences -- the difference between two previous formula evaluations divided by the known change in the dependent parameter.

It should also be noted that an optimization problem can always be converted to a root finding problem by differentiation which can in some cases make it easier to solve. Of course when the equation has many dependent parameters the problem becomes significantly more complex.

By now some readers might be getting a little worried about using these techniques due to their complexity. However there is hope. There are some very simple zero order methods which are easy to understand, simple to implement and which rely on using modern computing power to solve problems in acceptable timescales.

The simplest conceptual technique is to change each parameter by a small amount and evaluate the function, keeping the change if the new value is closer to the target (root, minimum or maximum) and discarding it otherwise. However we can easily do better than that.

The "Monte Carlo Technique" (MCT) relies on random numbers and the statistics of "random walks" to progress towards the solution on a timescale significantly shorter than the step by step search. Essentially this technique chooses one or more parameters from the set at random and changes them by small random amounts before re-calculating the function and retaining improvements.

Genetic Algorithms can be thought of as an extension of this technique where more than one current solution approximation is stored in memory (the parents) and a group of new candidate solutions (the children) are generated either by randomly mixing the parents or adding in further random "mutations". All the children are evaluated and the best two or more are retained as parents for the next generation.

This is an improvement over MCT because it keeps more options open per cycle, allowing clusters of near-optimum parameters to survive in each group, but there are special considerations for setting up a GA to maximize efficiency.
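As a concrete illustration, here is a minimal GA of the kind described above. The objective function, population sizes, mutation range and seed are all illustrative choices for the sketch, not values from the poker project:

```python
import random

def genetic_minimize(objective, n_params, generations=200, pop=2, children=8,
                     mutation=0.1, seed=0):
    """Toy genetic algorithm: keep the best `pop` parents, breed `children`
    candidates per generation by mixing parents and adding random mutations."""
    rng = random.Random(seed)
    # Start from random parents.
    parents = [[rng.uniform(-1, 1) for _ in range(n_params)]
               for _ in range(pop)]
    for _ in range(generations):
        candidates = list(parents)          # parents survive (elitism)
        for _ in range(children):
            a, b = rng.sample(parents, 2)
            # Crossover: each gene comes from one parent at random...
            child = [a[i] if rng.random() < 0.5 else b[i]
                     for i in range(n_params)]
            # ...then mutate one random gene by a small amount.
            i = rng.randrange(n_params)
            child[i] += rng.uniform(-mutation, mutation)
            candidates.append(child)
        # Retain the best candidates as the next generation's parents.
        candidates.sort(key=objective)
        parents = candidates[:pop]
    return parents[0]

# Minimize a simple quadratic bowl with its optimum at (0.5, -0.25).
objective = lambda p: (p[0] - 0.5) ** 2 + (p[1] + 0.25) ** 2
best = genetic_minimize(objective, 2)
```

Because the parents are carried into each candidate pool, the best error can never get worse from one generation to the next.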

Shaping the Problem Space

Before building a solution it is worth carefully considering the nature of the problem being solved. In many cases good design here can make the numerical algorithm far more efficient. The biggest single consideration is the number of parameters: the fewer the parameters, the faster the algorithm will converge.

If the number of parameters is very large higher order algorithms can become untenable because of the sheer number of partial derivative permutations. Constructing the parameter space efficiently for numerical methods can often affect the design of the whole AI system, so think about this early. I usually try to use a one dimensional array of real values, which can be copied, or re-interpreted through the use of a Union. This keeps the numerical algorithm generic.
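The flat-array pattern can be sketched in Python terms (the constant names here are hypothetical, standing in for the C-style Union trick): the optimizer copies and perturbs anonymous indices, while the game code reads the same values through meaningful names.

```python
import copy

# Named indices into one flat parameter vector (hypothetical constants).
PAIR_BONUS, SUITED_BONUS, HIGH_CARD_WEIGHT = range(3)

params = [300.0, 100.0, 25.0]      # the flat vector the optimizer sees
candidate = copy.copy(params)      # cheap to duplicate for a trial step
candidate[SUITED_BONUS] += 5.0     # the optimizer only touches index 1

# Game code reads the same storage through a meaningful name.
suited_bonus = candidate[SUITED_BONUS]
```

The optimizer stays generic: it never needs to know what index 1 means.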

It is also useful to pre-scale the parameters so that they all have a similar magnitude. This helps because the solution algorithm can be more agnostic as to the details of the problem function and therefore apply the same parameter change magnitude to any parameter.

More importantly the numerical solution is more stable where similar changes in the input parameters produce a similar change in the function value. To understand this think about the converse case -- if one parameter produces only a tiny change in the function for a large change in value it will tend to drift around in the noise as other parameters dominate.

A Simple Case: Pre-Flop Hand Strength

In the poker AI article I stated that the basis for the pre-flop decision can be encapsulated in a simple parametric equation which produces a decision score (DS) for any given two hole cards. But how do we choose the values of constants within that equation? Initial estimates of the values will emerge when developing the form of the equation, but to finalize the values, an optimization technique can be used.

A number of key hands are chosen and desired target DS values assigned. The optimization then seeks to minimize the sum of the squared errors between the target values and the scores calculated from the parametric equation given the current set of constants. This is called "least squares minimization" and is a common numerical formulation.

There are a number of specialized higher order algorithms to solve it. However MCT is simple, and there is a further reason to avoid higher order algorithms: our pre-flop parametric equation contains a lot of step changes in the scoring -- due to if or case statements.

For example if a non-pair is suited we might be adding a bonus of 100 to the hand score. Thus the parametric function is not smooth and continuous (see figure 1 in the other article). Discontinuities cannot be differentiated and therefore high order methods are not suited to this type of problem.

The MCT algorithm is relatively simple. Start with a first estimate of the parametric constants, and then randomly update a subset of constants (say 1-3 of 10) by a small random change (say -10 to +10 points compared to the DS maximum of 1000).

Evaluate the scores for the set of critical hands for this candidate solution and hence the total square error. If this is lower than the best error total, we keep the new set of constants; otherwise we revert to the previous set with the best current error. Continue this cycle until the error is below an acceptable value.

It is also worth having an emergency exit clause -- either a fixed number of total cycles, or stopping if no progress has been made in some number of cycles. This is because the minimum error will almost certainly be non-zero and above the ideal precision, so the primary convergence condition may never be achievable. This tends to happen where the target values of some hands place opposing requirements on the input constants, so compromises occur.
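A minimal sketch of this MCT loop might look like the following. The parametric score, the critical hands and their targets are all invented for illustration (the real pre-flop equation is far richer); the deliberately inconsistent AK target shows why the minimum error usually stays non-zero, and the fixed cycle count doubles as the emergency exit.

```python
import random

def preflop_score(hand, c):
    """Hypothetical parametric decision score for two hole cards.
    hand = (rank1, rank2, suited) with ranks 2-14; c is the flat vector of
    constants being optimized. Note the if-based step changes that make the
    function non-smooth."""
    r1, r2, suited = hand
    score = c[0] * max(r1, r2)      # high-card weight
    if r1 == r2:
        score += c[1]               # pair bonus (a discontinuity)
    elif suited:
        score += c[2]               # suited bonus (another discontinuity)
    return score

# Critical hands and desired target scores (illustrative values only).
TARGETS = [((14, 14, False), 900), ((7, 7, False), 750),
           ((14, 13, True), 720), ((14, 13, False), 350),
           ((7, 2, False), 150)]

def total_sq_error(c):
    return sum((preflop_score(h, c) - t) ** 2 for h, t in TARGETS)

def mct_optimize(c, cycles=20000, step=10.0, seed=1):
    """Perturb 1-2 random constants per cycle by a small random amount and
    keep the trial only if the total square error falls."""
    rng = random.Random(seed)
    best_err = total_sq_error(c)
    for _ in range(cycles):
        trial = list(c)
        for i in rng.sample(range(len(c)), rng.randint(1, 2)):
            trial[i] += rng.uniform(-step, step)
        err = total_sq_error(trial)
        if err < best_err:
            c, best_err = trial, err
    return c, best_err

initial = [30.0, 200.0, 50.0]
constants, err = mct_optimize(initial)
```

Even with the conflicting targets the loop drives the error down to the best available compromise.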

Of course it is possible to hone this technique. For example, where we find a candidate modification that is beneficial we can repeat it, or a fraction of it, until we get no further reduction in the total square error. However, since the parametric equation is very simple and fast to evaluate, it is probably not worth the development time of honing the algorithm significantly; in more complex problems this and other improvements could be worthwhile.

The choice of the critical hand subset used to generate the total square error needs thought. It should be representative, and should fully bound the range of the parametric function (as is true for any numerical problem). So to represent the pairs component of the score, for example, the hands 22, 77, JJ and AA might be a good set.

I also found it was good to have comparison hands in the critical set which differ only slightly but isolate one particular parameter, for example AK suited (AKs) and AK unsuited. There is actually no harm in using quite a large number of critical hands as long as the parametric function can be evaluated quickly.

In terms of the poker application, we are using a number of playing style characterizations. This means the code contains more than one set of pre-flop constants, each optimized individually for that style. This allows characters to shape their preferences -- some will play only pairs and avoid inside runs for example. Choosing the critical hands is therefore important in achieving a good match to the desired character profile. Changing the target hand scores, bunching the hand types in areas of interest and weighting the hand score errors can all help to solidify a playing style.

For example, to favor a player who tends to play pairs over any other hand type we might choose all 13 pairs with higher target scores in the range 700-900, where 700 is the boundary for playing the hand. We cannot ignore the other hand types, as that will allow their associated parameters to drift unstably, but by choosing only a few such hands at lower target scores (below 700 so that they are unlikely to play) the pairs will dominate; so we might choose AK, AKs, 3K, 34s, 78 and 79s only in addition to the pairs.

In fact, for the most able AI I used all 169 unique hands with target scores which generated the pattern of play broadly recommended by the poker professionals. But using MCT allowed me to quickly and easily experiment with alternative characteristic strategies. On a half decent PC each optimization typically takes only a few minutes.

Optimizing The AI Characters

The technique used in the previous section for a specific part of the poker AI can be used to optimize the overall play of individual AI characters. This can be done by placing numerical parameters throughout the core AI analysis code, in the modules for Flop, Turn and River analysis, and in the conversion of hand score into betting. It is best to group these parameters so that the optimization can focus on certain areas individually.

In my mobile application I used a small set of characteristic parameters such as "tightness", "aggression", "skill" etc. to describe each character which are used in numerous parts of the code, and then a single set of detailed parameters per analysis module used in only local parts of the code. The entire balancing process then requires an optimization of each module's local set and then the character trait sets with the objective of defining each "style of play".

As poker is a randomized game any optimization must include a "representative" session of play. We could script a set of specific hands and scenarios to define this session but this is both time consuming and open to error if the scenarios are incomplete.

By choosing a scenario script that deals with only a small part of the problem space we can easily produce results with hidden faults. As the code to be optimized becomes more complex it is much more difficult to be sure the critical sample points are indeed representative if they are chosen manually.

Therefore the approach I used was to employ randomization again to search the problem space. So we play say 1000 random games, and take samples of average performance.

Note that this means if we ran two sessions with the same input parameters we would not get the same result. We can help matters by using a known set of seeds for the card shuffle, but a Poker game is in mathematical terms classically chaotic -- a small change in the decision of one of the players at the table can drastically alter the outcome of the game.

However in my experience fixed shuffle seeds do help to some extent. Of course using a larger number of games per session reduces statistical variations between successive sessions. If you are worried, measure the inconsistency, and be aware that if the optimizer is making improvements only of the order of the session to session inconsistency, it is working 'in the noise' and will probably not be making useful progress.

Defining the correct numeric value from the metrics drawn from the playing session is very important. If the objective is to find the best AI player, then simply maximize the average dollar win rate per game over the session. But to find the best parameters for a 'tight' player we might try to maximize the average win rate divided by the average amount invested per game by the player. A good tight player wants to win well while not spending too much.

It is also possible to optimize to create a mediocre or poor player. For a weak AI character we cannot simply minimize the win rate -- that will produce some fool who throws his money away. Instead, minimize the least square error between the measured win rate and a target value. For instance, if our best professional has a win rate of $10 per game, a target value might be a win rate of -$15 per game, i.e. a defined but gradual loss -- the character is bad, but not hopeless.

To cement particular characteristics we can think in terms of setting "boundary conditions". This is where, in a general optimization problem, we set rules or conditions which affect how the parameters are allowed to vary. This can be simple capping to keep parameters sensible -- for example I tended to apply limits to the range of parameters in the pre-flop equation to ensure the evaluated score stays within the desired 0-1000 range.

But in terms of character balancing a good technique is to pin one or more of the parameters in the set. For example to produce the very best tight player we might pin the "tight" character trait at 3 (of a range of 0-20) and then optimize for maximum win rate allowing all other parameters to vary freely. The beauty of MCT is that the optimization process will teach the character to compensate for his fixed traits without any need to change the solution algorithm.
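A proposal step that honours such boundary conditions might be sketched as follows (the helper name and its arguments are illustrative, not the article's actual code): pinned parameters never move, and free ones are clamped to their legal range.

```python
import random

def propose(params, pinned, limits, step, rng):
    """One Monte Carlo trial step respecting boundary conditions: indices
    in `pinned` are never changed, and the perturbed value is clamped to
    its (lo, hi) range before the trial is evaluated."""
    trial = list(params)
    free = [i for i in range(len(params)) if i not in pinned]
    i = rng.choice(free)
    lo, hi = limits[i]
    trial[i] = min(hi, max(lo, trial[i] + rng.uniform(-step, step)))
    return trial

rng = random.Random(0)
traits = [3.0, 10.0, 5.0]          # index 0 is the pinned "tight" trait
trial = propose(traits, pinned={0}, limits=[(0, 20)] * 3, step=1.0, rng=rng)
```

The rest of the MCT loop is unchanged: the optimizer simply never sees a move on the pinned trait, so the other parameters learn to compensate.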

The number and type of AI opponents to play against the character being 'trained' is also important. A good Poker player knows that on a table of tight players, you should adapt and play a little more loose -- so to train a loose AI character play him against seven other AI who are mostly tight. Similarly in tournament play you need to adjust to the fact there are fewer opponents in the closing stages, so train a tournament player against only two other AI of different types. Repeated training against different opponent groups can also be considered, but be aware that the player will always be most influenced by the last training session completed.

So to summarize: the balancing process was a number of optimization sessions, each with a customized minimization/maximization metric averaged over a large number of games against a table of selected AI opponents, where a subset of the AI parameters are varied using a Monte Carlo approach, and with customized boundary conditions to steer the solution towards the desired style. If development time is short, composite characters can be used.

For example, for a weak player simply take the best AI and turn down the 'skill' trait. Or you could mix the hand analysis parameter subset of the tight player with the betting subset of an aggressive player to produce an interesting variant. But always test such composites with a single 1000 game session -- sometimes a mix of two characters can produce a muddy individual with no recognizable style -- we want distinguishable AI!

Hand Win Probabilities

A separate application of the Monte Carlo technique is in the calculation of hand win probabilities as part of the AI analysis from the Flop onwards. Unlike the offline balancing and training applications above this is done in live play and therefore we have to consider processing load with care. It is possible to calculate some probabilities analytically.

For example, if you have AJ in the hole and the flop is A74, the probability of drawing another Ace for trips is approximately 2/47 + 2/46, or 8.6%. This is simply 2 possible aces among 47 remaining unknown cards on the Turn, and if that fails, 2 among 46 cards on the River.
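A quick check of that arithmetic: the simple sum is a slight overcount, because it counts the case where both remaining aces arrive twice; the exact figure is about 8.4%.

```python
# Two aces left among 47 unseen cards on the Turn, then 46 on the River.
approx = 2 / 47 + 2 / 46           # simple sum: about 8.6%
exact = 1 - (45 / 47) * (44 / 46)  # 1 - P(miss Turn) * P(miss River): ~8.4%
```

For AI purposes the difference is negligible next to the other uncertainties discussed below.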

But more complex situations take more thinking about. There are specialized books available which describe various probability calculations in great detail -- but these would take quite some time to encode.

What is more important though is that this isolated calculation is not actually your overall win probability -- it is just one possible outcome of many. For example an opponent could also have an ace in the hole and would match your trips, but could beat you on a higher kicker.

Another player might be holding 77 and already have three of a kind and the current winning hand. If there are three players remaining rather than one, there is more chance that one of them has, or will make, a hand that beats yours.

The AI code can substantially reduce the need for this multitude of considerations by running a quick Monte Carlo simulation -- effectively ask a number of "what if these cards were dealt" questions.

So the algorithm is to randomly assign hole cards to all remaining players and any un-dealt community cards, remembering that assigned cards cannot be any already dealt and known to the AI making the evaluation, and ask the engine to determine the winner of the hand. Repeat this a number of times and use the number of wins to find an average win probability. This will automatically account for the effect of the number of opponents still in play.
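The whole loop can be sketched in a few dozen lines of Python. To keep the sketch self-contained it uses a deliberately crude hand evaluator that ranks by rank multiplicity only and ignores straights and flushes; a real game would call its full evaluator instead. Split pots are counted as half a win, a common convention but a choice of this sketch rather than the article's.

```python
import random
from collections import Counter

DECK = [(rank, suit) for rank in range(2, 15) for suit in "cdhs"]

def hand_value(cards):
    """Crude 7-card evaluator: quads > full house > trips > two pair >
    pair > high card, with ties broken by rank. Ignores straights and
    flushes -- a real game would use its full evaluator here."""
    counts = Counter(rank for rank, _ in cards)
    by_strength = sorted(counts, key=lambda r: (counts[r], r), reverse=True)
    pattern = tuple(counts[r] for r in by_strength)
    return (pattern, tuple(by_strength))  # tuples compare lexicographically

def win_probability(hole, community, n_opponents, samples=100, seed=None):
    """Monte Carlo estimate of winning at showdown: repeatedly deal random
    hole cards to the remaining opponents plus the missing community cards,
    and count how often our hand comes out on top."""
    rng = random.Random(seed)
    unseen = [c for c in DECK if c not in hole and c not in community]
    wins = 0.0
    for _ in range(samples):
        deal = rng.sample(unseen, 2 * n_opponents + (5 - len(community)))
        board = list(community) + deal[2 * n_opponents:]
        mine = hand_value(list(hole) + board)
        best_opponent = max(hand_value(deal[2 * i:2 * i + 2] + board)
                            for i in range(n_opponents))
        if mine > best_opponent:
            wins += 1
        elif mine == best_opponent:
            wins += 0.5                    # count split pots as half a win
    return wins / samples

# Pocket aces, pre-flop, one opponent still in the hand.
p = win_probability([(14, "s"), (14, "h")], [], n_opponents=1,
                    samples=200, seed=7)
```

With the crude evaluator the figure differs slightly from true hold'em odds, but the shape of the algorithm -- and its automatic handling of the number of opponents -- is the point.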

Clearly the number of samples in the simulation is important in making the probability accurate -- more samples means less statistical variation. Perhaps surprisingly though, 50-100 samples are often enough, and this is manageable in terms of overall AI time even on very low end mobile handsets. There will always be residual variation in the result, but this does not matter.

A human player will never make absolutely consistent decisions. Indeed the vast majority of players will only be able to approximate the win probability in real life in any case, either by running through a simplified calculation in the head, or by knowing from experience that certain scenarios lead to certain win probabilities. Thus the number of samples used is an AI trait in itself -- weaker players who are thinking only superficially use fewer samples.

Also note that a re-calculation of the win probability is not always necessary. If an AI bets on the flop on the strength of a good win rate, but another player raises, then the first AI need not re-calculate from scratch if the number of players has not changed -- it could run another 50 samples to refine the previous result instead. Thus I always cache the most recent calculation for each player.
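Refining a cached estimate rather than recomputing is just a sample-count-weighted average (a sketch; the function name is invented):

```python
def refine(cached_p, cached_samples, new_p, new_samples):
    """Fold a fresh batch of Monte Carlo samples into a cached win
    probability, weighting each estimate by its sample count."""
    total = cached_samples + new_samples
    combined = (cached_p * cached_samples + new_p * new_samples) / total
    return combined, total

# 100 cached samples said 60%; 50 fresh samples said 50%.
p, n = refine(0.60, 100, 0.50, 50)
```

The cache also keeps the growing sample count per player, so repeated refinements steadily tighten the estimate.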

This method can also be deepened into the onion layers of the poker game. Firstly individual player styles or preferences may favor some hands and therefore artificially improve or deteriorate the calculated values.

One notable mistake amongst beginners is to hold on to an unpaired Ace in the hole even though it gains no real support from the flop. So we could apply a post-evaluation modification that drags the win probability result towards 100% for our Ace-obsessive character.

What the AI does with the win odds is also open to variation -- some AI might see a high win probability as reason to bet in itself, other better AI might go on to compare to the pot odds (the rate of return per investment) before committing their money.

At deeper layers still, the Monte Carlo simulation itself can be "steered" in one direction or another. On the flop, any player that bet strongly pre-flop can be assumed to be holding a good hole hand, so we might want to weight their assigned cards towards the higher end of the spectrum.

This is achieved easily -- assign the player's hole cards, evaluate the pre-flop parametric decision score and if this is low re-assign the hole cards; allow a maximum fixed number of re-assignments before committing. But on the other hand habitually loose players come out of pre-flop with virtually any hand, so if this opponent is still in play it makes sense to suppress any hole card upgrading.
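The re-assignment trick might be sketched like this. The strength function is a hypothetical stand-in (summed ranks plus a pair bonus) for the real pre-flop parametric score, and the threshold and retry count are illustrative:

```python
import random

DECK = [(rank, suit) for rank in range(2, 15) for suit in "cdhs"]

def preflop_strength(hole):
    """Stand-in for the pre-flop decision score: summed ranks plus a pair
    bonus, so big cards score high. The real AI would evaluate its
    parametric pre-flop equation here."""
    (r1, _), (r2, _) = hole
    return r1 + r2 + (10 if r1 == r2 else 0)

def assign_hole(unseen, rng, threshold=20, max_retries=3):
    """Deal random hole cards, but re-deal up to max_retries times while
    the hand scores below threshold -- biasing a tight opponent's assumed
    cards towards the strong end. Use max_retries=0 for a loose opponent
    to keep the assignment uniform."""
    hole = rng.sample(unseen, 2)
    for _ in range(max_retries):
        if preflop_strength(hole) >= threshold:
            break
        hole = rng.sample(unseen, 2)
    return hole

rng = random.Random(3)
steered = sum(preflop_strength(assign_hole(DECK, rng))
              for _ in range(500)) / 500
uniform = sum(preflop_strength(assign_hole(DECK, rng, max_retries=0))
              for _ in range(500)) / 500
```

Averaged over many assignments, the steered deals produce visibly stronger assumed hands than the uniform ones.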

Conclusion

I have discussed the use of Monte Carlo techniques in various applications within a Poker AI system. This is not the only type of game where I have used numerical optimization in some form; I have also used MCT and Genetic Algorithms in racing games, other card games, snooker and pool games. In general the same broad rules apply -- good design of both the code being sampled and the optimization code itself is vital.

In general automated training is not an absolute solution; human testing and common sense are still required, and there are sometimes subtle disparities between the perception of the player and cold hard measurements. But I have found learning by optimization a very useful tool, especially where testing/balancing resources are limited and powerful computers are readily available.


Copyright © UBM Tech, All rights reserved