This installment of Probability in Games focuses on the concept of variance as it relates to rolling lots of dice. Rather than looking at the probability of rolling specific combinations of dice (as we did in Probability in Games 02), this article is focused on the probability of rolling dice that add up to different sums. The inspiration for this topic comes from two different sources. The first was a statement by Geoff Engelstein that more dice can mean less luck in a game [during the Dice Tower podcast on Nov 12th, 2013]. The second was an assertion by James Ernest about the long-lasting advantage of rolling well early in a game [during a GenCon 2012 lecture]. Let’s try to shine some light on both of these observations by digging into the concept of variance.

Having players add up the numbers rolled on several dice is a common mechanism in many games. Part of what makes this mechanism interesting is that different sums often have different probabilities of being rolled. For instance, you are exactly twice as likely to roll a sum of 7 as you are to roll a sum of 4 on two six-sided dice (6 ways out of 36 versus 3 ways). Yet a sum of 7 is five times as likely as a sum of 4 when rolling three six-sided dice (15 ways out of 216 versus 3 ways). This is in contrast to rolling a single die, where every side (and every possible sum) is equally likely.
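The comparison between sums of 7 and 4 can be checked by brute force. The sketch below is only an illustration (the helper name `count_sums` is made up for this post): it lists every ordered roll and counts how many land on each sum.

```python
from collections import Counter
from itertools import product

def count_sums(num_dice, sides=6):
    """Count how many ordered rolls of num_dice dice produce each sum."""
    faces = range(1, sides + 1)
    return Counter(sum(roll) for roll in product(faces, repeat=num_dice))

two = count_sums(2)
three = count_sums(3)

# Ways to roll a 7 versus a 4, and the ratio between them.
print("2 dice:", two[7], "vs", two[4], "ratio", two[7] / two[4])
print("3 dice:", three[7], "vs", three[4], "ratio", three[7] / three[4])
```

Brute force is fine for two or three dice, but the number of rolls grows as 6^n, so it gets slow quickly.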

When working with two or three dice, it’s not too hard to write an exhaustive table (or graph) for the probabilities of every sum. However, this becomes tedious for larger numbers of dice. Let’s look at some graphs for the distribution of sums when rolling multiple dice. The “n=” title for each graph tells you how many dice are being rolled. Below that, the height of each bar indicates the likelihood of rolling a specific sum with that many dice.
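For larger numbers of dice, one way to avoid writing the table by hand is to build the distribution one die at a time. The following is a minimal sketch of that idea, assuming fair dice; the function name `sum_distribution` and the text bar chart are illustrative choices, not the method used to draw the graphs in this post.

```python
from fractions import Fraction

def sum_distribution(num_dice, sides=6):
    """Return {sum: probability} for the total of num_dice fair dice."""
    dist = {0: Fraction(1)}  # with zero dice the total is always 0
    for _ in range(num_dice):
        new_dist = {}
        for total, prob in dist.items():
            # Adding one more die spreads each existing total over `sides` new totals.
            for face in range(1, sides + 1):
                key = total + face
                new_dist[key] = new_dist.get(key, Fraction(0)) + prob / sides
        dist = new_dist
    return dist

# Rough text version of an "n=4" graph: one row per sum, bar length ~ probability.
for total, prob in sorted(sum_distribution(4).items()):
    print(f"{total:2d} {'#' * round(float(prob) * 200)}")
```

Using `Fraction` keeps the probabilities exact, so they always add up to exactly 1.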

Notice that there are a few important changes to these graphs as the number of dice increases. First, the central and most frequent sum of each graph moves to the right (getting larger) as more dice are added. We’ll look at calculating the mean, which is the measurement of this central and most common sum. Second, the curve gets closer and closer to the common bell shape of a normal distribution. In fact, the Central Limit Theorem provides some insight into why the sum of a bunch of random dice must always approximate this normal distribution. This is part of what makes that bell curve so common. The width and steepness of this bell can be quantified with a measurement called variance that we’ll explore in more detail below. And third, the entire curve gets wider by five extra possible sums per die. The measurement of this space of possible sums is called the range, and the only reason that I mention it is to help distinguish it from variance. Let’s jump right into calculating the mean and variance when rolling several six-sided dice.

The mean of each graph is the weighted average of all possible sums. This average sum is also the most common sum (the mode) and the middlemost sum (the median) in a normal distribution. In terms of looking at bell curves, the mean is how far left or right on the x-axis you’ll find the highest point of the curve. To calculate this mean for a single die, we can take the weighted average of every possible sum. However, these distributions are symmetric around their center, which provides us with a nice shortcut: average only the smallest and largest possible sums.

Mean(1D6): (1 * 1/6) + (2 * 1/6) + (3 * 1/6) + (4 * 1/6) + (5 * 1/6) + (6 * 1/6) = 21/6 = 3.5
Mean(1D6): (1 + 6) / 2 = 7/2 = 3.5
Mean(2D6): (2 + 12) / 2 = 7
Mean(3D6): (3 + 18) / 2 = 10.5
Mean(nD6): (n + 6*n) / 2 = n * 7/2
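Both ways of computing the mean can be written out in a few lines. This is just a sketch, with made-up function names, showing that the weighted average and the smallest-plus-largest shortcut agree.

```python
from fractions import Fraction

def mean_single_die(sides=6):
    """Weighted average of the faces; each face has probability 1/sides."""
    return sum(Fraction(face, sides) for face in range(1, sides + 1))

def mean_of_dice(num_dice, sides=6):
    """Shortcut: average of the smallest and largest possible sums."""
    smallest = num_dice
    largest = sides * num_dice
    return Fraction(smallest + largest, 2)

print(mean_single_die())  # weighted average for one die
for n in (1, 2, 3, 6):
    print(n, "dice:", float(mean_of_dice(n)))
```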

Now that the mean is out of the way, we can discuss variance. Variance is a measure of how spread out the values in a distribution are. In our example, a low variance means the sums that we roll will usually be very close to one another. By contrast, the variance is large when the sums that we roll are frequently distant values. The way that we calculate variance is by taking the difference between every possible sum and the mean. Then we square all of these differences and take their weighted average. This gives us an interesting measurement of how similar or different we should expect the sums of our rolls to be.

Variance(1D6): (1 - 3.5)^2 * 1/6 + (2 - 3.5)^2 * 1/6 + (3 - 3.5)^2 * 1/6 + (4 - 3.5)^2 * 1/6 + (5 - 3.5)^2 * 1/6 + (6 - 3.5)^2 * 1/6 = 70/24 = 35/12 ≈ 2.92

This was a bit more involved than calculating the mean. But fortunately, variances (like means) can simply be added up to account for extra dice (this is because each random die roll is an independent event).

Variance(2D6): 70/24 + 70/24 = 140/24 ≈ 5.83
Variance(3D6): 70/24 + 70/24 + 70/24 = 210/24 = 8.75
Variance(nD6): n * 35/12
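The claim that variances simply add up can be double-checked by computing the variance of the sum directly from every possible roll. The sketch below does that with exact fractions; the function names are hypothetical and the enumeration is only practical for a handful of dice.

```python
from fractions import Fraction
from itertools import product

def variance_single_die(sides=6):
    """Weighted average of squared differences from the mean, for one die."""
    mean = Fraction(sides + 1, 2)
    return sum((face - mean) ** 2 * Fraction(1, sides)
               for face in range(1, sides + 1))

def variance_by_enumeration(num_dice, sides=6):
    """Variance of the sum, computed over every equally likely ordered roll."""
    sums = [sum(roll) for roll in product(range(1, sides + 1), repeat=num_dice)]
    mean = Fraction(sum(sums), len(sums))
    return sum((s - mean) ** 2 for s in sums) / len(sums)

one_die = variance_single_die()
print("1 die:", one_die)
for n in (2, 3):
    print(n, "dice:", variance_by_enumeration(n), "vs n * 35/12 =", n * one_die)
```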

We now have a nice way of calculating the mean and variance for the sums of any number of six-sided dice. The mean is easy to see in each graph, but the variance is a bit trickier to wrap our heads around. A more natural way to think about variance is to think about the percentage of rolls that share a small range of sums, with a statement like "most (68%) rolls should sum to a value between 13 and 22." But there is one step we must take between finding the variance and relating it to a percentage like this, and that is to calculate something called the standard deviation.

Think of the standard deviation as another way of measuring variance, much like the way that distance can be measured in inches, meters, and light years. The standard deviation is easy to calculate once you know the variance: it’s just the square root of the variance. Another benefit of the standard deviation is that it is in the same units as the sums themselves, so we can visualize it in relation to our graphs. Approximately 68% of our rolls will have sums that land within one standard deviation of the mean. And about 95% of our rolls will fall within two standard deviations of the mean. These magic percentages are common to all normal distributions. Here’s a graph that shows how these standard deviations relate to the chances of different sums. The Greek letter mu is used here to label the mean, and the Greek letter sigma represents the standard deviation.



CC BY 2.5 – http://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg

We can now find the ranges of sums that will be most commonly rolled with any number of dice. Let’s go through the example of finding the range of sums that will account for 68% of all six die rolls. We start by calculating the mean, the variance, and the standard deviation for the sums of six dice.

Mean(6D6): 6 * 3.5 = 21
Variance(6D6): 6 * 35/12 = 17.5
StandardDeviation(6D6): SquareRoot(17.5) ≈ 4.18

Because 68% of a normal distribution is always within one standard deviation of the mean, we now know that about 68% of the time that we roll six dice, those dice will have a sum between 21 - 4.18 = 16.82 and 21 + 4.18 = 25.18. Obviously we can only roll sums that are whole numbers, so it’s 17 to 25. But remember that this is only an estimate, and that the distribution of sums for six dice is merely an approximation of the normal distribution.
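To get a feel for how rough this estimate is, the one-standard-deviation range can be compared with an exact count over all 6^6 = 46,656 possible rolls. The following sketch assumes fair six-sided dice and uses variable names chosen only for this illustration.

```python
import math
from itertools import product

NUM_DICE = 6
mean = NUM_DICE * 3.5
variance = NUM_DICE * 35 / 12
std_dev = math.sqrt(variance)

# Whole-number sums that lie within one standard deviation of the mean.
low = math.ceil(mean - std_dev)
high = math.floor(mean + std_dev)

# Exact share of rolls whose sum falls in that range, by brute force.
rolls = product(range(1, 7), repeat=NUM_DICE)
hits = sum(1 for roll in rolls if low <= sum(roll) <= high)
exact = hits / 6 ** NUM_DICE

print(f"std dev = {std_dev:.2f}, range = {low} to {high}")
print(f"exact share of rolls in range = {exact:.1%}")
```

The exact share comes out a few points above 68%, mostly because rounding to whole-number sums includes all of 17 and all of 25 rather than cutting the curve off at 16.82 and 25.18.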

It’s natural to look at this relationship between standard deviations and percentages, and wonder about the percentages that lie between each multiple of the standard deviation. For instance, you might want to calculate the percentage of rolls that sum to a value within 3 of the mean. This is a fairly complex calculation to perform by hand, but it is common enough that look-up tables and calculator functions exist for it (much as they do for logarithms and trigonometric functions). While preparing this article, I came across the following link to an online table / calculator of what are often called z-scores: http://davidmlane.com/hyperstat/z_table.html. Let’s try entering the mean and standard deviation that we just calculated for the sum of six dice into this webpage. Now, we can find out the percentage of rolls that will fall above, below, between, or outside of any particular sum(s). For instance, we can find the chance that six dice sum to a value within 3 of the mean by entering “Between: 18 and 24”. The area (probability) field then populates with the value 0.5267, which tells us that 52.67% (or just over half) of our rolls should fall in this range.
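The same look-up can be done without the webpage. The sketch below is one common way to compute the area under a normal curve, using the error function from Python's standard `math` module; the helper names are made up for this example, and this is not necessarily how the linked calculator works internally.

```python
import math

def normal_cdf(x, mean, std_dev):
    """Probability that a normal variable with this mean and std dev is <= x."""
    return 0.5 * (1 + math.erf((x - mean) / (std_dev * math.sqrt(2))))

def prob_between(low, high, mean, std_dev):
    """Area under the normal curve between low and high."""
    return normal_cdf(high, mean, std_dev) - normal_cdf(low, mean, std_dev)

# Six dice: mean 21, standard deviation about 4.18.
p = prob_between(18, 24, 21, 4.18)
print(f"P(18 < sum < 24) is roughly {p:.4f}")
```

The result should land very close to the 0.5267 reported by the calculator; tiny differences come from rounding the standard deviation.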

If you think of this conversion from mean, standard deviation, and range to a percentage as a table look-up, you might correctly guess that you can perform the look-up in reverse. Instead of looking up a percentage based on a range of sums, you can just as easily look up a range based on a desired percentage. For instance, maybe you’d like to find a range of sums that account for 30% of the rolls of six dice. To do so, change the radio button at the top of the webpage linked above, from “Area from a Value” to “Value from an Area”. Then enter the mean and standard deviation for six dice, followed by the area (which we’ve been calling the probability) of 0.3 for 30%. The radio buttons at the bottom should now allow you to calculate the following six die 30% chance rolls.

Sum < 18.8
Sum > 23.2
19.4 < Sum < 22.6
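The reverse look-up is also available in Python's standard library through `statistics.NormalDist` and its `inv_cdf` method. This sketch reproduces the three kinds of 30% ranges for six dice, assuming the same mean of 21 and standard deviation of 4.18 used above.

```python
from statistics import NormalDist

six_dice = NormalDist(mu=21, sigma=4.18)

# Lowest 30% of rolls: everything below this value.
below = six_dice.inv_cdf(0.30)

# Highest 30% of rolls: everything above this value.
above = six_dice.inv_cdf(0.70)

# Middle 30% of rolls: 35% of the area lies on each side of this range.
mid_low = six_dice.inv_cdf(0.35)
mid_high = six_dice.inv_cdf(0.65)

print(f"Sum < {below:.1f}")
print(f"Sum > {above:.1f}")
print(f"{mid_low:.1f} < Sum < {mid_high:.1f}")
```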

Based on the length of this post, I believe this will be a good place to end. Hopefully you are now more comfortable calculating probabilities for rolling any range of sums on any number of dice. In order to reach this point, we've had to wrap our brains around the concept of variance, and acquire some experience working with normal distributions. To keep your brain going until next time, here's the problem posed by James Ernest (in the GenCon 2012 lecture mentioned at the start of this post):



In a dice-driven horse race where each player will roll a 6-sided die 50 times, suppose the results after turn 1 are 1 versus 6. This early in the game, with 49 rolls to go, you would hope that the game is not already tilted heavily in one player's favor.



This takes a bit of cleverness, but see if you can use this concept of variance to figure out each player's chance of winning in this game. Please post your solutions and how you got them below, along with any questions or requests you have for this or future installments of the blog. Thanks for reading!