How Fair Is My D20?

An automatic system for rolling a polyhedral die and taking photos of the rolls; extracting the image of just the die from those images; clustering the images of the die by which face is shown; and analyzing the results.

I was inspired in part by the Awesome Dice Blog's 2012 post comparing d20 fairness between two manufacturers. (Christopher Galpin in 2014 links to a number of other interesting analyses; John Kern in 2006 does Bayesian analysis for Pass the Pigs.) They rolled and tallied by hand.

See the GitHub repository for the scripts used to process image data from test runs, and data for each die tested. The source data for these charts is shared on Google Drive, including full image data for one die.

Results

Overview

Smaller standard deviation is better, and expected values closer to 10½ are better (meaning "fairer" in both cases).

Expected value is the average roll; for an ideal, fair d20 this is 10½, so the graph shows most of the tested d20s are slightly unlucky. However, this graph doesn't tell you about specific roll outcomes (like 20s or 1s).

The standard deviation is on the normalized frequencies of the rolls for a die; thus a d20 where all sides have ideal (1.0) frequency has a standard deviation of 0.0, whereas dice with more variation have higher standard deviations.
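As a rough sketch of the two summary numbers described above, the following computes the expected value and the standard deviation of normalized frequencies from per-face roll counts. The function name `summarize` is made up for illustration, and using the population standard deviation (`pstdev`) rather than the sample version is an assumption; the charts may have been produced either way.

```python
from statistics import pstdev

def summarize(counts):
    """counts maps each face value to the number of times it was rolled.

    Returns (expected_value, stddev_of_normalized_frequencies).
    """
    total = sum(counts.values())
    ideal = total / len(counts)  # rolls per face for a perfectly fair die
    expected_value = sum(face * n for face, n in counts.items()) / total
    normalized = [n / ideal for n in counts.values()]
    # Assumption: population standard deviation over the faces.
    return expected_value, pstdev(normalized)

# An ideal d20 rolled 3000 times: every face shows up 150 times.
fair_d20 = {face: 150 for face in range(1, 21)}
print(summarize(fair_d20))
```

For the ideal die, every normalized frequency is 1.0, so the expected value is 10.5 and the standard deviation is 0.0.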

Trends within brands are discussed below, but it appears that across brands d20s (with most showing a standard deviation above 0.10) are less fair than dice with fewer sides.

How Many Rolls?

100-150 rolls per die face.

The below data is from an opaque purple Wiz Dice d20, rolled 8000 times.

The different datasets are from the first N rolls of the actual rolled sequence. So the series labeled "100" is from taking the first 100 rolls from the set of 8000, and charting the relative normalized frequency of each side within that subsample; similarly for the other sample counts.

"Normalized Frequency" is used so Y values on different histograms / for different dice can be compared easily. For example, an ideal fair d20 rolled 3000 times would show each face 150 times. So if a 20 actually comes up 102 times, the normalized frequency is 0.68. That is, in the example 20s showed up 68% of the times you would expect from a fair die. And frequencies of 1.0 correspond to what would be expected of a fair die.

The relative side frequencies for 1000 rolls or (especially) 100 rolls are fairly different from the frequencies at 8000 rolls. But anything from 2000 up looks fairly close. Thus I conclude that 2000 rolls is probably sufficient, and for good measure 3000 rolls should give a good picture of the die's behavior; this implies 100-150 rolls per side.
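The comparison above was made by eye from the charts. One way to put a number on it, sketched here under my own assumptions, is to take the first N rolls, compute normalized frequencies, and report the largest per-face gap from the full run. The names `normalized_frequencies` and `prefix_drift`, and the choice of "largest gap" as the measure, are illustrative and not part of the original analysis.

```python
from collections import Counter

def normalized_frequencies(rolls, sides):
    """Observed count divided by the fair-die count, for faces 1..sides."""
    counts = Counter(rolls)
    ideal = len(rolls) / sides
    return [counts[face] / ideal for face in range(1, sides + 1)]

def prefix_drift(rolls, sides, sample_sizes):
    """For each N, the largest per-face gap between the first N rolls and the full run."""
    full = normalized_frequencies(rolls, sides)
    drift = {}
    for n in sample_sizes:
        part = normalized_frequencies(rolls[:n], sides)
        drift[n] = max(abs(p - f) for p, f in zip(part, full))
    return drift
```

Called as `prefix_drift(rolls, 20, [100, 1000, 2000, 3000])` on a recorded sequence, a drift that stops shrinking much between successive sample sizes would support the same "enough rolls" judgment.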

For a Crystal Caste clear black d8, rolled 3000 times:

In this case, with 512 samples the frequencies look somewhat different from those at 3000 samples, but with 1024 it's fairly close.

This leads me to a heuristic of 100 rolls per side (800 for a d8, 2000 for a d20). It is of course a subjective heuristic.

Chessex

The Chessex dice I tested: I bought a 7-die Copper-Steel Gemini set and a bag of 6 d20s (from which I tested a red, yellow and purple/gray die), and borrowed another d20 (green, probably 1990s purchase date).

The Chessex d20s are all mid-range in fairness, with the older borrowed green die (probably 1990s) proving fairer. (The rest were all purchased in 2015.) They seem to roughly favor and avoid the same numbers which, on the dice, are arranged like the two sewn halves of a baseball.

On these dice (as with most d20s), physically opposite faces sum to 21 (1 and 20, 2 and 19, etc). The distribution of frequencies is fairly symmetric around 10½, possibly as a result, which prevents the expected value of the die from deviating too far from 10½.

Wiz Dice

The Wiz Dice I tested: dice from two sets out of a highcitybooks.com 35-die set, and three borrowed dice. (Note that as of November 2015 highcitybooks is selling Chessex, but historically it has sold bulk dice from multiple brands.)

In this small sample, the translucent dice were less fair than the opaques. These Wiz Dice were highly variable, but do tend to have symmetrical distributions around 10½ (like Chessex). The pattern is more easily visible with one of the above dice on its own:

Game Science

The Game Science dice I tested: two individual d20s, one black with gold numerals and one white with black. (Just for fun, I've put the white one on eBay.) I tested the black d20 both before and after trimming a molding bump (left and right images above, respectively; the trimmed bump is at the right of the edge between 7 and 4). Game Science promotes its dice as especially fair.

The dice require trimming after they arrive, and both of mine did arrive with visible bumps. I tested the black die before and after trimming it and saw a significant improvement in fairness, leaving the black Game Science d20 as the fairest d20 tested (with a standard deviation of 0.07, down from 0.12, and a chi-squared value of 0.76). The trimming barely changed its expected value, though: 10.38 before and 10.37 afterwards (compared to 10.5 for an ideal fair d20). The white d20, however, was only run-of-the-mill after trimming (standard deviation of 0.12).
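The text does not say how the chi-squared value was calculated. As a hedged illustration only, Pearson's goodness-of-fit statistic against a uniform die can be computed from per-face counts as below; the function name is made up. A value like 0.76 may well be a p-value rather than the raw statistic (that is a guess), in which case the statistic would additionally be compared to a chi-squared distribution with one fewer degrees of freedom than the die has sides, for example with `scipy.stats.chisquare`.

```python
def chi_squared_statistic(counts):
    """Pearson's chi-squared statistic for observed face counts vs. a fair die.

    counts is a sequence with one entry per face.
    """
    total = sum(counts)
    expected = total / len(counts)  # every face equally likely
    return sum((observed - expected) ** 2 / expected for observed in counts)

# A perfectly even d20 gives a statistic of 0; larger values mean less even counts.
print(chi_squared_statistic([150] * 20))
```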

Unlike other dice tested (including the white d20), the black d20 very quickly became marred (possibly from hitting the LEDs mounted inside the rolling machine).

The white d20's expected value of 10.67 makes it one of the luckiest dice tested.

Crystal Caste

The Crystal Caste dice I tested: hybrid translucent orange and black/white translucent.

Crystal Caste's d20s were by far the least fair tested. The orange d20 (above) was visibly egg-shaped: longest diameter 19.77mm between 4 and 17, shortest diameter 18.98mm between 1 and 20. The two d20s do follow a similar distribution, and both have slightly low expected values, 10.18 and 10.24.

I also compared one "Crystal" die to its platonic solid counterpart.

Crystal Caste says the "Crystal Dice" are "A totally new shape for RPG polyhedral dice: geometric crystals with sides of exactly the same size, guaranteeing random numbers." However, their "crystal" d6 compared very poorly to their own cube d6 from another set (standard deviations of 0.25 and 0.05 respectively). (Their standard-shaped d8 also performed fairly well, with a standard deviation of 0.09.)

Koplow

The Koplow dice I tested: three dice from a set of 10 d20s.

Unlike the other d20s, the opposing sides on Koplow d20s do not all sum to 21. The pairs are: 1/20 2/12 3/17 4/16 5/19 6/14 7/13 8/18 9/15 10/11.
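A quick check of the pairs listed above shows which opposite faces still sum to 21. The variable names are only for illustration.

```python
# Opposite-face pairs on the Koplow d20s, as listed above.
koplow_pairs = [(1, 20), (2, 12), (3, 17), (4, 16), (5, 19),
                (6, 14), (7, 13), (8, 18), (9, 15), (10, 11)]

# Sum of each opposing pair; a conventional d20 would give 21 every time.
pair_sums = {pair: sum(pair) for pair in koplow_pairs}
balanced = [pair for pair, total in pair_sums.items() if total == 21]

for pair, total in pair_sums.items():
    print(pair, total)
print("pairs summing to 21:", balanced)
```

Only 1/20 and 10/11 sum to 21; the other pair sums range from 14 (2/12) to 26 (8/18).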

Pipped d6s

I tested some standard pipped d6s: the two from my Settlers of Catan set, and three from Koplow. The Settlers d6s turned out to be some of the fairest dice tested. (In the images above the dice appear to have a green cast because I repainted the interior of the rolling bucket green, for better contrast with the white dice.)

In the game Settlers of Catan, the two dice are rolled together to get a number from 2 to 12. Even with slightly imperfect individual dice, the combined effect of rolling two dice results in clearly different frequencies for rolling, for example, a 7 (just over 16.6% of the time, which is about the theoretical 1/6) versus a 6 (about 13.7%).
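To see how two individual dice combine, here is a minimal sketch that builds the distribution of the sum from each die's per-face probabilities. The function name is hypothetical; measured frequencies for each die could be passed in place of the fair values used in the example.

```python
from itertools import product

def sum_distribution(probs_a, probs_b):
    """probs_x[i] is the probability of face i + 1. Returns {total: probability}."""
    dist = {}
    faces_a = enumerate(probs_a, start=1)
    faces_b = enumerate(probs_b, start=1)
    for (face_a, p_a), (face_b, p_b) in product(faces_a, faces_b):
        dist[face_a + face_b] = dist.get(face_a + face_b, 0.0) + p_a * p_b
    return dist

fair_d6 = [1 / 6] * 6
dist = sum_distribution(fair_d6, fair_d6)
print(round(dist[7], 4), round(dist[6], 4))
```

For two ideal dice this gives 6/36 (about 16.7%) for a 7 and 5/36 (about 13.9%) for a 6, which is the baseline the observed figures above can be compared against.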

Chessex and Wiz Dice d6s

The d6s with numerals were not appreciably different in fairness from pipped d6s.

Skew Dice

Dice Lab sent me some Skew Dice for evaluation. These are a novelty shape that is still mathematically fair. Since the dice are asymmetrical, there is a "clockwise" and "counter-clockwise" version of each shape. The dice turned out to be imperfect embeddings:

Geometric Analysis

Following the example of 1000d4.com, I measured the distances between opposite sides of several d20s using digital calipers. Below is a comparison of those measurements and observed rolls.

The Crystal Caste (above) and Koplow (below) d20s clearly have some correlation between diameters and observed frequencies.

As an example, take sides 7 and 14, which are opposite each other on the Crystal Caste translucent orange d20 (graph above). The mean diameter of the die (the average distance from the center of one face to the center of its opposite face) is 19.43mm. The distance between 7 and 14 is 19.20mm, 0.23mm shorter than average (rendered as positive on the graph for easy comparison with roll frequencies). And in this case the compressed dimension correlates with both of those sides coming up more often than average (1.39 and 1.13 respectively).
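The arithmetic in the example can be sketched as a small helper that turns caliper measurements into "shorter than average" values, signed the same way as the graph (positive means a compressed axis). The function name and the dictionary layout are assumptions made for illustration.

```python
def diameter_deviations(diameters_mm):
    """diameters_mm maps an opposite-face pair, e.g. (7, 14), to its measured distance.

    Returns {pair: mean diameter - measured diameter}, so a positive value
    means that axis is shorter than the die's average.
    """
    mean = sum(diameters_mm.values()) / len(diameters_mm)
    return {pair: mean - measured for pair, measured in diameters_mm.items()}
```

With a mean of 19.43mm, the 19.20mm measurement between 7 and 14 comes out as +0.23mm, matching the figure quoted above.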

The Wiz Dice (above) and Game Science (below) d20s have some correlation between diameters and frequencies, but (especially for the fairer Game Science die) not to the extent that diameters are a reliable predictor of observed rolls.

Salt Water Float

Many people recommend floating a d20 in a salt water bath to check its balance (Daniel Fisher's video is referenced by many, including a writeup at Critical Dice). It does seem to uncover imbalances.

I tried floating a couple of the least fair dice, but neither showed a strong orientation preference. I could coax the dice to settle with any face up, and when spun, I did not observe the same side up (or axis vertical) consistently.

However, geometric analysis (above) on the same dice does correlate strongly with observed roll frequencies. So using calipers seems to be a better predictor of unfair die behavior than the saltwater float test.

As others have observed, I also found that different dice have very different densities (and of course this test can't be applied to metal dice).

Hardware Setup (Die Roller and Camera)

A microcontroller runs a servo motor to shake a small tub, and triggers a camera to take pictures. More details: Arduino sketch and hardware parts list.

Construction

The main container is an empty (and well washed) ice cream carton, chosen for its flat bottom and sloping sides. The servo motor's arm is taped to the side, and a paperclip makes the pivot on the opposite side. A simple U of cardboard forms the stand (with weights on it to keep it still, and small additional pieces of cardboard to keep the resting position consistent). The servo motor is mounted in a snugly fitting hole cut in the cardboard.

A small piece of translucent plastic (from a nametag holder) makes a ramp so the die can roll over the LEDs/wire. Plastic wrap with a rubber-band provides a cover. This prevents the die from rolling out when the carton tips down for each roll.

The LEDs are placed through slots in the carton, facing downward. This keeps them from shining back up at the camera. One power supply wire is inside the carton, and one outside (soldered in place). With the LEDs on and the room lights off, there is negligible glare.

A colored paper insert can fit in the bottom of the tub to provide contrast for white dice.

The camera is a Nikon D90, using a long (55-200mm) lens for low perspective distortion across the visual field. It is triggered via wired remote (though the GPS/remote port was defective and required repair); and powered via its AC adapter port (using a 3D-printed plug).

Timing: In several thousand rolls, only a few do not fully settle before the photograph is taken. With this timing, it captures about 790 rolls per hour (or one every 4½ seconds).
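For planning a test run, the reported rate converts to run lengths as follows. This is plain arithmetic on the 790 rolls per hour figure; the function names are illustrative.

```python
ROLLS_PER_HOUR = 790  # rate reported above

def seconds_per_roll(rolls_per_hour=ROLLS_PER_HOUR):
    return 3600 / rolls_per_hour

def run_hours(rolls, rolls_per_hour=ROLLS_PER_HOUR):
    return rolls / rolls_per_hour

print(round(seconds_per_roll(), 2))                    # seconds between photos
print(round(run_hours(3000), 1), round(run_hours(8000), 1))
```

At this rate a 3000-roll run takes a little under 4 hours, and the 8000-roll run a little over 10.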

Repeatability: Despite quick construction, the servo and its taped attachment reliably return the tub to the same position, closely enough for analysis.

Improvements

Noise: The servo motor's whine carries, as does the sound of the die hitting the inside of the paper tub. (This is true even when set up in a closet, as pictured above.) Heavier or less resonant materials might help, as could lining the container with something like felt.

Turbulence: The smaller dice (d4 and d6) slide down the tub's side when it tilts down, rather than rolling; this can lead to repeatedly rolling the same number. Bumpy sides on the container, or tilting further down, might help. I also tried rolling pigs from Pass the Pigs, but they did not roll sufficiently; more shaking would help with more grippy and irregular objects.

Strength: This setup is not tough enough for metal dice.

Roller Randomness

Performance

One concern in designing the rolling machine was that it wouldn't sufficiently tumble the dice, resulting in the same side getting rolled repeatedly, or one side always being followed predictably by some other side.

The below are sequence heatmaps, plots of which die side (horizontal axis) was followed by which other side (vertical axis), showing the number of times that two-roll sequence occurred (grayscale value and text). Data is from a Wiz Dice translucent blue d20 (stddev=0.2, moderate/bad) and a Chessex Gemini d6 (stddev=0.1, moderate for d6s tested).

If the same side were rolled many times in a row, there would be a hot line on the diagonal, but there isn't for either die. The d20 does show, for example, a dark row/column for 17 (which was rolled infrequently), but does not reveal clear biases. The GitHub repository has sequence graphs for the other dice.
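The counts behind a sequence heatmap like this can be tallied in a few lines. This is a minimal sketch with a hypothetical function name, not the script from the repository; it only builds the table of counts and leaves plotting aside.

```python
def transition_counts(rolls, sides):
    """matrix[a - 1][b - 1] counts how often face a was immediately followed by face b."""
    matrix = [[0] * sides for _ in range(sides)]
    for current, following in zip(rolls, rolls[1:]):
        matrix[current - 1][following - 1] += 1
    return matrix

def repeat_count(matrix):
    """Number of times a face was followed by itself (the heatmap's diagonal)."""
    return sum(matrix[i][i] for i in range(len(matrix)))
```

For a die that tumbles well, the diagonal total should be roughly the number of consecutive pairs divided by the number of sides; a much larger total would be the "hot line" described above.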

A similar concern was whether the machine was shaking the dice around enough. Below is an image with a light gray circle at each location where the die landed. (In this case, for the green Koplow d20.) It appears the die landed in many locations within the tub.

Software Explanation

There are two computer-vision tasks in this process: finding the die within the larger photo of the die-rolling area, and figuring out which picture is of which face of the die. The code described is on GitHub.

Cropping (Finding the Die)

A photo of the rolled die is diffed against a reference image.

The result is scanned for areas of high difference to find the die. The die is flood-filled to find its area; then the image of the die is cropped out and saved.

The above image was obtained by running crop.py with --debug. The base (mostly black) image is the diff of the image with the die and the reference image. The blue lines are where the image was scanned; the red highlighted line segments are where a large difference was detected. The green dotted box shows the detected bounds of the die.

Scanning is done on a scaled down version of the images, but the cropped image of the die is saved at full resolution for better feature detection.
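The diff, scan, flood-fill and crop steps can be sketched in plain Python on small grayscale images stored as lists of rows. This is an illustration of the idea and not the code in crop.py: the function names, the `threshold` and `step` values, and the choice to work on a single-resolution image are all assumptions.

```python
from collections import deque

def find_die_bounds(photo, reference, threshold=40, step=4):
    """Return (left, top, right, bottom) of the first strongly differing region, or None.

    photo and reference are equal-sized grayscale images (lists of rows of ints).
    threshold and step are illustrative values, not taken from crop.py.
    """
    height, width = len(photo), len(photo[0])

    def differs(x, y):
        return abs(photo[y][x] - reference[y][x]) > threshold

    # Scan every step-th row for a pixel that differs strongly from the reference.
    seed = next(((x, y) for y in range(0, height, step)
                 for x in range(width) if differs(x, y)), None)
    if seed is None:
        return None

    # Flood-fill outward from the seed to cover the whole connected region.
    left = right = seed[0]
    top = bottom = seed[1]
    seen = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        left, right = min(left, x), max(right, x)
        top, bottom = min(top, y), max(bottom, y)
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in seen and differs(nx, ny)):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return left, top, right, bottom

def crop(image, bounds):
    """Cut the bounding box (inclusive coordinates) out of an image."""
    left, top, right, bottom = bounds
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

To mirror the scaled-down scan described above, the bounds found on a reduced image would be multiplied back up by the scale factor before cropping the full-resolution photo.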

Clustering (Which Images Are The Same?)

Images of the die are compared using features, detected and matched using OpenCV.

These screenshots from the find_obj.py OpenCV demo show good (above) and bad (below) matches. The white rectangles on the right side are the homography: the area of the right image that matches up with the left image. Good matches not only have a high number of matching points, but a simple (translation and rotation only) homography with low distortion.
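One way to test for a "simple" homography, sketched here under my own assumptions, is to check that the 3x3 matrix is close to a pure rotation plus translation: the upper-left 2x2 block should be nearly orthonormal with determinant near +1, and the perspective terms in the bottom row should be near zero. The function name and both tolerances are hypothetical, and the actual acceptance criteria in the clustering code may differ. In OpenCV such a matrix would typically come from `cv2.findHomography`.

```python
import math

def is_rigid_homography(h, tolerance=0.1, perspective_tolerance=1e-3):
    """True if the 3x3 matrix h (list of rows) is close to rotation + translation."""
    scale = h[2][2]
    if abs(scale) < 1e-12:
        return False
    r00, r01 = h[0][0] / scale, h[0][1] / scale
    r10, r11 = h[1][0] / scale, h[1][1] / scale
    p, q = h[2][0] / scale, h[2][1] / scale

    rotation_errors = [
        abs(r00 * r00 + r10 * r10 - 1),    # first column has unit length
        abs(r01 * r01 + r11 * r11 - 1),    # second column has unit length
        abs(r00 * r01 + r10 * r11),        # columns are perpendicular
        abs((r00 * r11 - r01 * r10) - 1),  # determinant +1: no mirroring or scaling
    ]
    return (max(rotation_errors) <= tolerance
            and max(abs(p), abs(q)) <= perspective_tolerance)

angle = math.radians(30)
rotated = [[math.cos(angle), -math.sin(angle), 50.0],
           [math.sin(angle), math.cos(angle), -20.0],
           [0.0, 0.0, 1.0]]
sheared = [[1.0, 0.8, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
print(is_rigid_homography(rotated), is_rigid_homography(sheared))
```

The rotated-and-shifted matrix passes and the sheared one does not, which corresponds to the undistorted versus distorted white rectangles in the screenshots.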

The first step builds a list of dissimilar representative images, against which all other images get compared. Each representative gets a list of matching members (the images showing the same face of the die).

This tends to result in big groups for each of the sides of the die, and a bunch of small groups (1-10 each) of images that didn't match any of the representatives well. So a second step takes the small groups and compares them to members of the larger groups (not just their representative images) to find a match.
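The two steps can be sketched as follows. This is a simplified illustration, not the repository's code: `matches` stands in for whatever feature-matching test decides that two images show the same face, and `small_group_size` is an assumed cutoff based on the "1-10" group sizes mentioned above.

```python
def cluster(images, matches, small_group_size=10):
    """Group images by die face in two passes.

    matches(a, b) -> bool is a caller-supplied (hypothetical) comparison.
    Each returned group is a list whose first element is its representative.
    """
    # Pass 1: each image joins the first representative it matches,
    # or becomes a new representative.
    groups = []
    for image in images:
        for group in groups:
            if matches(image, group[0]):
                group.append(image)
                break
        else:
            groups.append([image])

    # Pass 2: fold each small group into a large group if any of its
    # members matches any member of that large group.
    large = [g for g in groups if len(g) > small_group_size]
    small = [g for g in groups if len(g) <= small_group_size]
    leftovers = []
    for group in small:
        target = next((big for big in large
                       if any(matches(a, b) for a in group for b in big)), None)
        if target is not None:
            target.extend(group)
        else:
            leftovers.append(group)
    return large + leftovers
```

The second pass is the expensive one, since it compares against every member rather than one representative, which is a reason to reserve it for the small leftover groups.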

From a partial run on a d8: the work in progress, with a number of small disconnected groups (above), and the consolidated groups (below).

License

These data, images, and code may be reused under the Creative Commons Attribution-NonCommercial 4.0 International License.