Introduction

During the 2015 NCAA men’s basketball tournament, I won our office pool by (1) picking then-undefeated Kentucky to lose– although earlier than their actual Final Four loss to Wisconsin– and (2) picking Duke to win the championship game. It was a come-from-behind victory for my bracket, moving from 14th place to 7th to 1st… over the span of the last three games in the 63-game tournament.

But should I have won? Our pool used the common bracket scoring system of assigning:

1 point for each correct pick in the first round of 64 teams,

2 points for each correct pick in the second round of 32 teams,

4 points for each correct pick in the third round of 16 teams,

8 points for each correct pick in the fourth round of 8 teams,

16 points for each correct pick in the two Final Four games,

32 points for correctly picking the champion.

This “doubling” system has several reasonable mathematical motivations. For example, each round of games is potentially worth the same number of points (32). Also, assuming all of the teams are evenly matched– or equivalently, assuming that you make all of your picks by flipping a fair coin– then the expected number of points scored decreases by exactly half with each round.

But teams aren’t evenly matched, and you don’t make your picks by flipping coins. Intuitively, then, it seems like this doubling system might be over-weighting the importance of later rounds, and perhaps a better system involves less extreme increases in points per game from one round to the next. One of the more amusing common suggestions is a progression based on the Fibonacci sequence, with games in each round worth 2, 3, 5, 8, 13, and 21 points, respectively. My goal in this post is to describe a means of more accurately evaluating and comparing these and other bracket scoring systems.

Probability model for tournament games

First, we need a way to model the probability of correctly picking any particular game. A reasonably simple starting point is to assume that all games are independent, with each outcome’s probability depending only on the teams’ seeds. More precisely, let P be a 16×16 matrix with entries

indicating the probability that seed i beats seed j, where is some measure of “strength” of seed i (decreasing in i), and k is a scaling factor that effectively determines the range of resulting probabilities. For example, if , then every game is a coin flip; at the other extreme, if , then a 16th seed has zero probability of a first-round upset against a 1st seed. For this discussion, k will be chosen so that

based on the observation that, in 124 match-ups over the last 31 years of the current tournament format, a 1st seed has so far never lost to a 16th seed. This probability is the expected value of the corresponding beta distribution.

I used a simple version of this model a year ago to estimate the probability of picking a “perfect bracket,” that is, picking all 63 games correctly, using a linear strength function:

so that depends only on the difference between seeds. Even this very simple model isn’t too bad, as shown in the following updated figure, with the linear prediction model in red, and the last 31 years of historical data shown in blue, with corresponding 95% confidence intervals in black. As the often very wide confidence intervals suggest, 31 years is still not much data; for example, there have been only 7 match-ups between seeds differing by 10: 1st vs. 11th are split 3-3, and a single 2nd seed won over a 12th.

As usual, it turns out that this was not a new idea; Schwertman et. al. (see References at the end of this post) considered this same model back in 1991, as well as another non-linear strength function that turns out to be a better historical fit:

where is the quantile function of the normal distribution, and is the total number of Division I men’s basketball teams. The idea is that the “strengths” of all teams are normally distributed, with the 64 teams in the tournament comprising the “strongest” teams in the upper tail of this distribution. I will use this strength function for the remainder of this discussion.

Computing probabilities of correct picks

Given whatever matrix P of probabilities we choose, we can use it to compute the resulting distribution of the seed winning any particular game in the tournament. If and are 16-element column vectors with ( ) indicating the probability that the home (visiting) team in a particular game is seeded i, then the distribution of the seed winning that game is given by

where is the element-wise Hadamard product. In the first round, each and is a basis vector. Note that including both terms in the summation is really just a computational convenience, at least within a region, since for a given seed, only one of the two terms’ corresponding components will be non-zero.

By applying this formula iteratively for each game in each successive round, we can eventually compute the probability of each seed winning each game in the tournament. For example, the following Python code computes the distribution of the winner of any one of the four regional championships (among 16 teams each):

import numpy as np def bracket_seeds(num_teams): """Seed given number of teams in single-elimination tournament.""" seeds = np.array([1]) while len(seeds) < num_teams: seeds = np.array([seeds, 2 * len(seeds) + 1 - seeds]).transpose().flatten() return seeds def dist_game(dist_a, dist_b, P): """Compute distribution of winner of a vs. b with probability model P.""" return dist_a * (P.dot(dist_b)) + dist_b * (P.dot(dist_a)) def dist_region(P): """Compute distribution of regional champion with probability model P.""" games = np.eye(16)[bracket_seeds(16) - 1] for round in range(4): games = [dist_game(games[k], games[k + 1], P) for k in range(0, len(games), 2)] return games

The resulting predicted probabilities are shown in the following figure in red– using the normal quantile strength function above– compared with the actual frequencies in blue.

Bracket scoring systems

Now that we have a means of computing the probability of any particular team winning any particular game, we can evaluate a completed bracket by computing the expected number of correct picks in each round. For example, suppose that our bracket simply picks the favorite (i.e., the higher seed) to win every game. Then the expected number of correct picks will be:

23.156 of 32 games in the first round,

9.847 of 16 games in the second round,

4.292 of 8 games in the third round,

1.792 of 4 games in the fourth round regional championships,

0.540 of 2 games in the Final Four,

0.156 of the final championship game.

At this point, we can compare various bracket scoring systems by comparing the expected number of points scored in each round using those systems. For example, the following table shows the expected points per round for the two systems mentioned so far: the doubling system (1, 2, 4, 8, 16, 32) and the Fibonacci system (2, 3, 5, 8, 13, 21), normalized to 1 point per first-round game.

Round Doubling Fibonacci/2 1 23.156 23.156 2 19.694 14.771 3 17.168 10.730 4 14.334 7.167 Final Four 8.636 3.508 Championship 4.978 1.633

Which of these or any other systems is “best” depends on what kind of pool you want. With the doubling system (or even greater progressions), you can have an “exciting,” horse race-ish pool, with lead changes and multiple entries having a chance of winning throughout all six rounds. With the Fibonacci system (or even more gradual progressions), you can have a pool that rewards research and accurate prediction of early-round upsets… but such a pool may be effectively over well before the Final Four.

References:

Schwertman, N., McCready, T., and Howard, L., Probability Models for the NCAA Regional Basketball Tournaments, The American Statistician, 45(1) February 1991, p. 35-38 [PDF]

Appendix: Historical data

The following matrices contain the record of all wins and losses, by round and seed match-up, for the 31 tournaments in the current format from 1985 through 2015. First, the following 16×16 matrix indicates the number of regional games– that is, in the first through fourth rounds– in which seed i beat seed j. Note that the round in which each game was played is also implicitly determined by the seed matchup (e.g., 1 vs. 16 is in the first round, etc.).

0 21 13 32 30 6 4 51 56 4 3 19 4 0 0 124 21 0 23 2 0 23 53 2 0 26 12 1 0 0 117 0 8 14 0 2 2 38 7 1 1 9 25 0 0 104 1 0 15 4 3 0 36 2 2 3 2 2 0 21 99 0 0 0 7 3 1 30 0 1 0 0 1 1 0 80 11 0 0 0 2 6 28 1 0 0 3 0 0 4 81 0 0 13 0 0 0 20 5 2 0 3 0 0 0 76 0 0 0 1 2 0 12 3 0 5 2 1 1 0 63 0 0 0 1 0 0 0 5 1 0 0 1 0 0 61 0 0 0 0 1 0 0 0 0 18 4 0 0 2 48 0 0 0 0 0 0 1 4 0 3 1 13 0 0 43 3 0 0 2 0 0 0 5 0 0 0 0 0 12 44 0 0 1 0 0 0 0 8 0 0 0 0 0 0 25 3 0 0 0 0 0 0 3 0 0 0 0 0 0 20 0 0 2 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The following matrix, in the same format, is for (fifth round) Final Four games:

12 6 2 5 1 0 1 1 1 0 0 0 0 0 0 0 4 2 3 1 0 1 0 0 0 0 1 0 0 0 0 0 4 2 0 2 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

And finally for championship games:

6 6 1 2 3 1 0 0 0 0 0 0 0 0 0 0 1 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0