Introduction

Every year, there are upsets and wild outcomes during the NCAA men’s basketball tournament. But this year felt, well, wilder. For example, for the first time in 136 games over the 34 years of the tournament’s current 64-team format, a #16 seed (UMBC) beat a #1 seed (Virginia) in the first round. (I refuse to acknowledge the abomination of the 4 “play in” games in the zero-th round.) And I am a Kansas State fan, who watched my #9 seed Wildcats beat Kentucky, a team that went to the Final Four in 4 of the last 8 years.

So I wondered whether this was indeed the “wildest” tournament ever… and it turns out that it was, by several reasonable metrics.

Modeling game probabilities

To compare the tournaments in different years, we assume that the probability of outcome of any particular game depends only on the seeds of the opposing teams. Schwertman et. al. (see reference below) suggest a reasonable model of the form

where is some measure of the “strength” of seed (ranging from 1 to 16), and the scale factor calibrates the range of resulting probabilities, selected here so that the most extreme value matches the current maximum likelihood estimate based on the 136 observations over the past 34 years.

One simple strength function is the linear , although this would suggest, for example, that #1 vs. #5 and #11 vs. #15 are essentially identical match-ups. A better fit is

where is the quantile of the normal distribution, and is the number of teams in all of Division I. The idea is that team strength is normally distributed, and the tournament invites the 64 teams in the upper tail of the distribution, as shown in the figure below.

Probability of a perfect bracket

Armed with these candidate models, I looked at all of the tournaments since 1985, the first year of the current 64-team format. I have provided summary data sets before (a search of this blog for “NCAA” will yield several posts on this subject), but this analysis required more raw data, all of which is now available at the usual location here, as well as on GitHub.

For each year of the tournament, we can ask what is the probability of picking a perfect bracket in that year, correctly identifying the winners of all 63 games? Actually, there are three reasonable variants of this question:

If we flip a coin to pick each game, what is the probability of picking every game correctly? If we pick a “chalk” bracket, always picking the favored higher-seeded (i.e., lower-numbered) team to win each game, what is the probability of picking every game correctly? If we managed to pick the perfect bracket for a given year, what is the prior probability of that particular outcome?

The answer to the first question is the 1 in , or the “1 in 9.2 quintillion” that appears in popular press. And this is always exactly correct, no matter how individual teams actually match up in any given year, as long as we are flipping a coin to guess the outcome of each game. But this isn’t very realistic, since seed match-ups do matter; a #1 seed will beat a #16 seed… well, almost all of the time.

So the second question is more interesting, but also more complicated, since it does depend on our model of how different seeds match up… but it doesn’t depend on which year of the tournament we’re talking about, at least as long as we always use the same model. Using the strength models described above, a chalk bracket has a probability of around 1 in 100 billion of being correct (1 in 186 billion for the linear strength model, or 1 in 90 billion for the normal strength model).

The third question is the motivation for this post: the probability of a given year’s actual outcome will generally lie somewhere between the other two “extremes.” How has this probability varied over the years, and was 2018 really an outlier? The results are shown in the figure below.

The constant black line at the bottom is the 1 in 9.2 quintillion coin flip. The constant red and blue lines at the top are the probabilities of a chalk bracket, assuming the linear or normal strength models, respectively.

And in between are the actual outcomes of each tournament. (Aside: I tried a bar chart for this, but I think the line plot more clearly shows the comparison of the two models, as well as both the maximum and minimum behavior that we’re interested in here.) This year’s 2018 tournament was indeed the most unlikely, so to speak, although it has close competition, all in this decade. At the other extreme, 2007 was the most likely bracket.

Reference:

Schwertman, N., McCready, T., and Howard, L., Probability Models for the NCAA Regional Basketball Tournaments, The American Statistician, 45(1) February 1991, p. 35-38 [JSTOR]