When is it appropriate to call a basketball game over? The answer is never… or at least not until the final buzzer. It is a commonly accepted trope that basketball games are most competitive during the final minutes. But wouldn’t it be impressive if, just a minute or so into the game, you could tell your friends who the winner will be? (Some say impressive; others say ‘ruining the fun.’)

Introduction

So how about it: when can we turn off the TV knowing who the winner will be? To find out, I analyzed the play-by-play of every game in the 2019 March Madness tournament through the round of 32. I limited my scope to six key events in each game: the first score, two minutes in, five minutes in, ten minutes in, halftime, and the final buzzer. At each event in each game, I asked one question about each team: did that team hold the lead?

Let me clarify what I mean by this. My analysis rewards a team for leading at a given event. The first and final events are simple: only one team can lead at the first score (the team that scored), and only one team can lead at the end of the game (the winner). At every other event the score may be tied, in which case neither team is leading, so neither team is rewarded.
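The tie rule can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the analysis; the function names `lead_indicator` and `event_rewards` are hypothetical.

```python
from typing import Tuple


def lead_indicator(team_score: int, opponent_score: int) -> int:
    """Return 1 if the team strictly leads, 0 for a tie or a deficit."""
    return 1 if team_score > opponent_score else 0


def event_rewards(score_a: int, score_b: int) -> Tuple[int, int]:
    """Rewards for both teams at one event; a tie rewards neither team."""
    return lead_indicator(score_a, score_b), lead_indicator(score_b, score_a)


# First score: exactly one team can lead.
print(event_rewards(2, 0))    # (1, 0)
# A tie at, say, ten minutes: nobody is rewarded.
print(event_rewards(18, 18))  # (0, 0)
```

Because the comparison is strict, a tie can never produce a 1 for either side, which is exactly the "neither team is rewarded" rule.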

Each model works on a binary system: either the team has the lead or it doesn’t. Each observation is therefore a 0/1 outcome, so a binomial distribution was used to simplify the analysis. A ‘1’ was given to a team that held the lead, and a ‘0’ otherwise (a tie or a deficit). What does this mean for extrapolation? With this binary heuristic, the margin by which a team is winning or losing at any given time is not explicitly considered. That said, it can be argued that a team with a large deficit at five minutes is likely to still trail at ten minutes, so the margin is partly reflected in the later indicators. Testing that, however, is outside the scope of this analysis.
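To make the loss of margin concrete, the sketch below encodes one team’s game as six 0/1 indicators. The event names, the `encode_team` helper, and the sample scores are all hypothetical, made up for illustration rather than taken from the 2019 games.

```python
from typing import List, Tuple

# Hypothetical labels for the six events, in game order.
EVENTS = ["first_score", "2_min", "5_min", "10_min", "halftime", "final"]


def encode_team(scores: List[Tuple[int, int]]) -> List[int]:
    """Turn (team, opponent) scores at each event into 0/1 lead indicators.

    Only the sign of the margin survives: a 1-point lead and a 20-point
    lead are both coded 1, and a tie or any deficit is coded 0.
    """
    if len(scores) != len(EVENTS):
        raise ValueError(f"expected {len(EVENTS)} score pairs, got {len(scores)}")
    return [int(team > opp) for team, opp in scores]


# Two made-up games for the same team: a nail-biter and a blowout.
narrow = [(2, 0), (4, 3), (9, 8), (18, 17), (30, 29), (70, 69)]
blowout = [(2, 0), (8, 0), (15, 4), (28, 10), (45, 20), (90, 60)]
print(encode_team(narrow))   # [1, 1, 1, 1, 1, 1]
print(encode_team(blowout))  # [1, 1, 1, 1, 1, 1]
```

Both games produce identical rows, which is the trade-off described above: the encoding is simple, but it cannot distinguish a one-point lead from a thirty-point one.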

Analyzing the Data

As I mentioned before, I went through each game’s play-by-play to determine who, if anyone, held the lead at each event. Sometimes this was simple, such as when a score occurred at exactly five minutes or a TV timeout fell at the ten-minute mark. Other times it took more digging, and that effort is the reason the model was limited to six events. A model that considers every event in the play-by-play is possible, but with the current method, these six events were chosen.
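The lookup can also be sketched in code. This assumes a hypothetical play-by-play format (one entry per scoring play, in chronological order, as seconds elapsed plus both scores), assumes 20-minute halves for halftime, and counts a basket made exactly at a checkpoint as part of that checkpoint’s score. None of these details come from the original analysis.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical format: (seconds elapsed, home score, away score) per scoring play.
Play = Tuple[int, int, int]

# Fixed checkpoints in seconds of game time; halftime assumes 20-minute halves.
CHECKPOINTS = {"2_min": 120, "5_min": 300, "10_min": 600, "halftime": 1200}


def score_at(plays: List[Play], t: int) -> Tuple[int, int]:
    """Score after the last scoring play at or before t seconds (0-0 if none)."""
    times = [play[0] for play in plays]
    i = bisect_right(times, t) - 1  # index of the last play with time <= t
    return (0, 0) if i < 0 else (plays[i][1], plays[i][2])


def scores_at_events(plays: List[Play]) -> Dict[str, Tuple[int, int]]:
    """(home, away) score at each of the six events; needs at least one play."""
    events = {"first_score": (plays[0][1], plays[0][2])}
    for name, t in CHECKPOINTS.items():
        events[name] = score_at(plays, t)
    events["final"] = (plays[-1][1], plays[-1][2])  # last basket = final score
    return events


plays = [(35, 2, 0), (80, 2, 3), (300, 5, 3), (610, 12, 12), (1190, 30, 28), (2390, 70, 66)]
print(scores_at_events(plays))
```

Taking the final score from the last scoring play also covers overtime, as long as overtime plays are included in `plays`.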

Choosing the events came down to one question: which events span the range from reasonable predictions to impressive ones? A reasonable prediction is made later in the game, when the current lead is more likely to reflect the final outcome (which may not always be the case, but it is an operating assumption). An impressive prediction, by contrast, is made earlier in the game. Thus, a halftime prediction may be more likely to be correct, but a first-score prediction would be far more impressive.

Now we can state our hypotheses. The null hypothesis is that no event carries any statistical significance in predicting the final winner. The alternative hypothesis in this study is that there is an event that, with some statistical significance, helps predict the final winner. One assumption is that any difference in score is captured by the model without being explicitly coded, as I mentioned before. Another is that other effects in a typical basketball game (fatigue, injuries, coaching styles, etc.) are also absorbed by the model but are not explicitly modeled. Therefore, while this model may show an ability to predict a winner, nothing else can be said about either team or the game itself.
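One way to test the null hypothesis for a single event is a one-sided exact binomial test. If leading at that event carried no information, the leader would win with probability 0.5, so the number of games the leader went on to win would follow a Binomial(n, 0.5) distribution. The sketch below is an illustrative choice, not necessarily the test used in this analysis; games tied at the event would be left out of n because nobody led, and the function name `leader_win_p_value` is hypothetical.

```python
from math import comb


def leader_win_p_value(leader_wins: int, games_with_leader: int, p: float = 0.5) -> float:
    """One-sided exact binomial test.

    Returns P(X >= leader_wins) for X ~ Binomial(games_with_leader, p): the
    chance the event leader wins at least this often if leading were no
    better than a coin flip (p = 0.5 under the null hypothesis).
    """
    if not 0 <= leader_wins <= games_with_leader:
        raise ValueError("need 0 <= leader_wins <= games_with_leader")
    n = games_with_leader
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(leader_wins, n + 1))


# Hypothetical counts, not results from the 2019 tournament.
print(round(leader_win_p_value(7, 10), 4))   # 0.1719
print(round(leader_win_p_value(10, 10), 4))  # 0.001
```

A small p-value for some event would favor the alternative hypothesis for that event. SciPy’s `scipy.stats.binomtest(k, n, p=0.5, alternative="greater")` reports the same one-sided p-value if you would rather not hand-roll the sum.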

Now that we have all this taken care of, let’s get started.