By Andrew Puopolo

Yesterday marked the final day of matches in the Premier League season. While watching Tottenham’s wild 5-4 victory over Leicester, I wondered: since most teams have very little left to play for, is there less effort in defending, and are more goals scored as a result? Last season, Tottenham defeated Hull City 7-1 in a similarly wild match, and the year before they were defeated 5-1 by Newcastle at St. James’ Park. We have also seen Stoke City defeat Liverpool 6-1 in Steven Gerrard’s final match, and West Bromwich Albion come from 5-2 down to draw 5-5 with Manchester United in Sir Alex Ferguson’s farewell to management. Suffice it to say, these results would be incredibly shocking in January, when everyone still has everything to play for. I decided to investigate.

To answer this question, I tried two different methodologies. The first was simulation by random sampling. To start, I calculated the total goals in every final-matchday match of each Premier League season from 1996/97 to 2016/17 (21 seasons of 10 matches each, for a total of 210 matches). In those 210 matches, 629 goals were scored, for an average of 2.99 per match. I then generated a null distribution for how many goals should have been scored in those final matches, as follows:

1.) Randomly pick 10 matches from the first 37 matchdays of each of the last 21 seasons. Sampling within each season accounts for differences in scoring over the years. By doing this, we end up with a total of 210 random matches.

2.) Calculate the average number of goals per match in this 210-match sample.

3.) Repeat steps 1 and 2 100,000 times.
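The three steps above can be sketched in a few lines of Python. This is only an illustration of the procedure, not the code used for this post: the function name `simulate_null_means` and the input layout (a dict mapping each season to a list of total goals per match over matchdays 1 to 37) are assumptions, as is the choice to sample without replacement within each season.

```python
import random

def simulate_null_means(goals_by_season, matches_per_season=10,
                        n_sims=100_000, seed=0):
    """Return one average-goals-per-match value per simulated sample.

    goals_by_season is a hypothetical input: season label -> list of total
    goals in each match played on matchdays 1-37 of that season.
    """
    rng = random.Random(seed)
    seasons = list(goals_by_season.values())
    simulated_means = []
    for _ in range(n_sims):
        sample = []
        for season_goals in seasons:
            # Step 1: draw 10 matches from this season, without replacement.
            sample.extend(rng.sample(season_goals, matches_per_season))
        # Step 2: average goals per match across the pooled sample.
        simulated_means.append(sum(sample) / len(sample))
    # Step 3 is the outer loop: one mean per repetition.
    return simulated_means
```

With 21 seasons in the input, each repetition pools 210 matches, matching the size of the real final-day sample. In pure Python the full 100,000 repetitions take a little while, so a smaller `n_sims` is handy for a first look.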

After doing this, we were able to plot the following histogram for the average goals per match in each of our samples, with a vertical red line for our test statistic (the 2.99 goals per match from the real data).

We see that most of the simulated averages fall between 2.5 and 2.6, and only 119 of the 100,000 simulations (about 0.12%) finished with over 2.9 goals per match. This is strong evidence that the final matchday generates more goals than the rest of the season.
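The tail count quoted above is effectively an empirical p-value: the share of simulated averages that land at or beyond a cutoff. A minimal sketch follows, assuming the simulated averages are held in a plain list. The helper name `empirical_p_value` is hypothetical, and the only figure taken from the post is the 119 out of 100,000.

```python
def empirical_p_value(simulated_means, cutoff):
    """Share of simulated averages that are at least as large as the cutoff."""
    extreme = sum(1 for m in simulated_means if m >= cutoff)
    return extreme / len(simulated_means)

# The count reported in the post, expressed as a proportion.
reported_share = 119 / 100_000  # 0.00119, i.e. about 0.12%
```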

However, we wanted to take into account the specific matchups on the final day. It is often the case that the top teams do not play each other on the final day of the season, and feisty derby matches are avoided as well. As a result, we decided to do a parametric t test comparing the number of goals scored in a given final-day matchup with the number of goals scored in the reverse fixture earlier in the season. For example, we can look at this year’s final-day fixtures.

It actually turns out that this season there were fewer goals on the final day (an average of 3.1, despite the 9-goal thriller at Wembley) than there were in the corresponding reverse fixtures (3.7).

We compiled these statistics for each of the previous 21 seasons, and conducted a two sample t test. Our results were as follows:

This shows that, on average, 0.2 more goals per match are scored on the final day than in the corresponding reverse fixtures. However, with a sample of only 210 matches, this difference is not statistically significant at the five percent level.
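For readers who want to try a comparison like this themselves, here is a rough standard-library sketch of a two sample t statistic. It makes several assumptions not taken from the post: it uses the unequal-variance (Welch) form, it approximates the p-value with the normal distribution (reasonable with around 210 matches per group, poor for small samples), and the function name `two_sample_t` and the input lists are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_t(sample_a, sample_b):
    """Return (mean difference, t statistic, approximate two-sided p-value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = mean(sample_a) - mean(sample_b)
    # Standard error of the difference, allowing unequal variances (Welch).
    se = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t = diff / se
    # Normal approximation to the t distribution; an assumption of this
    # sketch that only holds up for large samples.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return diff, t, p
```

Here `sample_a` would hold the goals in each final-day match and `sample_b` the goals in each reverse fixture. Since every final-day match has a natural partner, a paired test on the per-fixture differences would be another reasonable design; the version above follows the two sample framing used in the post.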

One of the assumptions of the two sample t test is that the data are drawn from a normal distribution. The number of goals scored in a soccer match is not normally distributed; it is a count, and is better modeled as a draw from a Poisson distribution. As a result, I decided to run a similar test to see if the two Poisson distributions have the same rate parameter.

We get similar results to the t test above: there is not enough evidence to conclude that the final-day and reverse-fixture goal counts are drawn from Poisson distributions with different rate parameters.
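One common way to compare two Poisson rates when both groups cover the same number of matches is a z test on the difference in total goals. It may not be the exact test run for this post, so treat the sketch below as an illustration. The function name `poisson_rate_z_test` is hypothetical. The 629 is the final-day total quoted earlier, while the 587 is only a back-of-envelope figure implied by the rounded 0.2 goals per match difference (629 minus 0.2 times 210), not the actual reverse-fixture total.

```python
from math import sqrt
from statistics import NormalDist

def poisson_rate_z_test(goals_a, goals_b):
    """Compare two Poisson totals observed over equal numbers of matches.

    Under equal rates the difference of the totals has mean 0, and its
    variance is estimated by the sum of the totals.
    """
    z = (goals_a - goals_b) / sqrt(goals_a + goals_b)
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p

# 629: final-day goals from the post. 587: rough illustrative assumption.
z, p = poisson_rate_z_test(629, 587)
```

With these illustrative totals the p-value comes out well above 0.05, which lines up with the conclusion that the difference is not significant at the five percent level.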

This leads us to an interesting quandary: do we believe the results of our simulation, or the results of our two parametric tests? At this point it’s tough to conclude one way or the other whether more goals are scored on the final day of the season, but it’s an interesting question to think about, as is the correct way of determining the answer empirically.

Feel free to leave comments below on your thoughts about this. If you have any questions for Andrew, please feel free to reach out to him on Twitter @andrew_puopolo or via email at andrewpuopolo@college.harvard.edu.
