As a fan of both data and soccer (from here on referred to as football), I find football fans' attitude toward statistics and data analysis perplexing. It is understandable, though, after years of the media focusing only on simple stats. Football is a complex team sport with deep interactions between players, so counting events (goals, assists, tackles, etc.) isn't enough.

This notebook shows how you can use play-by-play data to analyse a football match, using custom measures and visualizations to better understand the sport.

Disclaimer: I'm a fan, not an expert. Germany's national team and Manchester City have entire teams dedicated to data analysis, and the state of the art is well beyond what is shown here. However, that analysis is rarely made public, so I hope this is useful (or at least entertaining). I plan to keep playing with the data and share useful insights in the future. Feel free to star the GitHub repository, or drop me an email at [email protected]

A note on the data used

This play-by-play data was gathered from a public website, and I have no guarantee that it is consistent or correct. The process used to gather it will be the subject of its own post, so stay tuned. All calculations based on the raw data, however, are included in this notebook and should be questioned. I would love to get some feedback.