By Eliot McKinley (@etmckinley)

Machine learning is so hot right now, and if Skynet is going to destroy all humans, it should at least know a little bit about Major League Soccer’s Columbus Crew. To that end, I created a machine learning model to classify which position in Gregg Berhalter’s 4-2-3-1 formation a player most likely occupied during a single game.

I chose the Crew for a couple of reasons. First, they are my favorite team. Second, they had consistent coaching over a long period, with a defined style of play. The latter is very important, because the model needs positions that mean the same thing from game to game in order to be trained well and for the results to make sense. Since the Crew almost always played a 4-2-3-1 that relied on ball possession to disorganize the defense and create goal opportunities (get used to that phrase, USMNT fans), they were a perfect test of whether this kind of thing could be done.

I won’t go too far into specifics, but the basic approach was a Random Forest model (an ensemble of decision trees) trained on the 2015-2017 seasons to predict player positions for games played in 2018. Each player’s position was defined by where he started the game (e.g. Harrison Afful = right full back, Federico Higuain = center attacking midfielder), and each player’s actions during that game were associated with him. These actions included passing types (based on K-means clustering, similar to this one) and the locations of defensive actions, aerial duels, and shots. The final output is a probability that a player occupied each specific position during the game (e.g. Gyasi Zardes had a 95% probability of striker, 3% left wing, and 2% right wing).
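For readers who want a feel for how this kind of pipeline fits together, here is a minimal sketch in Python using scikit-learn. It is not the actual model: the position "home zones," the simulated passes and defensive actions, the number of pass clusters, and every function name are hypothetical stand-ins, and it leaves out aerial duels and shots to stay short. It clusters passes into pass types with K-means, turns each player-game into a feature row (share of passes in each cluster plus average defensive-action location), labels it with the starting position, fits a random forest on "training" games, and then asks for per-position probabilities on held-out games.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Hypothetical position labels and "home zones" (x, y on a 100 x 100 pitch,
# attacking left to right). Illustrative only, not real Crew data.
ZONES = {
    "LB": (30, 85), "RB": (30, 15), "CB": (20, 50), "DM": (40, 50),
    "CAM": (65, 50), "LW": (70, 85), "RW": (70, 15), "ST": (85, 50),
}
POSITIONS = list(ZONES)


def fake_player_game(pos, rng):
    """Simulate one player's passes and defensive actions in one game (synthetic)."""
    cx, cy = ZONES[pos]
    n_pass = rng.integers(20, 60)
    starts = rng.normal((cx, cy), 8, size=(n_pass, 2))
    ends = starts + rng.normal((8, 0), 12, size=(n_pass, 2))
    passes = np.hstack([starts, ends])  # columns: start_x, start_y, end_x, end_y
    defensive = rng.normal((cx - 10, cy), 10, size=(rng.integers(1, 10), 2))
    return passes, defensive


def features(passes, defensive, kmeans):
    """Share of passes in each K-means pass type, plus mean defensive-action location."""
    counts = np.bincount(kmeans.predict(passes), minlength=kmeans.n_clusters)
    return np.concatenate([counts / counts.sum(), defensive.mean(axis=0)])


rng = np.random.default_rng(42)

# Stand-ins for "training seasons" and a held-out "test season":
# each entry is (starting position, passes, defensive actions).
train = [(p, *fake_player_game(p, rng)) for p in POSITIONS for _ in range(60)]
test = [(p, *fake_player_game(p, rng)) for p in POSITIONS for _ in range(10)]

# 1. Cluster every training pass into pass types (k = 12 is an arbitrary choice).
all_passes = np.vstack([passes for _, passes, _ in train])
kmeans = KMeans(n_clusters=12, n_init=10, random_state=0).fit(all_passes)

# 2. One feature row per player-game, labeled by the starting position.
X_train = np.array([features(ps, d, kmeans) for _, ps, d in train])
y_train = [p for p, _, _ in train]

# 3. Fit the random forest (an ensemble of decision trees).
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

# 4. Probability of each position for every held-out player-game.
#    Columns of proba follow the order of forest.classes_.
X_test = np.array([features(ps, d, kmeans) for _, ps, d in test])
proba = forest.predict_proba(X_test)

for (true_pos, _, _), row in list(zip(test, proba))[:3]:
    ranked = sorted(zip(forest.classes_, row), key=lambda t: -t[1])[:3]
    print(true_pos, ", ".join(f"{pos} {p:.0%}" for pos, p in ranked))
```

Each printed line pairs a player-game's true starting position with its three most likely predicted positions, which is the same shape of output as the Zardes example above. Using predicted probabilities rather than a single hard label is what allows a game to read as "mostly striker, a little bit of winger."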

Let’s look at some examples.