Should Eddie Johnson start? Most fans think that the team plays best with Fabian Espindola and Luis Silva up top, and that EJ should start on the bench. I wanted to see if I could prove which strike tandem leads to the best shooting opportunities for the entire team using advanced statistics.

In this post we'll be looking at Expected Goals (xG) to attempt to understand how the team plays when we use different forward pairings. What are Expected Goals, you ask? To put it simply, an expected goal model takes into account various factors affecting a shot (location, speed of play, part of the body used to strike the ball, etc.) and comes up with a prediction for how likely that shot is to result in a goal. By analyzing a large quantity of shots, we can assign each shot a number between 0 and 1 indicating what fraction of similar shots go in. It basically answers "if an average shooter were to take that shot, what percent of the time would they score?"
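To make the "what fraction of similar shots go in" idea concrete, here is a minimal sketch of the simplest possible xG estimate: group past shots into distance bands and take the conversion rate in each band. The function name, the band width, and the toy shots are all made up for illustration; this is not the model used in this post.

```python
from collections import defaultdict

def empirical_xg_by_band(shots, band_width=6.0):
    """Return {band_index: share of shots in that distance band that scored}.

    `shots` is an iterable of (distance_from_end_line, scored) pairs.
    `band_width` is an arbitrary illustrative choice, in the same units as distance.
    """
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for distance, scored in shots:
        band = int(distance // band_width)  # e.g. 0-6 -> band 0, 6-12 -> band 1
        attempts[band] += 1
        goals[band] += int(scored)
    return {band: goals[band] / attempts[band] for band in attempts}

# Made-up toy shots (distance, scored?) purely for illustration.
toy_shots = [
    (3.0, True), (4.5, False), (5.0, True), (2.0, False),     # band 0: 2 of 4 scored
    (8.0, False), (10.0, True), (11.5, False), (7.0, False),  # band 1: 1 of 4 scored
    (20.0, False), (22.0, False),                             # band 3: 0 of 2 scored
]
xg_table = empirical_xg_by_band(toy_shots)
print(xg_table)
```

A new shot's xG would then simply be the rate for the band it falls in. Real models fit a smooth curve on thousands of shots instead of using coarse bands, but the idea is the same.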

I used the model given here in my analysis, although there are more complex (and accurate) models such as the one found here (scroll to the bottom and expand the Methodology section to see it). I chose the model I used because of the constraints of my data: the only thing I had available for each shot was its distance from the end line.

So, before we dive in, some notes:

The data is taken from Squawka, and may be inaccurate because I had to translate a graphical representation of the shots into distance values.

The model is derived from Premier League (PL) shooting data, and therefore may not be representative of MLS shooting. I want to create my own, but don't have enough data yet.

The model is, again, very simplistic. It does not take into account horizontal distance, or speed of play, or anything besides vertical distance from goal.

Other factors besides who the strikers are affect how we shoot. I know that, but I wanted to see if I could find a discernible pattern here.

Phew! Ok, so now that that's out of the way, here's what I actually did.



Methodology



I looked at each of DCU's 34 regular season games and extracted shot locations for each shot taken in each game, regardless of who took the shot. I separated the shots by which two strikers were in the game at the time, taking into account substitutions. I did not count own goals or penalties. This gave me four groups: Espindola and Johnson (E+J), Johnson and Silva (J+S), Espindola and Silva (E+S), and Other. Using the model from above, each shot location was turned into a probability. By looking at these probabilities, we can see how the team shoots given which forwards are in the game.
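For anyone who wants to try something similar, here is a rough sketch of the bookkeeping described above: work out which strikers were on the field at the minute of each shot, then add up shots and xG per pairing. Everything in it is an assumption for illustration (the function names, the stint format, the sample game, and especially `placeholder_model`, which is not the model used in this post), and it is not necessarily how the numbers below were produced.

```python
from collections import defaultdict

def pairing_at(minute, stints):
    """Return the sorted tuple of strikers on the field at `minute`.

    `stints` is a list of (player, minute_on, minute_off) for one game.
    """
    return tuple(sorted(p for p, start, end in stints if start <= minute < end))

def xg_by_pairing(shots, stints, xg_model):
    """Sum shot count and xG per striker pairing for one game.

    `shots` is a list of (minute, distance_from_end_line) pairs.
    `xg_model` is any function mapping a distance to a probability.
    """
    totals = defaultdict(lambda: {"shots": 0, "xg": 0.0})
    for minute, distance in shots:
        key = pairing_at(minute, stints)
        totals[key]["shots"] += 1
        totals[key]["xg"] += xg_model(distance)
    return dict(totals)

def placeholder_model(distance):
    """Stand-in curve that only decreases with distance. NOT the article's model."""
    return 1.0 / (1.0 + distance)

# Hypothetical game: Johnson is replaced by Silva in the 60th minute.
stints = [("Espindola", 0, 95), ("Johnson", 0, 60), ("Silva", 60, 95)]
shots = [(12, 10.0), (47, 25.0), (71, 8.0)]  # (minute, distance), made up
result = xg_by_pairing(shots, stints, placeholder_model)
print(result)
```

Penalties and own goals would be dropped from `shots` before this step, and any key other than the three main pairings would be folded into Other.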



Results



This table summarizes the initial results:

Pairing   Minutes  Shots  Shots/90  Goals  xG        xG/90     xG/shot
E+J       1254     143    10.26316  15     12.91175  0.92668   0.090292
J+S       521      48     8.291747  6      3.76283   0.650009  0.078392
E+S       559      69     11.10912  10     4.837673  0.778874  0.070111
Other     813      95     10.51661  14     8.574199  0.949173  0.090255
Overall   3147     355    10.15253  45     30.08645  0.860432  0.084751

Now, there's a lot here, so I'm going to look at a few things that caught my eye.

First of all, we scored way, way more than my model predicted that we should have (45 vs. 30). Three reasonable explanations for this:

1. The model is inaccurate.

This is true, as I acknowledged above. However, I doubt that it accounts for the entire disparity we see. We'll come back to this one as we look more in depth at the numbers.

2. Our finishing is better than average, so we score more than predicted.

This might be true, and would be nice to believe. To put it in context, though, 2013-14 Man City beat their expected value by 21, and it's undeniable that their players are above average. I doubt very much that the gap between our finishing and average is anywhere near as large as City's. So while this may contribute, it almost certainly doesn't explain all of the error. Besides, scoring higher than expected is not repeatable year on year, which leads us to...

3. We were lucky.

We were lucky! Probably some of our scoring is explained by #1 and #2, but this just reinforces that DCU got some bounces to go our way this year.
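One rough way to put a number on "lucky" is to ask how often a team with 30.09 expected goals would score 45 or more by chance alone. The sketch below treats the goal total as a Poisson variable with mean equal to the overall xG. That is a simplifying assumption of mine, not part of the model above: it pretends the xG values are accurate and that shots are independent.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the CDF up to k - 1."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

goals_scored = 45          # Overall goals from the table
expected_goals = 30.08645  # Overall xG from the table

p_at_least = poisson_tail(goals_scored, expected_goals)
print(f"P(at least {goals_scored} goals | xG = {expected_goals}) = {p_at_least:.4f}")
```

If that probability comes out small, pure luck on top of an accurate model is a stretch on its own, which fits the idea that all three explanations are contributing.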



Next, let's actually look into what I said we would: which forward pairing is best? Here's the same table again so that you don't have to scroll up.

Pairing   Minutes  Shots  Shots/90  Goals  xG        xG/90     xG/shot
E+J       1254     143    10.26316  15     12.91175  0.92668   0.090292
J+S       521      48     8.291747  6      3.76283   0.650009  0.078392
E+S       559      69     11.10912  10     4.837673  0.778874  0.070111
Other     813      95     10.51661  14     8.574199  0.949173  0.090255
Overall   3147     355    10.15253  45     30.08645  0.860432  0.084751



To me, the most interesting numbers to look at are the xG/90 (how many times we expect to score per game) and xG/shot (average quality of the shots we take). The Espindola and Johnson pairing actually does very well in both of these categories, pretty significantly higher than the team average. With Johnson and Silva in, we did very poorly in both areas, and also took a very (very very) low number of shots per game. The Espindola + Silva combination looks like it gives us more shots per game, but generally lower quality chances. Absurdly, the Other category actually looks the best of all. More on this later.
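In case the rate columns are unclear, this short sketch rebuilds them from the raw totals in the table (minutes, shots, xG). The variable and function names are my own; the numbers are copied straight from the table.

```python
# (minutes, shots, xG) copied from the table above
ROWS = {
    "E+J": (1254, 143, 12.91175),
    "J+S": (521, 48, 3.76283),
    "E+S": (559, 69, 4.837673),
    "Other": (813, 95, 8.574199),
    "Overall": (3147, 355, 30.08645),
}

def rates(minutes, shots, xg):
    """Derive the three rate columns from the raw totals."""
    return {
        "shots_per_90": shots / minutes * 90,  # shots per 90 minutes played
        "xg_per_90": xg / minutes * 90,        # expected goals per 90 minutes
        "xg_per_shot": xg / shots,             # average chance quality
    }

table = {name: rates(*totals) for name, totals in ROWS.items()}
for name, r in table.items():
    print(f"{name:8s} {r['shots_per_90']:6.2f} {r['xg_per_90']:6.3f} {r['xg_per_shot']:6.3f}")

# Which pairing (ignoring the Overall row) has the highest xG/90?
best = max((n for n in table if n != "Overall"), key=lambda n: table[n]["xg_per_90"])
print("Highest xG/90:", best)
```

Note that xG/90 is just Shots/90 multiplied by xG/shot, so a pairing can get there through volume, through quality, or both.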



Let's look a little bit closer at the Espindola and Silva combination. If you look at the table, we outperformed the expected goals in each category, but we especially beat it with this pairing, scoring 10 goals on 4.84 xG, roughly double. We could just chalk this up to luck, but I think I have a better explanation. The authors of the "more complex model" I mentioned earlier found that the second most important factor in an xG model (after location) is speed of attack. I don't have data to back this up, but the eye test says that EJ tends to slow down play by holding off a defender and then passing backwards. Fabi and Silva pass more incisively to each other, which may lead to faster shooting opportunities. Because my model doesn't take this into account, it may undervalue the actual quality of the chances we get with these two in.
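To see how much each pairing beat its xG, here is a small sketch that computes goals minus xG and goals divided by xG from the table. The names are mine; the numbers come from the table.

```python
# (goals, xG) copied from the table above
PAIRINGS = {
    "E+J": (15, 12.91175),
    "J+S": (6, 3.76283),
    "E+S": (10, 4.837673),
    "Other": (14, 8.574199),
}

over = {
    name: {"diff": goals - xg, "ratio": goals / xg}
    for name, (goals, xg) in PAIRINGS.items()
}
for name, o in over.items():
    print(f"{name:6s} goals - xG = {o['diff']:5.2f}   goals / xG = {o['ratio']:4.2f}")

top_ratio = max(over, key=lambda n: over[n]["ratio"])  # biggest relative overperformance
top_diff = max(over, key=lambda n: over[n]["diff"])    # biggest absolute overperformance
print("Largest ratio:", top_ratio, "| largest difference:", top_diff)
```

Espindola and Silva stand out on the ratio, while in raw goals above xG the Other group is marginally ahead, mostly because it had more minutes to work with.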



The surprising effectiveness of the Other category shocked me. It includes everything from times when Doyle or Estrada were subbed on to red card situations where we finished with one striker to the Chicago game that Seaton started. Why should this look like the best possible category?



One thought that occurred to me was game state: maybe Doyle (or Estrada or whoever) was subbed on more often when we were more likely to score due to the scoreline. I didn't explore this, but it may merit a future look. I decided to break down these shots by striker to see if there's anything of value here. This was mostly for fun, as no single pairing was anywhere close to a reasonable number of minutes or shots. I (arbitrarily) made the cutoff 5 shots, which is again way too low to draw any real conclusions from. Deal with it.

Strikers       Minutes  Shots  Shots/90     Goals  xG        xG/90     xG/shot
Doyle+Fabi     173      29     15.0867052   5      3.096956  1.611133  0.106792
Doyle+Silva    40       7      15.75        0      0.49074   1.104165  0.070106
EJ+Martin      78       11     12.69230769  0      1.049449  1.210903  0.095404
EJ+Rolfe       58       5      7.75862069   2      0.565436  0.877401  0.113087
Fabi           85       5      5.294117647  0      0.433628  0.459136  0.086726
Fabi+Estrada   49       10     18.36734694  1      0.775813  1.424963  0.077581
Fabi+Rolfe     113      14     11.15044248  3      0.902428  0.718748  0.064459
Silva+Pontius  38       5      11.84210526  2      0.389036  0.921402  0.077807

As expected, we did poorly when only Fabi was on the field. I'm not going to delve very far into this data except to say holy crap Fabi + Estrada.

Conclusions

EJ + Fabi = Good

EJ + Silva = Bad

Fabi + Silva = Good?

Literally anyone else = BEST

Real Conclusions

Unfortunately, I wasn't able to find conclusive proof that Espindola and Silva are as good together as we think they are. I would need a more complete dataset in order to use a more accurate model to really see what's going on. However, it seems clear that our best play came when Espindola was one of the guys up top, regardless of his partner.

The Johnson + Silva pairing performed very poorly last year, and maybe (maybe maybe) we should consider putting Pontius up top with Silva to start the season with Espindola suspended. There's not enough data to say definitively, but I would not be shocked to see us perform better if we sat EJ.

Please let me know what you guys think! I want to continue to write articles like this one, and I'd like to know how I can make them better and what else you would like to see.