So what does this mean? I think there are a few things about this analysis that are useful. First, it’s just an objective way to describe team patterns, which is thought-provoking in its own right. It’s also useful for putting other statistics into context. For example, Chicago are actually second-to-last in shots taken so far this year, yet at the top of my table above. If Chicago has a problem, it isn’t that they choose not to shoot, it’s that they don’t seem to get the ball in good enough shooting positions. New England is the opposite. They’re near the top of the league in shots taken, despite (according to the above analysis) being rather stingy in their shooting choices. This means they must be getting the ball in goal-dangerous places pretty frequently.

There are, of course, limitations to the approach I’ve taken to explore this topic. First of all, this doesn’t really say anything about where the opposing defenders are. Maybe a team chooses to shoot liberally because the opposing teams tend to leave them the space to do it. Maybe another team chooses to dribble because it simply isn’t an option to shoot. To put it another way, I’m assuming that all three offensive actions are possible, which isn’t always true (though some players do clearly have a knack for just “finding a way to shoot”). Another is that I’m simplifying some of the complexity of offensive actions by just looking at this “pseudo-expected goals” metric. A more complete analysis might look at multi-dimensional patterns (like a heatmap or something…stay tuned for my next article) rather than relying on a data-reduction technique like I’ve done here. Then again, univariate analyses are simpler to interpret. Lastly, there are a handful of statistical assumptions made through the use of logistic and multinomial regressions. I won’t get into those, but they mostly make it hard to do things like quantify uncertainty, compare p-values and so on. I didn’t even go there.

Despite the limitations, I think this is a telling analysis of how teams play. Over time, these figures could be revisited to visualize changing patterns or even evaluate something like the effect a new hire has on team “style”. Maybe most of all though, it gives me (a Sounders fan) pause before I annoyingly yell something like “THEY NEVER SHOOT THE BALL WHEN THEY CAN”. That’s just my bias speaking.

Methodological Note 1: What I mean by “pseudo-expected goals”

The basic principle of this analysis is that a player’s location on the field is the main determinant of which type of action a player takes, but that can vary by team. So, the goal is to measure the probability of each action as a function of how good of a shooting location it was taken at. In other words, if a given team has the ball at the top of the 18 (a relatively good shooting location), what’s the probability that they will shoot, pass or dribble? And how does that compare to the top of the 6 (an even better shooting location)? How does that compare to the corner of the box (worse)? What are these probabilities at any arbitrary location in the defensive half (a very bad shooting location)? Key to answering the research question above is to do this for each team.

First, to assess field position, I computed what could be called “pseudo-expected goals”, or expected goals (xG) where the location and angle are the only factors in the metric. Specifically, I estimated the odds of a shot turning into a goal for everywhere on the field using all observed shots (and whether or not they were successful) so far this season. That’s a sample size of 1,244 shots from various locations, 130 of which found the back of the net. This was done using a simple logistic regression (per usual for xG-like analyses). I limited this to shots from open play and shots with the ball at the player’s feet for comparability with passes and dribbles.
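As a rough sketch of what a location-and-angle-only logistic regression can look like, the Python below fits one to synthetic shots. Everything specific in it is an assumption made for illustration: the 100 x 100 pitch coordinates, the goal position and width, the distance and angle features, the made-up weights used to simulate outcomes, and the plain gradient-ascent fit (standing in for whatever routine was actually used). The synthetic shots are not the 1,244 shots described above.

```python
import math
import random

# Assumed pitch coordinates (not from the article): 100 x 100 units,
# attacking goal centred at (100, 50) with a goal mouth 10 units wide.
GOAL_X, GOAL_Y, GOAL_WIDTH = 100.0, 50.0, 10.0


def shot_features(x, y):
    """Return [1, distance / 10, angle]: intercept, scaled distance to the
    goal centre, and the angle (radians) subtended by the goal mouth."""
    dx = GOAL_X - x
    distance = math.hypot(dx, GOAL_Y - y)
    to_post_a = math.atan2(GOAL_Y + GOAL_WIDTH / 2 - y, dx)
    to_post_b = math.atan2(GOAL_Y - GOAL_WIDTH / 2 - y, dx)
    return [1.0, distance / 10.0, abs(to_post_a - to_post_b)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, goals, steps=300, rate=0.1):
    """Fit logistic-regression weights by gradient ascent on the mean
    log-likelihood (a simple stand-in for a statistics package)."""
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for row, goal in zip(rows, goals):
            error = goal - sigmoid(sum(w * v for w, v in zip(weights, row)))
            for j, v in enumerate(row):
                grad[j] += error * v
        weights = [w + rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def pseudo_xg(weights, x, y):
    """Scoring probability at (x, y) using location and angle only."""
    return sigmoid(sum(w * v for w, v in zip(weights, shot_features(x, y))))


# Synthetic shots for illustration only; NOT the article's data.
rng = random.Random(0)
shots = [(rng.uniform(60, 99), rng.uniform(20, 80)) for _ in range(800)]
TRUE_WEIGHTS = [-1.0, -1.5, 1.5]  # made-up values used to simulate outcomes
goals = [int(rng.random() < pseudo_xg(TRUE_WEIGHTS, x, y)) for x, y in shots]

weights = fit_logistic([shot_features(x, y) for x, y in shots], goals)
```

Once fitted, `pseudo_xg(weights, x, y)` can be evaluated at any spot on the field, not just where shots were taken, which is what the next step relies on.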

Next, I connected this to the choice of actions. Like the shots data, passes and dribbles can be pinpointed to exactly where on the field they happened. So I computed the odds of a shot going in at every location in the data, regardless of type of action. In other words, it’s like asking “if a player somehow found a way to shoot from where he passed/dribbled, what would be the odds of scoring?”
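In code terms, this step just means running every action's location through the shot model, whatever the action was. A minimal sketch, assuming a hypothetical `Action` record and a placeholder scoring function (`stand_in_pseudo_xg`, with invented coefficients and an assumed goal centre at (100, 50)) in place of the fitted model:

```python
import math
from dataclasses import dataclass


@dataclass
class Action:
    team: str
    kind: str  # "shot", "pass" or "dribble"
    x: float
    y: float


def stand_in_pseudo_xg(x, y):
    """Placeholder for the fitted shot model: scoring probability decays
    with distance to an assumed goal centre at (100, 50)."""
    distance = math.hypot(100.0 - x, 50.0 - y)
    return 1.0 / (1.0 + math.exp(0.5 + 0.15 * distance))


def score_actions(actions, model=stand_in_pseudo_xg):
    """Attach the 'what if he had shot from here?' probability to every
    action, regardless of whether it was a shot, pass or dribble."""
    return [(a.team, a.kind, model(a.x, a.y)) for a in actions]


# Hypothetical events, invented for illustration.
actions = [
    Action("Team A", "pass", 55.0, 30.0),
    Action("Team A", "dribble", 84.0, 45.0),
    Action("Team B", "shot", 92.0, 52.0),
]
scored = score_actions(actions)
```

Each row of `scored` now carries a single number describing how goal-dangerous the location was, which is the only field-position input the later regression needs.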

So far, this basically just takes the locations of all the observed offensive actions and quantifies how goal-dangerous that position was. The hard part is using it to determine the tendencies of different teams. For this, I used a technique called multinomial logistic regression (see Methodological Note 2). The basic idea is that in theory a player could try to shoot, pass or dribble from anywhere, and the choice is up to him. As the ball gets closer and closer to the goal, it becomes more likely that he will shoot, but it’s not the same for every team. Multinomial logistic regression allows me to measure the probability of each of the three actions (pass/shoot/dribble) happening as a function of field position (measured as “pseudo-expected goals” defined above) and do it for each team separately. In (even more) statistical speak, that amounts to an interaction term between team fixed effects and pseudo-expected goals.
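To make the "interaction term" concrete, here is a small sketch of how such a model turns a team and a pseudo-expected goals value into three probabilities. The coefficients are invented for illustration (they are not fitted values), the team names are placeholders, and treating "pass" as the reference category is an assumption:

```python
import math

ACTIONS = ("pass", "shoot", "dribble")  # "pass" assumed as the reference

# Hypothetical coefficients on the log-odds scale relative to passing:
# (intercept, slope on pseudo-expected goals). Not fitted values.
BASELINE = {"shoot": (-4.0, 30.0), "dribble": (-2.0, 5.0)}

# Team fixed effect (intercept shift) and team x pseudo-xG interaction
# (slope shift). Also hypothetical.
TEAM_SHIFT = {
    "Team A": {"shoot": (0.5, 5.0), "dribble": (0.0, 0.0)},
    "Team B": {"shoot": (-0.5, -5.0), "dribble": (0.2, 1.0)},
}


def action_probabilities(team, pxg):
    """Return P(pass), P(shoot), P(dribble) for a team at a given
    pseudo-expected goals value, via the softmax of the linear predictors."""
    linear = {"pass": 0.0}
    for action, (intercept, slope) in BASELINE.items():
        d_intercept, d_slope = TEAM_SHIFT[team][action]
        linear[action] = (intercept + d_intercept) + (slope + d_slope) * pxg
    top = max(linear.values())  # subtract the max for numerical stability
    expd = {k: math.exp(v - top) for k, v in linear.items()}
    total = sum(expd.values())
    return {k: expd[k] / total for k in ACTIONS}


probs_a = action_probabilities("Team A", 0.2)
probs_b = action_probabilities("Team B", 0.2)
```

Because each team gets its own intercept and its own slope, two teams at the same location can have different shooting probabilities, and those differences can widen or narrow as the location gets more dangerous.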

Methodological Note 2: Further Details on Regressions

Multinomial logistic regression is just a handy statistical method for measuring the odds of one of a few options happening. If regular logistic regression measures the odds of “heads” in a coin toss, multinomial logistic regression measures the odds of each number in a dice roll (with any number of sides to the die).

The important thing is that it estimates those odds in tandem with each other so that they perfectly sum to 100%. In the first graphs above, you can literally sum the heights of the three lines at any cross section along the x-axis and get 100%. Based on the principles in Methodological Note 1, the task is to do exactly that across the three offensive choices.
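For readers who want to see why the three lines must sum to 100%, here is the standard multinomial-logit form in illustrative notation. The symbols are assumptions for this sketch: $x$ is pseudo-expected goals, $\eta_k(x) = \alpha_k + \beta_k x$ is the linear predictor for action $k$, and passing is taken as the reference action, which may not match the parameterization actually fitted.

```latex
\begin{align*}
P(\text{pass} \mid x)    &= \frac{1}{1 + e^{\eta_{\text{shoot}}(x)} + e^{\eta_{\text{dribble}}(x)}} \\
P(\text{shoot} \mid x)   &= \frac{e^{\eta_{\text{shoot}}(x)}}{1 + e^{\eta_{\text{shoot}}(x)} + e^{\eta_{\text{dribble}}(x)}} \\
P(\text{dribble} \mid x) &= \frac{e^{\eta_{\text{dribble}}(x)}}{1 + e^{\eta_{\text{shoot}}(x)} + e^{\eta_{\text{dribble}}(x)}} \\[4pt]
P(\text{pass} \mid x) + P(\text{shoot} \mid x) + P(\text{dribble} \mid x)
  &= \frac{1 + e^{\eta_{\text{shoot}}(x)} + e^{\eta_{\text{dribble}}(x)}}{1 + e^{\eta_{\text{shoot}}(x)} + e^{\eta_{\text{dribble}}(x)}} = 1
\end{align*}
```

All three probabilities share the same denominator, so they sum to one at every value of $x$ no matter what the coefficients are.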

Returning to the pseudo-expected goals regression, here’s the actual regression formula: