Average passing networks

Figure 1 shows an example of a football passing network, in this case the average network of FCB against Real Madrid in the season 2009/2010. Note that links are unidirectional (from player A to player B) and weighted according to the number of passes between players. In the figure, nodes (i.e., players) are placed in the average position from where their passes were made and the width of the links is proportional to the number of passes between players. Also note that both the \(x\) and \(y\) coordinates of the field are bounded between [0,100] and are measured in “field units” (f.u.), since not all fields have exactly the same dimensions. Finally, the radius of the nodes is proportional to their importance in the passing network, quantified by means of the eigenvector centrality (see Methods).

Figure 1 Schematic illustration of a football passing network. In the plot, players are represented by circular nodes, whose size is proportional to their eigenvector centrality, a measure of importance in the network structure. The position of each player is given by the average of the positions of all passes made by the player during the match. The width of the links is proportional to their weights, which account for the number of passes between players. Note that links are unidirectional. In this example, we plot the average passing network of the match between F.C. Barcelona and Real Madrid, played during the season 2009/2010 at Santiago Bernabeu Stadium. Datasets leading to the passing network were provided by Opta.

First, we analyzed the average passing networks of all matches played by FCB during season 2009/2010 (\(38\) in total), obtaining the networks of FCB and their rivals. Specifically, we obtained 2 average passing networks for each match (1 per team), each of them projecting all passes and positions recorded during the match into a single network. See the Methods section for details about the construction of average passing networks. Previous literature about average passing networks has shown that they reveal information about the way a team is organized50 and are also related to team performance51.
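As a rough illustration of how such a projection could be computed, the following sketch accumulates a list of passes into a weighted, directed matrix and the average position from which each player passed. The record layout `(passer, receiver, x, y)`, the function name and the use of integer player indices are assumptions made for this example; the actual construction is the one described in the Methods section.

```python
import numpy as np

def average_passing_network(passes, n_players):
    """Project all passes of a team into one weighted, directed network.

    `passes` is a list of (passer, receiver, x, y) tuples, where (x, y) is
    the origin of the pass in field units. This layout is hypothetical.
    """
    A = np.zeros((n_players, n_players))   # A[i, j] = passes from i to j
    pos_sum = np.zeros((n_players, 2))     # running sum of pass origins
    n_made = np.zeros(n_players)           # passes made by each player
    for passer, receiver, x, y in passes:
        A[passer, receiver] += 1
        pos_sum[passer] += (x, y)
        n_made[passer] += 1
    # Average pass origin per player (NaN for players who made no pass).
    with np.errstate(invalid="ignore", divide="ignore"):
        positions = pos_sum / n_made[:, None]
    return A, positions
```

Players who never pass the ball keep an undefined (NaN) position here; a real analysis would have to decide how to place or exclude them.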

Figure 2 shows the comparison between 8 different parameters obtained for FCB and its rivals. Four of them, (A) the number of passes \(L\), (B) the number of shots at goal \({M}_{{shots}}\), (C) the number of goals \({M}_{{goals}}\) and (D) the number of points \({M}_{{points}}\) (at the end of the season), are classical metrics of team performance. Note that computing these 4 variables does not require obtaining or analyzing the network structure of each team, although some of them (i.e., the number of passes) can affect the organization of the passing networks. The other 4 parameters of Fig. 2 are related to the spatial properties of the networks: (E) x-coordinate of the network centroid \(\langle X\rangle\), (F) y-coordinate of the network centroid \(\langle Y\rangle\), (G) dispersion of the position of the players around the network centroid \({NC}_{disp}\) and (H) average ratio between the passing distance parallel and perpendicular to the opponent’s goal \(\langle \Delta y\rangle /\langle \Delta x\rangle\) (see Methods for details). Left bars in all plots correspond to the average values of these metrics for all matches of FCB during the season, while right bars are the same metrics obtained for the rivals in the same matches. FCB is always averaged with itself, while all other teams are averaged together, because we are only interested in observing differences between FCB and all other teams. Error bars account for the standard deviations of each metric. Plots in yellow highlight statistically significant differences (see Methods for details about the statistical analysis).

Figure 2 Comparison of 8 classical football metrics. In all plots, left bars are the average (during the whole season) of a given metric for FCB, while right bars correspond to the average of the rivals in the matches played against FCB. Metrics are, specifically: (A) number of passes, (B) number of shots, (C) number of goals, (D) number of points at the end of the season, (E) x-coordinate of the network centroid \(\langle X\rangle\), (F) y-coordinate of the network centroid \(\langle Y\rangle\), (G) the spatial dispersion (in field units) of the players around the network centroid and (H) the advance ratio \(\langle \Delta y\rangle /\langle \Delta x\rangle\), obtained as the total length \(\langle \Delta y\rangle\) of the y-coordinate of all passes divided by the total length \(\langle \Delta x\rangle\) of the x-coordinate, both distances in field units. Direction x is towards the goal, while direction y is parallel to the opponent’s goal (see axes of Fig. 1). Parameters having statistically significant differences between FCB and their rivals are plotted in yellow.

As we can see in Fig. 2A, the number of passes made by FCB is much higher than the average of their rivals. This fact is a consequence of Guardiola’s playing style, focused on keeping the ball as much as possible (“In football, I am very selfish: I want the ball for me”, “take the ball, pass the ball”52). The high number of passes unavoidably leads to passing networks with links that have higher weights and, as we will see, this fact will have consequences on the network parameters. The number of shots at goal is also higher for FCB (Fig. 2B), leading to a higher number of goals (Fig. 2C) and, ultimately, to a high number of points accumulated during the analyzed matches (Fig. 2D). In fact, FCB won the league with 99 points (31 wins, 6 ties and only 1 loss). Note that these four metrics (passes, shots, goals and points), especially the last three, are traditionally considered indicators of team performance, thus revealing that FCB was the best team during season 2009/2010.

Bottom plots of Fig. 2 are related to the spatial features of Guardiola’s team. The \(\langle X\rangle\) and \(\langle Y\rangle\) average coordinates of all passes made during the match define the network centroid (or the network center of mass). We can observe in Fig. 2E how FCB played closer to the opponent’s goal (\({\langle X\rangle }_{FCB}>{\langle X\rangle }_{rivals}\)), while no differences are found in the \(\langle Y\rangle\) coordinate (Fig. 2F), indicating no preference for either side of the pitch. Interestingly, the dispersion of the position of the players around the centroid (see Methods) is slightly higher for FCB, which indicates that the area covered by the initial position of the passes made by all players is wider (Fig. 2G). Finally, it is worth analyzing the ratio of advance \(\langle \Delta y\rangle /\langle \Delta x\rangle\), which is an indicator of the direction of the passes of a team: the \(\Delta y={y}_{2}-{y}_{1}\) of a pass is the difference between the y-coordinates at its final (\({y}_{2}\)) and initial (\({y}_{1}\)) points, while \(\Delta x\) is defined, accordingly, for the x-coordinate. In Fig. 2H, we can observe how FCB has a ratio of advance much higher than its rivals, which reveals that its passes are more parallel to the opponent’s goal than those of the rest of the teams. Note that this metric is independent of the number of passes, and it is an indicator of how “direct” the game of a team is. Clearly, FCB is not concerned with advancing directly towards the goal, but with moving the ball in parallel, probably to find the most adequate moment to advance.
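The spatial quantities discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: the centroid is taken as the mean origin of all passes, the advance ratio as the total absolute displacement in y divided by the total absolute displacement in x, and the dispersion as the standard deviation of the players' distances to the centroid. The last choice in particular is a guess at the Methods definition, and the function and argument names are hypothetical.

```python
import numpy as np

def spatial_metrics(starts, ends, player_positions):
    """Centroid, dispersion and advance ratio of a passing network.

    starts, ends: (n_passes, 2) pass origins and destinations (field units).
    player_positions: (n_players, 2) average position of each player.
    """
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    player_positions = np.asarray(player_positions, dtype=float)

    # Centroid: average (x, y) of all pass origins.
    centroid = starts.mean(axis=0)

    # Dispersion (assumed definition): spread of player-to-centroid distances.
    distances = np.linalg.norm(player_positions - centroid, axis=1)
    dispersion = distances.std()

    # Advance ratio: total |dy| over total |dx| across all passes.
    delta = np.abs(ends - starts)
    advance_ratio = delta[:, 1].sum() / delta[:, 0].sum()
    return centroid, dispersion, advance_ratio
```

With this convention, a ratio above 1 means that the ball travels more in the direction parallel to the goal line than towards the goal.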

But what is the structure of the average passing networks? And, more importantly, are there differences between FCB and the rest of the teams? Figure 3 shows the comparison of 6 parameters directly related to the topological organization of the average passing networks (see Methods for a detailed description of all these network parameters). In Fig. 3A, we plot the clustering coefficient C, which is related to the number of triangles created between any triplet of players. The clustering coefficient is an indicator of the local robustness of networks31, since, when a triangle connecting three nodes (i.e., players) exists and a link (i.e., pass) between two of the nodes is lost (i.e., the pass cannot be made), there is an alternative way of reaching the other node through the other two edges of the triangle. In football, the clustering coefficient measures the triangulation between three players. As we can observe in Fig. 3A, the value of C is much higher for FCB, which reveals that connections between three players are more abundant than in their rivals.

The average shortest path d is an indicator of how well connected the players are inside a team. It measures the “topological distance” that the ball must go through to connect any two players of the team. Since the links of the passing networks are weighted with the number of passes, the topological distance of a given link is defined as the inverse of the number of passes. The higher the number of passes between two players, the shorter the topological distance between them. Furthermore, since it is the ball that travels from one player to any other, it is possible to find the shortest path between any pair of players by computing the shortest topological distance between them, no matter whether it is a direct connection or it involves passing through other players of the team. Finally, the average shortest path d of a team is just the average of the shortest path between all pairs of players. As we can observe in Fig. 3B, the shortest path of FCB is much lower than that of their rivals, which reveals that its players are better connected with each other. As we will discuss later, this could be produced by the network organization or could simply be a consequence of having a higher number of passes, which reduces the overall topological distance of the links and, consequently, the value of d.
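The definition of d can be made concrete with a small sketch. It takes the length of each link as the inverse of its number of passes and runs the Floyd-Warshall algorithm to obtain all shortest topological distances. Averaging over ordered pairs of distinct players is an assumption of this example, as is the function name; the exact convention is given in the Methods.

```python
import numpy as np

def average_shortest_path(A):
    """Average shortest topological distance of a weighted passing matrix.

    A[i, j] is the number of passes from player i to player j; the length
    of the link i -> j is taken as 1 / A[i, j] (infinite if no pass exists).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    with np.errstate(divide="ignore"):
        D = np.where(A > 0, 1.0 / A, np.inf)
    np.fill_diagonal(D, 0.0)
    # Floyd-Warshall: allow each player k in turn as an intermediate step.
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    off_diagonal = ~np.eye(n, dtype=bool)
    return D[off_diagonal].mean()   # infinite if some pair is unreachable
```

Doubling every entry of `A` halves the returned value, which illustrates why a larger number of passes alone can reduce d without any change in the organization of the network.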

Figure 3 Comparison of 6 network parameters. In all plots, left bars are the average (during the whole season) of a given parameter for FCB, while right bars correspond to the average of the rivals in the matches played against FCB. Parameters are, specifically: (A) clustering coefficient C, (B) shortest-path length d, (C) largest eigenvalue \({\lambda }_{1}\) of the connectivity matrix A, (D) algebraic connectivity \({\tilde{\lambda }}_{2}\) of the Laplacian matrix \(\tilde{L}\), (E) dispersion of the players’ centrality and (F) maximum player centrality. See Methods section for details about the explanation (and calculation) of all network parameters. Parameters having statistically significant differences between FCB and their rivals are plotted in yellow.

Figure 3C shows the comparison between the largest eigenvalue \({\lambda }_{1}\) of the connectivity matrix A (also known as the weighted adjacency matrix), whose elements \({a}_{ij}\) contain the number of passes between players \(i\) and \(j\)31. The largest eigenvalue has been used as a quantifier of the network strength53, since it increases with the number of nodes and links (see Methods). As expected (due to the high number of passes), the largest eigenvalue \({\lambda }_{1}\) of FCB is much higher than the corresponding values of its rivals. This metric reveals the higher robustness of the passing network of Guardiola’s team, which indicates that an eventual loss of passes would have fewer consequences for F.C. Barcelona than for the rest of the teams.
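A minimal numerical sketch shows why a high number of passes is expected to raise the largest eigenvalue: multiplying every link weight by a constant multiplies the largest eigenvalue by the same constant. The small matrix below is invented for illustration only and the function name is hypothetical.

```python
import numpy as np

def largest_eigenvalue(A):
    """Largest (real part of the) eigenvalue of a weighted adjacency matrix."""
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=float))
    return float(np.max(eigenvalues.real))

# Toy 3-player network: entry [i, j] is the number of passes from i to j.
A = np.array([[0, 2, 1],
              [2, 0, 3],
              [1, 3, 0]])

lam = largest_eigenvalue(A)
lam_doubled = largest_eigenvalue(2 * A)   # same structure, twice the passes
```

Here `lam_doubled` equals `2 * lam` up to rounding, so a difference in this metric between two teams may partly reflect the number of passes rather than the topology.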

It is also worth analyzing the behavior of the second smallest eigenvalue \({\tilde{\lambda }}_{2}\) of the Laplacian matrix \(\tilde{L}\), also known as the algebraic connectivity (see Methods). The value of \({\tilde{\lambda }}_{2}\) is related to several network properties. In synchronization, networks with higher \({\tilde{\lambda }}_{2}\) require less time to synchronize54 and, in diffusion processes, the time to reach equilibrium also goes with the inverse of \({\tilde{\lambda }}_{2}\). In the context of football passing networks, \({\tilde{\lambda }}_{2}\) can be interpreted as a metric for quantifying the division of a team. The reason is that low values of \({\tilde{\lambda }}_{2}\) indicate that a network is close to being split into two groups, eventually breaking for \({\tilde{\lambda }}_{2}=0\). In this way, the higher the value of \({\tilde{\lambda }}_{2}\), the more interconnected the team is, which makes it a measure of structural cohesion. In Fig. 3D, we have plotted the comparison of \({\tilde{\lambda }}_{2}\), which reveals that FCB attacking and defensive lines are more intermingled, leading to a \({\tilde{\lambda }}_{2}\) higher than that of its rivals.
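The link between \({\tilde{\lambda }}_{2}\) and the division of a team can be illustrated with a short sketch. It assumes the unnormalized Laplacian (strength matrix minus weight matrix) of the symmetrized passing matrix; the Laplacian actually used in the paper is the one defined in the Methods and may differ, and the function name is hypothetical.

```python
import numpy as np

def algebraic_connectivity(A):
    """Second smallest Laplacian eigenvalue of a passing network.

    Assumptions of this sketch: links are symmetrized by averaging both
    directions, and the unnormalized Laplacian L = S - W is used.
    """
    A = np.asarray(A, dtype=float)
    W = (A + A.T) / 2.0                 # symmetrized weights
    S = np.diag(W.sum(axis=1))          # node strengths on the diagonal
    eigenvalues = np.linalg.eigvalsh(S - W)   # returned in ascending order
    return float(eigenvalues[1])
```

For a team split into two groups that never exchange passes the function returns 0 (up to rounding), while adding links between the groups raises the value.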

Finally, Figs. 3E and 3F show how centrality (i.e., the importance of the players inside the passing network) is distributed across the team, a metric calculated by means of the eigenvector related to the largest eigenvalue of the connectivity matrix (see Methods). Figure 3E contains the average dispersion of centrality and Fig. 3F shows the highest value of a single player. In both cases, the differences are not statistically significant, so they do not provide evidence of a different centrality distribution between FCB and the rest of the teams.
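As a hedged sketch of these two quantities, the code below takes the eigenvector associated with the largest eigenvalue of the connectivity matrix, normalizes it to sum to one, and reports its standard deviation and its maximum. The normalization, the use of the right eigenvector and the use of the standard deviation as "dispersion" are assumptions of this example, not necessarily the choices made in the Methods.

```python
import numpy as np

def centrality_stats(A):
    """Eigenvector centrality with its dispersion and maximum value."""
    A = np.asarray(A, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eig(A)
    lead = np.argmax(eigenvalues.real)           # index of largest eigenvalue
    ec = np.abs(eigenvectors[:, lead].real)      # leading eigenvector
    ec = ec / ec.sum()                           # assumed normalization
    return ec, float(ec.std()), float(ec.max())
```

In a network where one player concentrates the passes (a star-like structure) both the dispersion and the maximum increase, whereas a perfectly homogeneous network has zero dispersion.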

Temporal evolution of the network metrics

As we have seen in the previous Section, average passing networks show differences between the organization of FCB and its rivals. However, these differences may be interpreted as a consequence of the higher number of passes between Barcelona players, which could lead to statistically significant differences in a diversity of network metrics, namely, a reduction of the average shortest path d and an increase of the clustering coefficient C, largest eigenvalue \({\lambda }_{1}\) and algebraic connectivity \({\tilde{\lambda }}_{2}\).

In view of these results, two questions must be addressed before any interpretation: (i) is the number of passes alone behind the differences in the network parameters? and (ii) is it enough to look at the average values of the network metrics? To address both issues, we have conducted a complementary study where passing networks are constructed in a different way. On the one hand, we define passing networks as non-static entities that evolve in time, and we track the evolution of their parameters. On the other hand, we exclude the influence of the number of passes, in order to focus only on the topological organization of the networks. With these two objectives in mind, we construct the l-pass networks of a team as the networks containing l consecutive passes, with l ≪ L, where L is the total number of passes during the match. In our study, we set l = 50, since it is a value low enough to allow tracking the network evolution during the match and, at the same time, high enough to guarantee the creation of a network between players (too low values of l would lead to networks with disconnected components). Therefore, we obtain the 50-pass networks in the following way: (i) we construct the network of the first 50 passes of a team since the beginning of the match, (ii) we calculate its parameters, (iii) we dismiss the oldest pass and include (sequentially) a new one, (iv) we recalculate the network parameters and (v) we repeat the procedure until the last pass of the match is included.
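The sliding-window procedure in steps (i) to (v) can be sketched as a generator that updates the connectivity matrix incrementally. The input format (a chronologically ordered list of `(passer, receiver)` index pairs) and the function name are assumptions made for illustration.

```python
import numpy as np

def l_pass_networks(passes, n_players, l=50):
    """Yield (index of newest pass, matrix) for each window of l passes.

    `passes` is a chronologically ordered list of (passer, receiver) pairs.
    """
    A = np.zeros((n_players, n_players))
    for idx, (i, j) in enumerate(passes):
        A[i, j] += 1                          # include the newest pass
        if idx >= l:
            old_i, old_j = passes[idx - l]
            A[old_i, old_j] -= 1              # dismiss the oldest pass
        if idx >= l - 1:
            yield idx, A.copy()               # network of the last l passes

# Toy usage with a window of 3 passes instead of 50.
toy = [(0, 1), (1, 2), (2, 0), (0, 1), (1, 0)]
windows = list(l_pass_networks(toy, n_players=3, l=3))
```

A match with L passes yields L − l + 1 networks, each containing exactly l passes, so any network parameter can be recomputed on every yielded matrix to obtain its evolution in time.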

Note that 50-pass networks contain exactly the same number of passes for both teams and, thus, any difference between network metrics cannot be attributed to the total number of passes. In addition, note that metrics evolve in time and their values can be related to a certain moment of the match. However, it is also important to remark that the time required to construct a 50-pass network can differ from team to team.
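One simple way to quantify this construction time, assuming that the minute of each pass is available, is to take the time elapsed between the oldest and the newest pass of every window. Both the input format and this definition are assumptions of the sketch; the definition used in the paper is given in the Methods.

```python
def network_build_times(pass_minutes, l=50):
    """Time spanned by each window of l consecutive passes.

    `pass_minutes` is a chronologically ordered list with the minute at
    which each pass of a team was made (hypothetical input format).
    """
    n_windows = len(pass_minutes) - l + 1
    return [pass_minutes[k + l - 1] - pass_minutes[k] for k in range(n_windows)]

# Toy usage with windows of 3 passes instead of 50.
times = network_build_times([0, 1, 3, 6, 10], l=3)
```

A team that circulates the ball quickly produces short windows, while a team with little possession needs a longer stretch of the match to accumulate the same number of passes.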

Figure 4 shows an example of the evolution of 3 parameters of the 50-pass networks of two teams during a match, specifically, the \(\langle X\rangle\) coordinate of the centroid (A), the ratio of advance \(\langle \Delta y\rangle /\langle \Delta x\rangle\) (B), and the dispersion of the network centrality (C). Parameters are calculated, for both teams, during the match between Real Madrid (red lines) and FCB (green lines), whose final score was 0–2. Vertical lines indicate the moment at which a goal was scored. Figure 4A shows how the position of the team moves forward and backward during the match. In this particular case, Real Madrid plays, most of the time, more advanced than FCB, which did not lead to an advantage in the result. Note how the centroid of FCB seems to be more stable, while Real Madrid has higher fluctuations, reaching its maximum value around minute 63. Also note how FCB is the first team to construct the 50-pass network, around minute 9, while Real Madrid required 20 minutes.

Figure 4 Real Madrid (red lines) vs. F.C. Barcelona (green lines), season 2009/2010 (final result: 0–2). Temporal evolution of the network parameters: (A) \(\langle X\rangle\) coordinate of the networks’ centroid, (B) ratio of advance \(\langle \Delta y\rangle /\langle \Delta x\rangle\) and (C) the centrality dispersion \({EC}_{disp}\). Vertical dashed lines indicate the two moments at which FCB scored a goal (Real Madrid did not score).

In Fig. 4B, we plot the ratio of advance of the 50-pass networks of both teams. Again, we can see fluctuations of the parameter during the match. Specifically, FCB has the higher value during the first part of the match. However, we can observe how Real Madrid increases its advance ratio as time goes by, eventually surpassing FCB during the second half.

Finally, Fig. 4C shows the fluctuations of the centrality dispersion of the players of both teams. We can observe how Real Madrid has a strong increase of the centrality dispersion between minutes 50 and 70, which seems to be related to the period where the centroid of the team advances towards FCB’s goal (see Fig. 4A). This change of the centrality distribution could be related to a change in the style of play. Since centrality dispersion increases, there is a higher heterogeneity in the importance of the players in the passing networks, which could be related to the fact that a few players are taking the lead of the team. However, this change in the organization of the passing network does not seem to be effective, since the second goal of FCB comes close to the maximum of centrality dispersion.

The fact that network metrics change during the match increases the complexity of the study. It is expected that several factors may influence the fluctuations of the network parameters (a goal, a substitution, physical condition, etc.) and, furthermore, not all teams may behave in the same way. From this diversity of factors, here we focus on the particular organization of each team before a goal. With this aim, we have analyzed the value of the network parameters, for all teams, before scoring/receiving a goal. Our purpose is to detect the existence of differences in the network metrics and identify those parameters that change before scoring or receiving a goal.

Figure 5 shows the average values of 4 temporal and spatial metrics obtained before scoring/receiving a goal (during season 2009/2010). The diagonal line (y = x) helps to identify those metrics that behave differently when scoring or receiving a goal. In Fig. 5A we can observe how FCB is the team requiring the least time to construct the 50-pass network, both when scoring and when receiving a goal. In fact, as indicated by the diagonal line, it takes approximately the same time in both cases. At the opposite extreme, we find Athletic Club and Osasuna, both teams characterized by a direct game towards the opponent’s goal. Concerning the \(\langle X\rangle\) position of the centroid, we can observe in Fig. 5B that, despite having a high value, FCB is not the team that constructs its network closest to the opponent’s goal, since it is surpassed by Real Madrid and Tenerife. Note that Tenerife was relegated at the end of the season, which indicates that playing forward is not a sufficient condition to achieve good results. However, it is also worth noting that all teams, with the only exception of Osasuna, are placed above the line given by the function \({\langle X\rangle }_{scored}={\langle X\rangle }_{received}\). This fact reveals that when a team scores a goal it is, on average, playing more advanced than when it receives one. In Fig. 5C we have compared the ratio of advance \(\langle \Delta y\rangle /\langle \Delta x\rangle\) of all teams, showing that Barcelona is not only the team with the highest value (both when scoring and receiving a goal) but also the one that deviates the most from the diagonal line. In this way, FCB is the team whose probability of scoring a goal increases the most when increasing the ratio of advance. Finally, Fig. 5D shows the average dispersion of the position of the players around the centroid coordinates of the 50-pass network. Interestingly, we can observe how FCB is one of the teams with the lowest dispersion in La Liga and, furthermore, the dispersion increases before a goal is received, indicating that FCB performs better when players are closer to the network centroid.

Figure 5 Temporal and spatial metrics change before scoring/receiving a goal: (A) time required to construct a 50-pass network \({t}_{net}\), (B) position of the X coordinate of the 50-pass network centroid, (C) \(\langle \Delta y\rangle /\langle \Delta x\rangle\) advance ratio and (D) dispersion of the distance of the players with regard to the centroid. Metrics are obtained for all teams and are shown in a two-dimensional plot, where the horizontal axis corresponds to the value of a metric when the team receives a goal and the vertical axis is the same metric obtained when the team scores a goal. Solid lines correspond to the function y = x, helping to identify whether a given parameter increases or decreases when a goal is scored/received. Each point represents the average over the whole season.

Figure 6 shows, in a similar way, the values of 6 different network parameters obtained for all teams (during the whole season). Interestingly, FCB has the highest values of the league for 4 of them: the clustering coefficient (Fig. 6A), the largest eigenvalue of the connectivity matrix (Fig. 6C), the centrality dispersion (Fig. 6E) and the highest centrality of a player (Fig. 6F). High values of these four metrics are related to strong and robust networks: (i) a high clustering coefficient is an indicator of local robustness31,38, (ii) the largest eigenvalue \({\lambda }_{1}\) is also an indicator of global robustness53; when the number of nodes and links are the same, \({\lambda }_{1}\) increases when important players are, in turn, connected between them, (iii) a high centrality dispersion together with a high value of maximum centrality are indicators of heterogeneity in the network structure, and heterogeneous networks are known to have strong resilience against random failures55 (i.e., the loss of weight of the links, due to lost passes, would have less impact on the overall structure).

Figure 6 Network parameters depend on scoring/receiving a goal. (A) clustering coefficient C, (B) average shortest-path d, (C) largest eigenvalue \({\lambda }_{1}\) of the connectivity matrix, (D) algebraic connectivity \({\tilde{\lambda }}_{2}\), (E) centrality dispersion \({EC}_{disp}\) and (F) highest eigenvector centrality \({EC}_{max}\). Parameters are obtained for all teams and are shown in a two-dimensional plot, where the horizontal axis corresponds to the value of a metric when the team receives a goal and the vertical axis is the same metric obtained when the team scores a goal. Solid lines correspond to the function y = x, helping to identify whether a given parameter increases or decreases when a goal is scored/received. Each point represents the average over the whole season.

At the same time, the analysis shows low values for the other 2 metrics: the shortest-path length d (Fig. 6B) and the algebraic connectivity \({\tilde{\lambda }}_{2}\) (Fig. 6D). A low shortest-path length is an indicator of a better connection between players, since the ball can travel from one player to any other in a lower number of steps. Finally, it is interesting to note that FCB has one of the lowest values of the algebraic connectivity, the metric used here as an indicator of structural cohesion. Low values of \({\tilde{\lambda }}_{2}\) reflect that the team is closer to being split into two different groups. Note that, when the algebraic connectivity \({\tilde{\lambda }}_{2}\) is calculated from the average connectivity matrix (Fig. 3D), FCB has a value higher than their rivals, reflecting a higher cohesion of the whole team. However, when it is computed from the 50-pass networks, the algebraic connectivity of FCB is one of the lowest. A possible explanation is that the cohesion of the team may be grounded on a higher number of passes between players, and not on the topological organization of the network.