Abstract Scoring goals in a soccer match can be interpreted as a stochastic process. In the simplest description of a soccer match one assumes that scoring goals follows from independent rate processes of both teams. This would imply simple Poissonian and Markovian behavior. Deviations from this behavior would imply that the previous course of the match has an impact on the present match behavior. Here a general framework for the identification of deviations from this behavior is presented. For this endeavor it is essential to formulate an a priori estimate of the expected number of goals per team in a specific match. This can be done based on our previous work on the estimation of team strengths. Furthermore, the well-known general increase of the number of goals in the course of a soccer match has to be removed by appropriate normalization. In general, three different types of deviations from a simple rate process can exist. First, the goal rate may depend on the exact time of the previous goals. Second, it may be influenced by the time passed since the previous goal and, third, it may reflect the present score. We show that the Poissonian scenario is fulfilled quite well for the German Bundesliga. However, a detailed analysis reveals significant deviations for the second and third aspect. Dramatic effects are observed if the away team leads by one or two goals in the final part of the match. This analysis allows one to identify generic features of soccer matches and to learn about the hidden complexities behind scoring goals. Among other things, the reason why the number of draws is larger than statistically expected can be identified.

Citation: Heuer A, Rubner O (2012) How Does the Past of a Soccer Match Influence Its Future? Concepts and Statistical Analysis. PLoS ONE 7(11): e47678. https://doi.org/10.1371/journal.pone.0047678 Editor: Angel Sánchez, Universidad Carlos III de Madrid, Spain Received: July 20, 2012; Accepted: September 14, 2012; Published: November 30, 2012 Copyright: © 2012 Heuer, Rubner. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: No current external funding sources for this study. Competing interests: The authors have declared that no competing interests exist.

Introduction How the past determines the future is naturally an important question which, however, in most cases is difficult to answer due to the complexity of the real world. This is different in the field of sports, where many aspects can be captured by well-defined numbers (such as goals in the case of soccer). This makes the field well suited to addressing the question. In recent years researchers from the physics community have started to apply physics-oriented approaches to problems from the area of sports and in particular of soccer [1]–[3]. Specific examples of a quantitative analysis of the outcome of sports events can be found, e.g., in [4]–[8], and new ranking schemes have been proposed [9]. At first one might think that it is hard to find systematic laws to characterize such complex phenomena as soccer matches. One key step in this endeavor is the definition of appropriate observables to capture some key properties. In recent years we have concentrated on the formal characterization of the notion of a team strength and its practical determination [10]. In this way it was possible to ask questions about the variation of the team strength during a season [11] or the impact of a coach dismissal on the team strength [12]. Alternative concepts of team strengths have been studied, e.g., in Ref. [13] for the case of baseball. It has long been realized that the distribution of goals scored by a team can be roughly described by a Poisson distribution [14]–[16]. Such a distribution is to be expected if the probability to score a goal in the next minute is constant within the whole match. In the simplest stochastic model of a soccer match one might simply assume that both teams score goals according to independent Poisson distributions. Closer inspection of the empirical goal distribution displays, however, some broadening as compared to a Poisson distribution.
To rationalize this observation a model has been presented which postulates an increase of the goal rate with an increasing lead [7], [8]. This self-affirmative effect could indeed reproduce the fat tails in the empirical goal distribution. In later work it has been shown that, at least for the German soccer league (Bundesliga), these fat tails just follow from the distribution of team strengths [11]. Therefore the fat tails do not contradict the notion that in an individual match the scoring of goals follows Poisson statistics without self-affirmative effects. Interestingly, it turns out that the number of draws is significantly larger (approx. 10%) than expected from the assumption of independent Poisson distributions [16]. Different scenarios may lead to this effect. Here are two extreme cases: (1) A draw in, say, the 70th minute reduces the attempts of both teams to score another goal. This leads to an increased probability of keeping this score. (2) A score of, e.g., 1∶0 may strongly enhance the willingness of the trailing team to score a goal to reach at least a draw. Whether or not either of these scenarios indeed explains the excess of draws is not clear a priori. Knowledge of such effects would allow one to gain information about psychological effects within a soccer match. The central aim of this work is to derive a stochastic description of the course of a soccer match without resorting to any ad hoc models. Recently, somewhat related questions have been analyzed, e.g., for the case of basketball [17] and tennis [18]. These results can then be compared with the present analysis. The structure of this paper is as follows. In Sect. 2 we discuss the statistical framework to elucidate the basic complexities of a soccer match. In Sect. 3 the results of this analysis are presented, which are finally discussed in Sect. 4.
As a data basis we take the matches from the premier German soccer league (Bundesliga) between the seasons 1968/69 and 2010/11 (excluding 1992/93, because the number of teams was different in that single season).
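The reference value against which the excess of draws is measured can be illustrated with a short calculation. The following is a minimal sketch, assuming two independent Poisson distributions with fixed expected goals; the function names and the numerical rates are illustrative assumptions and are not taken from the data set.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def draw_probability(lam_a: float, lam_b: float, max_goals: int = 30) -> float:
    """Probability that both teams score the same number of goals.

    Assumes independent Poisson scoring; the sum over scorelines k:k is
    truncated at max_goals, where the remaining terms are negligible.
    """
    return sum(
        poisson_pmf(k, lam_a) * poisson_pmf(k, lam_b)
        for k in range(max_goals + 1)
    )


# Illustrative (assumed) expected goals for home and away team.
p_draw = draw_probability(1.8, 1.3)
print(f"Draw probability under independent Poisson scoring: {p_draw:.3f}")
```

An empirical draw frequency that clearly exceeds such a value, after averaging over the actual a priori expectations of all matches, would signal a deviation from independent Poisson scoring.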

Statistical Framework In a specific match of team A vs. team B one may estimate the number of expected goals $\lambda_A$ of team A and $\lambda_B$ of team B based on the strength of both teams [16], [19], [20]. Here we choose the approach used in Ref. [11]. In more detail, by taking the goal difference and the sum of all goals for the 33 other matches of both teams, considering the regression towards the mean, and adding a team-independent home advantage (see Ref. [10]) one can indeed obtain good estimates of $\lambda_A$ and $\lambda_B$. In what follows we define the goal rate as the probability to score a goal in the next minute. If the goal rate of, e.g., team A does not change during the match one can define the goal rate via $\lambda_A/90$. Note that a soccer match lasts for 90 minutes. It is known that in the second half of a soccer match significantly more goals are scored than in the first half (57% vs. 43%). Thus, one may expect that the goal rate increases with time. To capture this effect more quantitatively, we introduce $r_0(t)$ as the goal rate in minute $t$, averaged over all matches, i.e. over all pairs (A, B). Note that $r_0(t)$ takes into account the goals of both teams. The resulting curve for $r_0(t)$ is shown in Fig. 1. One can indeed see the general increase of $r_0(t)$ with time. Some additional specific features of Fig. 1 will be discussed in Sect. 3. When summing up $r_0(t)$ over all 90 minutes one obtains the total number of goals per match, denoted $\lambda_0$. One finds $\lambda_0 \approx 3.06$. Correspondingly, a single team on average scores 1.53 goals per match.
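The averaging described above can be sketched as follows. This is a minimal illustration, assuming that each match is stored as a list of (minute, team) goal events; this data layout, the function name and the toy data are hypothetical and do not describe the actual Bundesliga data set.

```python
from collections import Counter


def average_goal_rate(matches, n_minutes=90):
    """Average number of goals per minute over all matches (both teams).

    matches: list of matches, each a list of (minute, team) goal events
             with minute in 1..n_minutes (assumed layout).
    Returns a list whose entry t-1 is the average goal rate in minute t.
    """
    counts = Counter()
    for goals in matches:
        for minute, _team in goals:
            counts[minute] += 1
    n_matches = len(matches)
    return [counts[t] / n_matches for t in range(1, n_minutes + 1)]


# Toy data (assumed): two matches with three goals and one goal.
toy_matches = [
    [(10, "A"), (25, "B"), (60, "A")],
    [(88, "B")],
]
rate_per_minute = average_goal_rate(toy_matches)
total_goals_per_match = sum(rate_per_minute)  # sum over all 90 minutes
print(total_goals_per_match)
```

Summing the per-minute rate over the match recovers the average total number of goals per match, which is the normalization used in the remainder of the framework.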


Figure 1. The average number of goals per minute in a match as a function of the time t. A more detailed interpretation of the data can be found in the subsequent section. https://doi.org/10.1371/journal.pone.0047678.g001

For the time being we assume that it is indeed possible to define and determine a goal rate $r_A(t)$ of team A at a given minute $t$ in an individual match. In reality this is impossible because playing soccer is much more complex than throwing a die. The arrangement of the soccer players and the ball at some moment, and possibly all other available pieces of information, only allow a very rough estimation of the probability of a goal during the next minute. However, for the results of this work it will be sufficient to consider averages over a large number of appropriately selected matches. Therefore, in practice the average goal rate will simply be determined by counting the matches where a goal was scored in the minute under consideration. For future purposes we introduce the normalized rate

$$\rho_A(t) = \frac{r_A(t)}{r_0(t)}. \qquad (1)$$

We recall that the numerator contains the rate for an individual match whereas the denominator expresses the average over all matches. In any event $\rho_A(t)$ reflects the real course of the match. Furthermore we use the normalized expected number of goals in a specific match of team A

$$\hat{\lambda}_A = \frac{\lambda_A}{\lambda_0}. \qquad (2)$$

In contrast to Eq. 1, $\hat{\lambda}_A$ expresses the a priori expectation. In general, the function $r_A(t)$ can be very complicated and can vary from match to match. For the future discussion it is helpful to identify a limit of maximum simplicity which we denote as the Poisson expectation. It can be formulated via two conditions: (1) The integral of the goal rate $r_A(t)$ over the whole match is identical to the expected number of goals $\lambda_A$ of team A in this match. (2) The time-dependence of $r_A(t)$ is, apart from a proportionality factor, identical to that of $r_0(t)$ and thus follows the average behavior as shown in Fig. 1.
As a consequence one would have

$$r_A(t) = \frac{\lambda_A}{\lambda_0}\, r_0(t), \qquad (3)$$

which is equivalent to

$$\rho_A(t) = \hat{\lambda}_A. \qquad (4)$$

For this simple Poisson expectation the actual normalized goal rate $\rho_A(t)$ for a specific match thus equals the pre-match expectation $\hat{\lambda}_A$. The key goal of this work is to identify situations where Eq. 4 and thus Eq. 3 are not valid. For example, we consider all matches for which the home team A just before minute 80 leads by m = 3 goals and ask for the probability that the home team scores a goal in the next minute. In what follows $m$ always denotes the goal difference. Then we define $\rho'(t=80, m=3) = \langle \rho_A(t=80) \rangle'$. The prime indicates a conditional average. In this specific case we average the normalized goal rate of the home team over all matches for which the home team fulfills the required condition ($m = 3$ at minute 80). The actual calculation of $\rho'$ basically boils down to the calculation of $\langle r_A(t=80) \rangle'$ which, according to our conditioning, denotes the fraction of matches with $m = 3$ at minute 80 for which the home team scores a goal in minute 80. In analogy, we define $\lambda'(t=80, m=3)$ as the corresponding expectation value of $\hat{\lambda}_A$ for the same subset of teams. Deviations from the relation $\rho' = \lambda'$ and the corresponding violation of Eq. 4 directly imply that with a home lead of three goals in minute 80 the match behavior is different from the simple Poisson expectation. As will be shown in this work, for a home lead of three goals the probability of increasing the home lead is smaller than expected from the Poisson expectation. We note in passing that without conditioning, i.e. by averaging over all matches and all teams, one obtains by definition $\langle \rho_A(t) \rangle = \langle \hat{\lambda}_A \rangle$.

An important first step is the systematic identification of the most relevant items for the conditioning of $\rho_A(t)$. Let us assume that in minute 70 in total three goals have been scored and that the actual score is 2∶1 for team A. The goals of team A were scored in minutes 10 and 60 and the goal of team B in minute 25. Strictly speaking we want to understand the impact of the previous goal events on the goal rate at a given time $t$.
The normalized goal rate $\rho_A(t)$ is thus conditioned on the sequence of the previous goals as well as the precise time of these goals. This is illustrated in the upper part of Fig. 2. Apart from the fact that this complete dependence is impossible to extract from the available information, one can dramatically simplify the required conditioning. For a specific example it will be shown that neither the order of the goals (e.g. 0∶1 → 1∶1 vs. 1∶0 → 1∶1) nor the absolute times of the goals play a relevant role. We just mention in passing that the latter disagrees with the general belief that goals just before half time are particularly helpful for a team. These observations strongly suggest that in general the dependence on the order and the absolute times of the previous goals is, if existent at all, very weak. Thus, neglecting these pieces of information does not reduce the estimation quality of $\rho_A(t)$. As a strict consequence $\rho_A(t)$ can only depend on the following observables: (1) Score in minute $t$. What is the expected course of the match during minute 70 if, e.g., the home team leads by one goal? In what follows we mainly restrict ourselves to the goal difference rather than to the absolute number of goals. (2) Relative time differences $t - t_i$ to the previous goals, scored at times $t_i$. One may indeed expect that scoring a goal gives rise to a minor shock to the opponent which, as a consequence, may bias the match during the minutes after the goal. Naturally the impact of the last goal is strongest. Thus, we only keep track of the time difference $\Delta t$ between the present time $t$ and the last goal. This reduction of information is summarized in the lower part of Fig. 2. As soon as the goal rate depends on the present score one leaves the regime of Poisson processes and, in general, (possibly small) deviations from a strict Poisson goal distribution would be expected. Furthermore, any dependence on the time elapsed since the previous goal is a clear signature of non-Markovian effects since memory effects start to play a role.
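The reduction of the full goal history to the few retained observables can be sketched as follows, using the example of a 2∶1 score in minute 70 given above. This is a minimal illustration; the class and function names as well as the (minute, team) data layout are hypothetical.

```python
from typing import NamedTuple, Optional


class MatchState(NamedTuple):
    goal_difference: int                    # goals of team A minus goals of team B
    minutes_since_last_goal: Optional[int]  # None if no goal has been scored yet
    last_scorer: Optional[str]              # "A", "B", or None


def reduce_history(goals, t):
    """Reduce the list of (minute, team) goal events before minute t to the
    retained observables: goal difference, time since the last goal, and the
    team that scored it. Order and absolute times of earlier goals are dropped.
    """
    past = sorted(g for g in goals if g[0] < t)
    if not past:
        return MatchState(0, None, None)
    difference = sum(1 if team == "A" else -1 for _minute, team in past)
    last_minute, last_team = past[-1]
    return MatchState(difference, t - last_minute, last_team)


# Example from the text: team A scored in minutes 10 and 60, team B in minute 25.
state = reduce_history([(10, "A"), (60, "A"), (25, "B")], t=70)
print(state)
# MatchState(goal_difference=1, minutes_since_last_goal=10, last_scorer='A')
```

Two matches with different goal sequences but the same reduced state are treated as equivalent in the subsequent conditional averages.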


Figure 2. Scheme to describe the general statistical approach. Upper row: the complete information about the goals for a specific example which contribute to the prediction of the goal rate at time t. Lower row: the reduced information which takes into account the score at time t as well as the time difference to the last goal and the information about the team which scored that goal. https://doi.org/10.1371/journal.pone.0047678.g002
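A conditional average of the kind used in this framework, e.g. for all matches with a home lead of three goals just before minute 80, can be sketched as follows. This is only an illustration under stated assumptions: the dictionary keys, the team labels "H" and "A", and the toy numbers are hypothetical, and the normalization (dividing the observed scoring fraction by the average per-minute goal rate, and the a priori expectation by the average total number of goals per match) is an assumed reading of Eqs. 1 and 2.

```python
def goal_difference_before(goals, t):
    """Home-minus-away goal difference from all goals scored before minute t."""
    return sum(1 if team == "H" else -1 for minute, team in goals if minute < t)


def conditional_rates(matches, t, m, avg_rate_t, avg_total_goals):
    """Compare observed and expected normalized home goal rates in minute t
    for the subset of matches with home goal difference m just before t.

    matches: list of dicts with keys "lambda_home" (a priori expected home
             goals) and "goals" (list of (minute, team) events), assumed layout.
    Returns (observed, expected) or None if no match fulfills the condition.
    """
    selected = [mt for mt in matches if goal_difference_before(mt["goals"], t) == m]
    if not selected:
        return None
    # Fraction of selected matches in which the home team scores in minute t.
    n_scoring = sum(
        any(minute == t and team == "H" for minute, team in mt["goals"])
        for mt in selected
    )
    observed = (n_scoring / len(selected)) / avg_rate_t
    expected = sum(mt["lambda_home"] for mt in selected) / len(selected) / avg_total_goals
    return observed, expected


# Toy data (assumed); real estimates require a large number of matches.
toy = [
    {"lambda_home": 2.0, "goals": [(10, "H"), (20, "H"), (30, "H"), (80, "H")]},
    {"lambda_home": 1.0, "goals": [(5, "H"), (15, "H"), (40, "H")]},
    {"lambda_home": 1.5, "goals": [(5, "A")]},
]
result = conditional_rates(toy, t=80, m=3, avg_rate_t=0.04, avg_total_goals=3.0)
print(result)
```

With realistic data, a systematic difference between the two returned values for a given condition would indicate a deviation from the simple Poisson expectation; with the three toy matches above the numbers carry no statistical meaning.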

Discussion and Summary Based on our systematic approach to identify deviations from the Poisson expectation and/or from strict Markovian behavior we have obtained several key effects that characterize the complexities of soccer matches. (1) After a goal the opponent is less successful in scoring a goal during the next minutes. This invalidates a strict Markovian picture of soccer matches. This effect, albeit significant, is relatively small (10%). (2) In case of a draw the total goal rate becomes smaller. Thus the goal rates have to be adjusted in dependence of time and score. This is a strong deviation from the Poisson expectation. (3) In case of a lead of the away team dramatic deviations from the Poisson expectation are observed during the last 5–10 minutes of the match. This effect reflects inefficient defensive behavior of the home team. The latter point indicates a dramatic difference in the behavior of home and away teams which goes beyond the mere home advantage. It signals strong psychological and/or tactical differences for home as compared to away teams. Since the offensive efficiency does not become worse, the present result does not imply that the home team gives up, at least in case of a one-goal lead by the away team. (4) If, however, the lead of the away team occurs in the middle of the match there are indications of an improved efficiency of the home team to equalize. With respect to (1) it is interesting to refer to recent work on scoring events in basketball. It has successfully been described in terms of a biased continuous time random walk [17]. Ideally the time difference between successive scoring events should follow an exponential distribution. In practice, already 20 seconds after a score the actual data follow this theoretical expectation very well. In contrast, in the field of tennis statistics significant deviations from purely statistical behavior have been observed by Magnus and Klaassen [18].
For example, after a break point it is more likely to win the next service game. Interestingly, this effect is more pronounced in matches between non-seeded players. This indicates that with increasing quality of the players the impact of previous events becomes smaller, i.e. the match follows Markovian behavior more closely. Our results also allow us to find an answer to our initial question about the origin of the large number of draws. It is the persistence of a draw, i.e. (2), rather than the ability of a team trailing by one goal to score an additional goal, as expressed by (3). Actually, (3) would rather decrease the number of draws because the probability that a 0∶1 turns into a 0∶2 during the last minutes is significantly larger than expected. However, we should also stress that at least during the first 80 minutes most observables behave according to the simple Poisson expectation as expressed by Eq. 4. This observation may be used to discuss an important general question. Does the empirical observation of a nearly Poisson-type goal distribution imply that the process of scoring goals is indeed characterized by some fixed rates? Alternatively one might postulate that good teams try to achieve a safe lead and then just start to manage the lead. In this scenario our in-match analysis should have detected much larger deviations from Poisson behavior. For example, one might have guessed that the goal rate of a team holding a safe lead is much smaller than expected from the Poisson scenario. Since this is not observed, the teams typically do not change their match behavior. Differences along this line just start to (slightly) occur for large leads such as a home lead of three goals. In summary, we may conclude that the concept of score-insensitive goal rates, as opposed to score-dependent match behavior, is a very good approximation of a soccer match, at least after averaging over the corresponding subset of matches as done in this work.
This naturally explains the previous observation [10], [14]–[16] that the goal distribution, after taking into account the different team strengths, follows a Poisson distribution very closely. This conclusion has an interesting consequence. A match between a good team and a bad team may have a priori goal expectations of 2 and 1, respectively. A specific Poisson realization may, e.g., lead to a 3∶0 or (less likely) to a 1∶3 result. In both realizations the quality of the good team and that of the bad team are identical because the final result is just a matter of mere luck (in analogy to the presence or absence of the decay of a radioactive nucleus during a fixed time interval). In practice, one might expect that in the first case the media stress the successful play of the favorite whereas in the second case the same team would be strongly criticized. This reaction would neglect the random aspects inherent in any Poisson realization and just shows that an objective assessment of random aspects is very difficult. It may be interesting in future work to check whether, e.g., the subset of good teams is less sensitive to negative effects (having just conceded a goal, trailing at the end of the match). The present results may then serve as a detailed basis for the identification of possible strength-dependent effects.
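The 3∶0 versus 1∶3 example above can be made quantitative with a short calculation. The following is a minimal sketch, assuming independent Poisson scoring with the a priori expectations of 2 and 1 goals quoted in the text; the function names are illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def scoreline_probability(goals_a, goals_b, lam_a, lam_b):
    """Probability of the result goals_a:goals_b for independent Poisson scoring."""
    return poisson_pmf(goals_a, lam_a) * poisson_pmf(goals_b, lam_b)


# Good team with expectation 2, bad team with expectation 1 (values from the text).
p_3_0 = scoreline_probability(3, 0, 2.0, 1.0)  # approx. 0.066
p_1_3 = scoreline_probability(1, 3, 2.0, 1.0)  # approx. 0.017
print(f"P(3:0) = {p_3_0:.3f}, P(1:3) = {p_1_3:.3f}, ratio = {p_3_0 / p_1_3:.1f}")
```

Under these assumptions the 3∶0 is four times as likely as the 1∶3, yet both results arise from identical team qualities, which is the point made above about the role of chance.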

Acknowledgments We acknowledge very helpful discussions with Bernd Strauss and Dennis Riedl about this work.

Author Contributions Analyzed the data: AH OR. Contributed reagents/materials/analysis tools: AH OR. Wrote the paper: AH OR.