In operant learning, behaviors are reinforced or inhibited in response to the consequences of similar actions taken in the past. However, because in natural environments the “same” situation never recurs, it is essential for the learner to decide what “similar” is so that he can generalize from experience in one state of the world to future actions in different states of the world. The computational principles underlying this generalization are poorly understood, in particular because natural environments are typically too complex to study quantitatively. In this paper we study the principles underlying generalization in operant learning of professional basketball players. In particular, we utilize detailed information about the spatial organization of shot locations to study how players adapt their attacking strategy in real time according to recent events in the game. To quantify this learning, we study how a make or a miss from one location on the court affects the probabilities of shooting from different locations. We show that generalization is not a spatially local process, nor is it governed by the difficulty of the shot. Rather, to a first approximation, players use a simplified binary representation of the court into 2 pt and 3 pt zones. This result indicates that rather than using low-level features, generalization is determined by high-level cognitive processes that incorporate the abstract rules of the game.

According to the law of effect, formulated a century ago by Edward Thorndike, actions which are rewarded in a particular situation are more likely to be executed when that same situation recurs. However, in natural settings the same situation never recurs and therefore, generalization from one state of the world to other states is an essential part of the process of learning. In this paper we utilize basketball statistics to study the computational principles underlying generalization in operant learning of professional basketball players. We show that players are more likely to attempt a field goal from the vicinity of a previously made shot than they are from the vicinity of a missed shot, as expected from the law of effect. However, the outcome of a shot can also affect the likelihood of attempting another shot at a different location. Using hierarchical clustering we characterize the spatial pattern of generalization and show that generalization is primarily determined by the type of shot, 3 pt vs. 2 pt. This result indicates that rather than using low-level features, generalization is determined by high-level cognitive processes that incorporate the abstract rules of the game.

‘Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will … be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur …’ (Edward Thorndike, 1874–1949) [1].

Humans and animals modify their behavior in response to the consequences of their previous actions, a process known as operant learning. The standard account for this learning is based on a family of reinforcement learning (RL) algorithms that assert that the computational problem of learning from experience is solved through the synergy of two processes: first, the values of the different actions (or more generally, state-actions) are learned from past actions and their subsequent rewards; second, these learned values are used to choose (or to learn to choose) among different actions such that actions associated with higher values are more likely to be chosen [3]–[5] (but see also [6]–[8]). This account is based, to a large extent, on a large number of laboratory experiments, in which participants repeatedly choose between the same small number of alternative actions (e.g., pressing a button) in repeated settings and are rewarded according to these actions.

By contrast, in many natural environments, organisms learn from the consequences of their past actions in settings in which the same situation and action never recur (not even in the sense that two “identical” trials “recur” in a laboratory experiment). In these cases, generalization is an essential part of operant learning [9]. In this process of generalization, the organism determines which past situations, actions and their consequences are relevant for the current situation. In the language of the RL algorithms discussed above, generalization is the process of determining which set of different situations defines a state and which set of responses defines an action. The level of generalization determines, roughly speaking, how finely the set of situations is parsed into states and the set of responses into actions. Limited generalization results in a large number of states and actions in the process of learning, whereas broad generalization results in a small number of states and actions. Too limited generalization implies that the organism separately learns the values of states that are essentially identical, resulting in learning that is too slow. Too broad generalization implies that the organism infers the outcome of future responses from irrelevant past experience, which may lead to suboptimal behavior even after very long learning. Thus, the proper level of generalization, which determines the tradeoff between the speed and the accuracy of learning, is of utmost importance in the process of learning. It should be noted that the question of the proper level of generalization is present even in RL models that assume continuous states and actions [10], [11].

Professional basketball, which is played by highly motivated and extensively trained players, provides an exceptional opportunity to quantitatively study generalization in operant learning in complex natural environments. The objective of players in basketball is to gain points by shooting a ball through a hoop. If successful, the team is awarded two or three points, depending on the distance of the shot attempt from the basket. In a previous study we demonstrated that players modify their shot selection policy in response to the recent history of their shots and their outcomes [13]. After a made (successful) 3-point (3 pt) shot, the probability of attempting another 3 pt shot is 30% higher than that probability after a missed 3 pt. Moreover, some of the variability in players' shot selection can be accounted for using standard RL algorithms. However, lacking additional information about the shots, our previous study was unable to address the question of what is considered by the players as “the same situation” and “the same action”.

Consider a player in possession of the ball. Multiple factors, including the locations, velocities and postures of teammates and opponents, the score and the time in the game, are relevant to the decision of whether or not to attempt a field goal (FG). In the framework of RL, all these factors determine the state of the world. In this paper we focus on the spatial location of the players, which provides us with a low-dimensional projection of the state of the world at the time of the FG. Quantifying how the outcome of a FG in one spatial location affects subsequent FGs in different locations is thus informative about the pattern and level of generalization between states. A spatially restricted generalization implies that the outcome of shots made in a particular location would have very little effect on behavior in other locations on the court. By contrast, learning could be independent of shot location, implying substantial spatial generalization. Between these two extremes, a made shot in one location may enhance the probability of another shot from the vicinity of that location, but not from locations farther away. Alternatively, the pattern of generalization may be more complex. For example, a made shot in one location may enhance the probability of another shot from the same distance, the same angle or from the symmetrical location relative to the basket. Identifying the patterns of spatial generalization is thus the objective of this study.

Results

The spatial organization of field goal attempts

We examined the records of all players from the National Basketball Association (NBA) in four regular seasons and considered their 759,050 FGs, measured at a 1×1 ft² resolution. The spatial distribution of FGs is presented in Figure 1A, which depicts the two-dimensional histogram of the FG locations, pooled from all players. The white circle denotes the location of the basket and the upper boundary is at the half-court line. Color codes the number of shots attempted from each location on a logarithmic scale. As shown in Fig. 1A, the distribution of shot locations is not homogeneous. Rather, there are islets of higher FG probability. In our analysis, we used these islets to define 16 regions, delineated by black lines in Fig. 1A.


Figure 1. The spatial organization of learning. A. The spatial distribution of all 759,050 FGs in our dataset. The basket is depicted by a white circle and the upper boundary is the half-court line. Color codes the number of FGs taken from each location on a logarithmic scale. The black lines delineate 16 regions used in subsequent analysis. B. The averaged learning matrix $\bar{L}$, based on 161,302 FGs attempted by 166 players that passed our selection criteria (see Materials and Methods). C. Top, dissimilarity matrix, $D$, computed from the rows of the matrix in B such that $D_{ij}$ is the Euclidean distance between rows $i$ and $j$ of $\bar{L}$; Middle, hierarchical clustering of the matrix in B based on the dissimilarity between the rows (see Materials and Methods); Bottom, the dissimilarity matrix ordered according to the dendrogram in the middle panel. D. Same as in C for the columns of matrix $\bar{L}$. https://doi.org/10.1371/journal.pcbi.1003623.g001

In order to quantify how the outcome of a FG attempted from region $i$ affects subsequent behavior at region $j$ we computed, for each player, three probabilities: the a priori probability that a player would attempt a FG from region $j$, $P(j)$, and two conditional probabilities: the probabilities that a player would attempt a FG from $j$, given that his previous FG was a made or a missed FG from region $i$, $P(j \mid i, S)$ and $P(j \mid i, F)$, respectively ($S$ and $F$ denote Successful and Failed FG). These three probabilities determine a learning matrix $L$ whose entries are given by

$$L_{ij} = \frac{P(j \mid i, S) - P(j \mid i, F)}{P(j)} \qquad (1)$$

To gain insight into $L$, we consider a player that follows a fixed policy that is insensitive to the outcome of past FGs, i.e., a player that does not learn from past made and missed FGs. In this case, because behavior is independent of the outcome of the previous FG, the two conditional probabilities are equal, $P(j \mid i, S) = P(j \mid i, F)$. As a result, $L_{ij} = 0$.

Alternatively, consider the extreme case in which a player is very sensitive to the outcome of the previous FG: after a made FG he always attempts another FG from the same region, whereas the FG immediately following a missed FG is never attempted from the same region. In this case, $P(i \mid i, S) = 1$ and $P(i \mid i, F) = 0$, so that $L_{ii} = 1/P(i) > 0$, whereas for $j \neq i$, $P(j \mid i, S) = 0$ and therefore $L_{ij} = -P(j \mid i, F)/P(j) \le 0$. In other words, all the diagonal elements of $L$ are positive and all off-diagonal elements are non-positive. More generally, $L_{ij} > 0$ for two regions $i \neq j$ implies a generalization from region $i$ to region $j$: a made shot in region $i$ motivates subsequent FG attempts from region $j$, whereas a missed shot in $i$ discourages FG attempts from $j$. Therefore the learning matrix is informative about the generalization pattern in learning.

We computed the matrix $L$ for all players who passed our selection criterion (166 players, 161,302 FGs, see Materials and Methods). The matrix $L$, averaged over all players and denoted by $\bar{L}$, is depicted in Fig. 1B. Several points are noteworthy when considering $\bar{L}$. First, the diagonal elements of $\bar{L}$ tend to be positive (14/16 diagonal elements in $\bar{L}$ are positive, p<0.003, one-tailed binomial test). This implies that a made FG, relative to a missed FG, motivates players to attempt another FG from the same region. To quantify this tendency to repeat successful actions and to avoid unsuccessful ones, we considered the mean value of the diagonal terms, $\langle \bar{L}_{ii} \rangle$. Roughly speaking, this average implies that the outcome of a FG changes the probability that a FG will be repeated from the same region by approximately 38%.

A second noteworthy point is that many of the off-diagonal elements are also positive and that the magnitude of some of them is substantial. For example, the largest off-diagonal element is as large as the largest diagonal element. However, not all off-diagonal terms are positive. For example, while a made shot in region 1 almost doubles the likelihood of a shot in region 4 compared to a missed FG (see $\bar{L}_{1,4}$ in Fig. 1B), it substantially decreases the likelihood of attempting another shot from region 9 ($\bar{L}_{1,9}$) and has almost no effect on the likelihood that the next shot will be from region 8 ($\bar{L}_{1,8}$). To quantify this heterogeneity in the values of the off-diagonal terms, we computed the standard deviation of the distribution of the off-diagonal elements, $\sigma$. This number, which is significantly larger than the standard deviation expected in a process in which transitions between regions on successive FGs are random (p<0.001, Monte Carlo permutation test, see Materials and Methods), is a measure of the spatial heterogeneity in the generalization: different regions differ by approximately 21% in their response to made and missed FGs in the other regions.
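The construction of the learning matrix can be illustrated with a minimal sketch. It assumes that the entries take the normalized-difference form described above, that a player's shots are available as a chronological list of `(region, made)` pairs, and that consecutive list entries count as successive FGs (game boundaries and the selection criteria of Materials and Methods are ignored). The function name `learning_matrix` and the toy data are hypothetical and serve only as an illustration.

```python
from collections import Counter


def learning_matrix(shots, n_regions):
    """Estimate L[i][j] = (P(j | i, S) - P(j | i, F)) / P(j) for one player.

    shots: chronological list of (region, made) pairs, region in range(n_regions).
    Entries that cannot be estimated (no made or no missed FG from region i,
    or no FG at all from region j) are left as NaN.
    """
    total = len(shots)
    prior = Counter(region for region, _ in shots)      # counts behind P(j)
    pair = {True: Counter(), False: Counter()}          # (i, j) transitions by outcome at i
    n_from = {True: Counter(), False: Counter()}        # transitions leaving i, by outcome
    for (i, made), (j, _) in zip(shots, shots[1:]):
        pair[made][(i, j)] += 1
        n_from[made][i] += 1

    L = [[float("nan")] * n_regions for _ in range(n_regions)]
    for i in range(n_regions):
        if n_from[True][i] == 0 or n_from[False][i] == 0:
            continue
        for j in range(n_regions):
            if prior[j] == 0:
                continue
            p_success = pair[True][(i, j)] / n_from[True][i]
            p_failure = pair[False][(i, j)] / n_from[False][i]
            L[i][j] = (p_success - p_failure) / (prior[j] / total)
    return L


# Toy sequence (made-up data): the player repeats region 0 after a make
# and switches region after a miss.
toy_shots = [(0, True), (0, True), (0, False), (1, True),
             (1, False), (0, True), (0, False), (1, False)]
toy_L = learning_matrix(toy_shots, n_regions=2)
```

In this toy sequence the diagonal entries come out positive and the off-diagonal entries negative, which is the "extreme sensitivity" limit discussed in the text. In the actual analysis the per-player matrices are averaged over players, a step not shown here.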

Clustering analysis

To better understand the pattern of generalization depicted in the averaged learning matrix $\bar{L}$ (Fig. 1B), we note that the $(i,j)$ element of $\bar{L}$, $\bar{L}_{ij}$, is a measure of the effect of the outcome of a shot from region $i$ on the likelihood that the subsequent shot would be from region $j$. Therefore, the $i$-th row of the matrix $\bar{L}$, denoted as $\bar{L}_{i\cdot}$, is a measure of the effect of the outcome of a FG attempt in region $i$ on all subsequent FG attempts. If two rows of the matrix are similar, $\bar{L}_{i\cdot} \approx \bar{L}_{j\cdot}$, then the outcomes of FGs in regions $i$ and $j$ similarly affect subsequent behavior. By contrast, if these two rows are very different then we can infer that made and missed FGs from these two regions are treated differently in the process of learning. Therefore, the similarity between the rows of the matrix is a measure of the pattern of generalization in the learning.

To study the similarity between the rows, we computed the dissimilarity matrix $D$, where $D_{ij}$ is the Euclidean distance between rows $i$ and $j$ of $\bar{L}$ (Fig. 1C, Top). To identify the regions that similarly affect subsequent behavior we constructed a hierarchical tree (dendrogram) of the rows of $\bar{L}$ (Fig. 1C, Middle) using agglomerative hierarchical clustering (Materials and Methods). The dissimilarity matrix, reordered according to the hierarchical tree, is presented in Fig. 1C (Bottom). As clearly seen in Fig. 1C, Middle and Bottom, regions 1, 4, 7, 11 and 14 (left branch in Fig. 1C, Middle) are grouped together in the clustering analysis. Interestingly, this grouping follows the separation into 3 pt regions (areas 1, 4, 7, 11 and 14) and 2 pt regions (all other regions). It should be noted that this grouping into 3 pt and 2 pt regions is not spatially local (e.g., regions 1 and 14 are furthest apart). Rather, it reflects the distance from the hoop. Further considering the finer clustering structure, we find that in the 2 pt branch (right branch in Fig. 1C, Middle), regions 2, 5, 8 and 12 are grouped together and are separated from the other 2 pt regions. This grouping contains all long-distance 2 pt regions except one (region 15) and none of the short-distance 2 pt regions.

The clustering analysis of Fig. 1C was aimed at finding sets of regions that “affect” all other regions in a similar way. However, there is a complementary way of defining patterns of generalization. We can consider which regions are similarly “affected” by the outcomes of FGs in all other regions. The former analysis (Fig. 1C) is based on prospective similarity, whereas the latter analysis is based on retrospective similarity. Formally, prospective clustering is based on similarity between the rows of $\bar{L}$ whereas retrospective clustering is based on similarity between the columns of $\bar{L}$. Because the matrix $\bar{L}$ is not symmetrical, prospective and retrospective clustering are not identical and in principle may yield different patterns of clustering. Thus, we repeated the clustering analysis for the columns of $\bar{L}$ (Fig. 1D). The results of this analysis are similar to those of the prospective clustering. The most prominent separation of the regions is into 3 pt and 2 pt regions (Fig. 1D, Middle). Moreover, within the 2 pt branch (right branch in Fig. 1D, Middle), the long-distance 2 pt regions (2, 5, 8, 12 and 15) are also clustered together and separately from the shorter-distance 2 pt regions.

In summary, (1) the prospective and the retrospective clustering yielded similar findings; (2) to a first approximation, learning is dominated by the separation of FGs into 2 pt and 3 pt shots, a grouping that is not spatially local; (3) to a lesser extent, the 2 pt FGs are further clustered into two groups, short-distance and long-distance 2 pt FGs. It is interesting to note that the clustering analysis did not reveal any evidence of generalization that is based on the angle of the shooting player from the rim.

Next, we further studied how the outcome of a FG attempt in one region affects subsequent attempts in other regions. Because the analysis depicted in Fig. 1 indicates that to a first approximation players cluster the spatial locations into three regions, we used in our subsequent analysis a coarser partition of the court into three regions: 3 pt FG attempts (areas 1, 4, 7, 11 and 14), long-distance 2 pt FG attempts (2, 5, 8, 12 and 15) and short-distance 2 pt FG attempts (all other regions). Similar to the analysis of Fig. 1B, we computed for each player the learning matrix corresponding to this coarser division of the court, $L^{(3)}$, where $L^{(3)}_{ij}$ is defined as in Eq. 1 such that the three regions, $i, j \in \{1, 2, 3\}$, correspond to the 3 pt regions, the long-distance 2 pt regions and the short-distance 2 pt regions, respectively. Averaging over the players yields the 3×3 learning matrix $\bar{L}^{(3)}$ (Eq. 2), where each entry is reported as its value ± the standard error of the mean (SEM).

Several points are noteworthy. First, $\bar{L}^{(3)}_{1,1}$ is by far the largest element, indicating that made and missed FGs in the 3 pt region primarily affect subsequent 3 pt attempts, such that a made 3 pt increases the likelihood of another FG from that region and a missed 3 pt decreases it. This is consistent with our previous study, in which we demonstrated that the probability of a 3 pt attempt increases after a made 3 pt and decreases after a missed 3 pt [13]. Second, the two clusters of 2 pt FGs are differentially affected by the 3 pt FG attempts. Short-distance, but not long-distance, 2 pt attempts are sensitive to the outcome of the previous 3 pt ($\bar{L}^{(3)}_{1,3}$ and $\bar{L}^{(3)}_{1,2}$, respectively). Third, $\bar{L}^{(3)}_{2,2}$ is positive and large, indicating that players tend to repeat a long-distance 2 pt if made, and to avoid it if missed. This change of policy comes primarily at the expense of the short-distance 2 pt. Fourth, long-distance 2 pt FG attempts have a positive, albeit small, effect on 3 pt FGs, such that a made long-distance 2 pt increases the probability of a 3 pt attempt ($\bar{L}^{(3)}_{2,1}$). Finally, made and missed short-distance 2 pt FGs have only a small effect on subsequent FGs (the third row of $\bar{L}^{(3)}$).
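The prospective and retrospective clustering steps can be sketched as follows. The linkage rule is specified in Materials and Methods and is not restated in this section, so the average linkage used below is an assumption; the 4×4 matrix is made-up toy data rather than the matrix of Fig. 1B, and the function names are hypothetical.

```python
import math


def row_dissimilarity(M):
    """D[i][j] is the Euclidean distance between rows i and j of M."""
    n = len(M)
    return [[math.dist(M[i], M[j]) for j in range(n)] for i in range(n)]


def transpose(M):
    """Swap rows and columns, so that row clustering acts on the columns."""
    return [list(col) for col in zip(*M)]


def agglomerative(D, n_clusters):
    """Agglomerative clustering on a dissimilarity matrix D.

    Repeatedly merges the two clusters with the smallest average pairwise
    dissimilarity (average linkage, an assumed choice) until n_clusters remain.
    """
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(D[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


# Toy learning matrix (made-up values): regions 0 and 1 behave alike,
# as do regions 2 and 3.
toy_matrix = [[0.9, 0.8, -0.3, -0.2],
              [0.8, 1.0, -0.2, -0.3],
              [-0.1, -0.2, 0.3, 0.2],
              [-0.2, -0.1, 0.2, 0.4]]

prospective = agglomerative(row_dissimilarity(toy_matrix), n_clusters=2)
retrospective = agglomerative(row_dissimilarity(transpose(toy_matrix)), n_clusters=2)
```

Clustering the rows corresponds to the prospective analysis and clustering the columns (the rows of the transpose) to the retrospective analysis. Because the matrix is not symmetric, the two partitions need not coincide in general, although they do for this toy example.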

Distance analysis

The results presented in Fig. 1 suggest that in the process of learning, players reduce the complexity of the environment by treating the outcomes of FG attempts made at different locations as if they were from the same location. The clustering analysis indicates that this generalization is primarily determined by the distance of the FG attempt from the basket. To better understand how the distance from the basket affects learning, we reanalyzed the spatial pattern of generalization with a finer distance resolution, at the expense of angular information, using the following procedure: for each player, we binned all FGs according to their distance from the basket at a 2 ft resolution, separately for 2 pt and 3 pt FGs. For each bin, we separated the FGs according to their outcome, made or missed, and separately computed, for each of these outcomes, the conditional probability that the next FG would be a 3 pt FG. The difference between these two conditional probabilities is a measure of the dependence of the magnitude of operant learning on the distance of the FG from the basket. Note that this focus on the difference in the conditional probabilities of a 3 pt FG as a measure of learning, rather than on the distribution of locations of the following FGs as in Fig. 1, is motivated by our finding presented in the previous section that the outcomes of FGs primarily affect the probability of a 3 pt FG (the 3 pt to 3 pt element of the matrix in Eq. 2). This focus on a scalar learning variable for each distance, rather than a vector, enabled us to study learning at a substantially finer spatial resolution than would have been possible had we focused on a learning vector, as in Fig. 1. The difference between the conditional probabilities, averaged over all players that passed our selection criterion (300 players, see Materials and Methods), is depicted in Fig. 2A, where the blue and red dots depict 2 pt and 3 pt bins, respectively.
We find that the effect of the outcome of a FG on the probability that the following FG would be a 3 pt FG increases with the distance of the FG from the basket. This result is in agreement with the learning matrix of the previous section (Eq. 2). Remarkably, however, the increase in the probability is not continuous. Rather, there is a marked discontinuity in the magnitude of learning when comparing 2 pt and 3 pt FG bins. To further quantify this discontinuity, we used the fact that the 3 pt line that separates the 2 pt and 3 pt regions is not equidistant from the basket. Near the corners of the court, the 3 pt line is closer to the basket than near the center. Therefore, whether a FG attempted at a distance between 22 ft and 23.75 ft is a 2 pt or a 3 pt FG is determined by the angle of the FG relative to the basket. This enables us to dissociate the effect of distance on learning from the effect of the identity of the FG attempt on learning. In Fig. 2A, the leftmost red dot and the rightmost blue dot correspond to 3 pt and 2 pt FGs attempted at almost identical distances from the basket (23.1 ft and 22.7 ft, respectively). Nevertheless, the difference in the magnitudes of learning, quantified as the differences in the conditional probabilities, is substantial and significant (0.11±0.01 and 0.05±0.01 for the 3 pt and 2 pt FGs, respectively, p<0.001, Monte Carlo permutation test). This discontinuity in the learning magnitudes entails that the abstract classification of a FG as a 2 pt or 3 pt is an important aspect of the generalization. In other words, players learn from the outcomes of 2 pt and 3 pt FGs in a categorically different manner, even if these FGs were attempted from the same distance from the basket. These results imply that reasoning based on the abstract rules of the game, rather than low-level features such as the physical distance, dominates the pattern of generalization.
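The distance analysis can be sketched in two steps: classifying a shot as 2 pt or 3 pt from its coordinates, and computing, for each distance bin, the difference between the probabilities that the next FG is a 3 pt after a make and after a miss. The sketch below assumes a coordinate system centered on the basket (`x` lateral, `y` toward half court, in feet) and a simplified 3 pt line consisting of a 23.75 ft arc and straight corner segments 22 ft to either side of the basket. The function names and the toy data are hypothetical, and the per-player averaging and selection criteria are omitted.

```python
import math
from collections import defaultdict

ARC_RADIUS_FT = 23.75   # distance of the 3 pt arc from the basket
CORNER_FT = 22.0        # lateral distance of the straight corner segments


def is_three_pointer(x, y):
    """Classify a shot location, given in feet relative to the basket."""
    return abs(x) > CORNER_FT or math.hypot(x, y) > ARC_RADIUS_FT


def learning_by_distance(shots, bin_width=2.0):
    """P(next FG is 3 pt | made) - P(next FG is 3 pt | missed), per bin.

    shots: chronological list of (distance_ft, is_three, made) for one player.
    Bins are keyed by (is_three, bin_index), so that 2 pt and 3 pt FGs are
    binned separately. Bins lacking either a made or a missed FG are skipped.
    """
    # Per bin: [n_made, n_made_then_3pt, n_missed, n_missed_then_3pt]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for (dist, is_three, made), (_, next_is_three, _) in zip(shots, shots[1:]):
        c = counts[(is_three, int(dist // bin_width))]
        offset = 0 if made else 2
        c[offset] += 1
        c[offset + 1] += int(next_is_three)
    return {key: n_made3 / n_made - n_miss3 / n_miss
            for key, (n_made, n_made3, n_miss, n_miss3) in counts.items()
            if n_made and n_miss}


# Toy sequence (made-up data): a 3 pt attempt follows every made 3 pt,
# and a 2 pt attempt follows every missed 3 pt.
toy_shots = [(24.0, True, True), (25.0, True, False), (5.0, False, True),
             (24.5, True, True), (24.2, True, False), (3.0, False, False)]
toy_learning = learning_by_distance(toy_shots)
```

Under these geometric assumptions a corner shot at a distance of 22.5 ft is a 3 pt whereas a straight-on shot from 23 ft is a 2 pt, which is the dissociation between distance and shot identity exploited in the text.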


Figure 2. The effect of distance from the basket on learning. A. The difference between the probabilities of attempting a 3 pt after a made and after a missed FG as a function of the distance of the first shot from the basket. FGs were sorted into 2-ft-wide bins according to their distance from the basket. For each bin and for each player, we calculated the average distance from the basket and the conditional probabilities and then averaged over the players. B. The shooting percentage as a function of the distance from the basket. The percentage is defined as the ratio between the number of made FGs and the number of attempted FGs. Error bars denote the SEM. The blue dots denote 2 pt FGs and the red dots denote 3 pt FGs. Analysis is based on 263,557 FGs of 300 players that passed our selection criteria (see Materials and Methods). https://doi.org/10.1371/journal.pcbi.1003623.g002

Does the pattern of generalization reflect the difficulty of the FG?

Naively, one could argue that the more distant a FG is, the more difficult it is and therefore the more informative a made FG is about the current capabilities of the player (and/or the abilities of the opponent players). Therefore, a made long-distance FG would influence subsequent FGs more than a made short-distance FG. According to this view, 3 pt FGs are more difficult than 2 pt FGs. Therefore, they are more informative and thus have a larger effect on subsequent FGs. Moreover, because there is a categorical difference in the payoff associated with 2 pt and 3 pt FGs, the defending team is likely to be more motivated to prevent made 3 pt FGs than to prevent made 2 pt FGs. As a result, 3 pt FGs may be better guarded and thus categorically more difficult than 2 pt FGs. Such a discontinuity in the difficulty could result in the discontinuity in the learning magnitude at the transition from 2 pt to 3 pt FGs that is depicted in Fig. 2A.
In order to test this hypothesis, we computed the shooting percentage of FGs from different distances for the same 300 players analyzed in Fig. 2A. The shooting percentage is the ratio of made FGs to attempted FGs, and thus is a measure of the difficulty of the FG. The average shooting percentage as a function of the distance is depicted in Fig. 2B. As predicted, the shooting percentage decreases with the distance from the basket. However, the dependence of the percentage on the distance does not closely follow the dependence of the learning signal on the distance. In particular, the shooting percentages of 2 pt and 3 pt FGs from almost the same distance are 0.393±0.007 (rightmost blue dot) and 0.395±0.006 (leftmost red dot), respectively, which are not significantly different from each other (p>0.16, Monte Carlo permutation test). Thus, a difference between the difficulties of the 2 pt and 3 pt FGs cannot account for the discontinuity in the magnitude of learning. This result indicates that it is the identity of the shot as a 2 pt or 3 pt shot per se, and not the difficulty of the FG, that plays the dominant role in the players' pattern of generalization.
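The logic of comparing two groups of FGs with a Monte Carlo permutation test can be illustrated with a generic sketch. The exact procedure used in this study is described in Materials and Methods and may differ. The version below is an assumed, standard two-sample test on the difference in means, applied here to made (1) and missed (0) outcomes; the function name, the number of permutations and the toy samples are hypothetical.

```python
import random


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    The group labels are shuffled n_perm times; the p-value is the fraction
    of shuffles whose absolute difference in means is at least as large as
    the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy outcomes (made-up data): two groups with identical shooting percentage,
# and two groups with very different shooting percentages.
same_a, same_b = [1, 0] * 50, [1, 0] * 50
high, low = [1] * 30, [0] * 30

p_same = permutation_test(same_a, same_b, n_perm=2000)
p_diff = permutation_test(high, low, n_perm=2000)
```

A large p-value, as for the two groups with identical percentages, gives no evidence that the groups differ in difficulty, whereas a small p-value indicates a difference that is unlikely to arise from random labeling.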