Experimental design

Our experiment was designed to closely resemble a number of previous studies of the finitely repeated Prisoner's Dilemma (PD)2,3,10. Anonymous individuals were randomly paired to play a series of ten-round repeated games of PD. In each round, each player was required to choose one of two actions, cooperate (C) or defect (D), after which they received a payoff from the payoff matrix displayed in Table 1 (see also Supplementary Fig. 1 for screenshots). The payoffs were chosen to satisfy the usual PD inequalities (T=7)>(R=5)>(P=3)>(S=1) and 2R>T+S; moreover, they correspond to the normalized quantities g=(T−R)/(R−P)=1 and l=(P−S)/(R−P)=1, which are toward the low end of the normal range for previous studies2,3,10,33,34,35,36. After each round, both players were shown the action of the other player, and each could see their own payoff as well as cumulative payoffs up to that game and for the entire experiment (see Supplementary Fig. 1). After each ten-round game, players entered a virtual waiting room until all other games had been completed (a counter informed players how many others were also waiting), at which point they were randomly reassigned to new partners and a new set of games commenced. This process was repeated 20 times over the course of a single session; we again emphasize that players remained anonymous and unidentifiable throughout (see Supplementary Fig. 2 for a visual representation of a single day).
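As a quick check of the arithmetic, the sketch below verifies the two PD inequalities and computes g and l for the Table 1 payoffs. It assumes the standard normalization g=(T−R)/(R−P) and l=(P−S)/(R−P); the function names are illustrative only.

```python
from fractions import Fraction
from typing import Tuple

# Per-round payoffs from Table 1: temptation, reward, punishment, sucker's payoff.
T, R, P, S = 7, 5, 3, 1


def is_prisoners_dilemma(t: int, r: int, p: int, s: int) -> bool:
    """Usual PD ordering T > R > P > S, plus 2R > T + S (mutual cooperation beats alternating exploitation)."""
    return t > r > p > s and 2 * r > t + s


def normalized_gain_loss(t: int, r: int, p: int, s: int) -> Tuple[Fraction, Fraction]:
    """g: gain from defecting on a cooperator; l: loss from being defected on; both scaled by R - P."""
    g = Fraction(t - r, r - p)
    l = Fraction(p - s, r - p)
    return g, l


if __name__ == "__main__":
    print(is_prisoners_dilemma(T, R, P, S))   # True
    print(normalized_gain_loss(T, R, P, S))   # (Fraction(1, 1), Fraction(1, 1))
```

Exact fractions are used so that the "=1" comparison is not affected by floating-point rounding.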

Table 1 Per-round payoff for (Row Player, Column Player).
Row \ Column    C         D
C               (5, 5)    (1, 7)
D               (7, 1)    (3, 3)

Our experiment’s main point of departure from previous work was that, rather than conducting our experiment for a single session, we retained the same population of subjects for 20 such sessions, held at the same time on consecutive weekdays from 4 August to 31 August 2015. The experiment commenced with 113 subjects recruited in advance from Amazon’s Mechanical Turk. To minimize latency in the user interface and language barriers in delivering instructions, we restricted participation to residents of the US and Canada; however, the subject pool was otherwise diverse with respect to location (31 US states), age (18–61) and gender (47% female) (see Supplementary Figs 3 and 4 for more details of the player population). Also to minimize latency, we split the population into two sessions held each day at 13:00 hours EDT (n=56) and 15:00 hours EDT (n=57), respectively. Players were assigned randomly to a session at the outset of the experiment and were retained in that session for the duration of the experiment. Although there were some slight differences between the two sessions, behaviour (including attrition) was qualitatively indistinguishable; thus, for all results stated in the main text we treat the two sessions as a single population (noting that pooling subjects from multiple experimental sessions is a common practice in traditional lab experiments). Sessions lasted an average of 35 min and players were paid in proportion to their cumulative payoff. Players earned an average of $4.47 per session, corresponding to an hourly wage of ∼$7.66, substantially higher than the self-reported average wage for tasks on Mechanical Turk37. To minimize attrition, we also offered an additional one-time bonus of $20 for completing at least 18 of the 20 sessions, payable at the end of the experiment.
Subjects who missed more than two sessions were excluded from the experiment and prevented from completing any remaining sessions, thereby forfeiting the bonus along with any unearned compensation. Of the initial population, 94 subjects (83%) satisfied our completion criterion, earning an average variable compensation of $87.03, or $107.03 in total including the bonus (we found no significant differences between dropouts and non-dropouts; see ‘Methods’ section for more details of recruiting and attrition). Over the course of the experiment these subjects played an average of 372 ten-round games each, making 3,720 individual decisions each, for a total of 374,251 decisions collectively (see Supplementary Fig. 5 for a visual representation of the entire experiment).
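The compensation and retention figures above follow from simple arithmetic on the reported means. The sketch below reproduces them; because it divides a mean payment by a mean session length, the hourly wage it yields is an approximation, as in the text.

```python
from decimal import Decimal

# Figures reported in the text above (means over players and sessions).
mean_pay_per_session = Decimal("4.47")   # dollars
mean_session_minutes = Decimal("35")
mean_variable_pay = Decimal("87.03")     # dollars, completers only
completion_bonus = Decimal("20")
n_recruited, n_completed = 113, 94

hourly_wage = mean_pay_per_session / (mean_session_minutes / 60)
retention = n_completed / n_recruited
total_pay = mean_variable_pay + completion_bonus

print(f"hourly wage ≈ ${hourly_wage:.2f}")   # hourly wage ≈ $7.66
print(f"retention = {retention:.0%}")        # retention = 83%
print(f"total pay = ${total_pay}")           # total pay = $107.03
```

`Decimal` keeps the dollar amounts exact, so the sum of variable pay and bonus prints without binary rounding noise.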

Initial cooperation and unravelling

Figure 1a shows cooperation levels in rounds 1 (green), 8 (blue), 9 (purple) and 10 (red) over the course of the experiment. On day 1 the first-round cooperation rate started at over 80%, a figure that is not unprecedented among previous studies38 but is substantially higher than the usual range of 40–60% (refs 3, 9, 30, 33). There are a number of reasons why our set-up may have led to higher-than-typical overall cooperation. First, although previous work39,40 has found that players recruited from MTurk cooperate at similar rates to those in lab studies, it is possible that the recent evolution of the MTurk community has resulted in a population that is more cooperative than the (also non-representative41) population of subjects in traditional lab experiments. Second, prior work10 has noted that cooperation rates in finitely repeated games are sensitive to the game-matrix parameters g and l, with lower values corresponding to more cooperation. As noted above, our values g=1 and l=1 were at the low end of previous studies, so it is not surprising that we observe relatively high cooperation rates. Third, prior work10 has also shown that the duration of a finitely repeated game is highly predictive of initial cooperation levels. Our ten-round games were relatively long compared with those of previous experiments; thus, once again, it is not surprising that cooperation levels were relatively high. Moreover, analogous logic suggests that the overall duration of the experiment could also be related to cooperation levels: because our design required us to inform participants about the length of the experiment, this knowledge may also have led to more cooperative behaviour. Finally, although players were not explicitly told the size of the population with whom they were being matched, they could have inferred this information from the counter in the virtual waiting room.
Likewise, they were not directly informed that they were playing with the same population every day but could have inferred as much from their instructions, and hence could have reasonably concluded that they would anonymously encounter the same players several times over the course of the experiment. It is plausible, therefore, that the general expectation of repeated interactions also facilitated cooperative behaviour.

Figure 1: Cooperation over time. (a) Average cooperation rate for rounds 1 (green), 8 (blue), 9 (purple) and 10 (red) as a function of time over the 400 games of the experiment. The experiment ran for 20 consecutive weekdays, each of which comprised 20 games of 10 rounds each. Cooperation in rounds 9 and 10 clearly diminishes for several days, consistent with unravelling dynamics observed in prior work, but then appears to stabilize. (b–d) Cooperation as a function of round (black lines) on days 1 (b), 11 (c) and 20 (d). Each day comprised 20 consecutive games of 10 rounds each, yielding 200 rounds in total. In all cases, coloured lines correspond to cooperation levels for rounds 1 (green), 8 (blue), 9 (purple) and 10 (red). The same pattern of unravelling in early days followed by stabilization is apparent.

In other respects, Fig. 1 shows that early behaviour closely resembled results from similar previous experiments. Specifically, Fig. 1b shows that cooperation levels, which remained high during the early rounds of each repeated game, dropped to a relatively low level in the final rounds, exhibiting the so-called ‘end-game’ effect predicted by the rationality hypothesis1. Moreover, between games, cooperation levels exhibited the well-documented ‘restart effect’42, in which cooperation jumps sharply from the last round of game j to the first round of game j+1. Other than the relatively high average level of cooperation, therefore, the dynamics of session play were qualitatively similar to those of previous experiments of comparable duration2,3,10,38. Importantly, first-session play also lends support to the rationality hypothesis: cooperation levels in round 1 (green line) increased slightly over the course of the session but decreased steadily for rounds 9 (purple line) and 10 (red line), consistent with previous claims of unravelling2,10,31. Also importantly, Fig. 1a shows that the decrease in cooperation during rounds 9 and 10 continued for several days but then slowed dramatically for the remainder of the experiment. Supporting this claim, Fig. 1c,d show that cooperation levels on days 11 and 20, respectively, continued to start high in each game and drop sharply as the end game approached, but that there was much less change over the course of a session. Moreover, the relatively small decreases in round 9 and 10 cooperation that did occur over the course of a session largely ‘reset’ at the start of the next session, such that there was little change from day to day.

Unravelling stabilizes after several days

Figure 2 shows the same general trends in three different ways. First, Fig. 2a shows the average rate of cooperation by round, broken down by day. Consistent with the observations from Fig. 1, the pattern of cooperation at first changes from day to day, increasing in early rounds and decreasing in later rounds, but then appears to stabilize after several days (green through purple). Second, Fig. 2b shows the daily average of the game restart effect, that is, the difference between cooperation in round 10 of game j and round 1 of game j+1, over the course of the experiment. Again consistent with the results above, the restart effect increases sharply for the first several days as the end-game effect visible in Fig. 2a becomes more pronounced, but it stabilizes after several days. Finally, Fig. 2c shows the session restart effect (as distinct from the game restart effect): the difference in cooperation levels for rounds 9 and 10, respectively, between game 1 of day d+1 and game 20 of day d (orange box plots). For comparison, Fig. 2c also shows the corresponding difference between successive games within the same session (green box plots). Whereas the across-game difference within a session is slightly negative, the across-session effect is large and positive (on average 17.2% and 13.6% for rounds 9 and 10, respectively), largely accounting for the ‘reset’ effect noted above in Fig. 1.
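Both restart effects are plain differences of cooperation rates. Assuming the data are arranged as a nested list `rates[day][game][round]` of cooperation rates (a hypothetical layout, not the authors’ data format), they could be computed along these lines:

```python
from statistics import mean
from typing import List

# rates[day][game][round]: fraction of players cooperating; all indices 0-based.
Rates = List[List[List[float]]]


def game_restart_effect(rates: Rates, day: int) -> float:
    """Mean jump from round 10 of game j to round 1 of game j+1, averaged over one day."""
    games = rates[day]
    return mean(games[j + 1][0] - games[j][-1] for j in range(len(games) - 1))


def session_restart_effects(rates: Rates, rnd: int) -> List[float]:
    """Round-`rnd` change (1-indexed) from the last game of day d to the first game of day d+1."""
    return [rates[d + 1][0][rnd - 1] - rates[d][-1][rnd - 1] for d in range(len(rates) - 1)]


def within_session_changes(rates: Rates, rnd: int) -> List[float]:
    """Round-`rnd` change between successive games of the same day (the comparison series)."""
    return [nxt[rnd - 1] - cur[rnd - 1] for day in rates for cur, nxt in zip(day, day[1:])]


if __name__ == "__main__":
    # Toy data: 2 days x 2 games x 10 rounds.
    toy = [
        [[1.0] * 9 + [0.25], [1.0] * 8 + [0.5, 0.25]],
        [[1.0] * 8 + [0.75, 0.5], [1.0] * 9 + [0.5]],
    ]
    print(game_restart_effect(toy, 0))        # 0.75
    print(session_restart_effects(toy, 9))    # [0.25]
    print(within_session_changes(toy, 9))     # [-0.5, 0.25]
```

The session-level series (one value per pair of consecutive days) and the within-session series (one value per pair of consecutive games) correspond to the two kinds of box plots compared in Fig. 2c.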

Figure 2: Stabilization of cooperation. (a) Cooperation by round averaged over the course of a 20-game session, grouped by day. Early days (coloured red through green) show the sharpening of the end-game effect (that is, initial cooperation increases but drops off further and more suddenly as the end game approaches), after which the pattern stabilizes (green through purple). (b) Average restart effect between games (that is, the difference in cooperation rate between round 10 of game j and round 1 of game j+1). Consistent with (a), the restart effect increases for several days and then stabilizes. (c) The stabilization of cooperation is partly accounted for by the cross-session restart effect (orange): the jump in cooperation rate between the last game of day d and the first game of day d+1 for rounds 9 (left) and 10 (right). For comparison, the corresponding within-session effect (that is, the difference in round 9/10 cooperation rate between successive games within a session) is also shown (teal).

Taken together, Figs 1 and 2 suggest that play can be broken into two phases: an ‘unravelling’ phase during which players start defecting on progressively earlier rounds, and a ‘stable’ phase during which unravelling abates. To examine this two-phase picture more systematically, Fig. 3 shows the distribution of the round of first defection, r_d, for each day of the experiment. To identify the onset of a stable phase, we apply a two-sample Kolmogorov–Smirnov (K–S) test to successive days, finding that day-to-day changes are significant up to day 7 but insignificant thereafter (see ‘Methods’ section for details). The onset of a ‘stable’ state at roughly day 7 can also be inferred in at least two other ways: first, from the change of slope in the cooperation rates for rounds 9 and 10 (Fig. 1a); and second, from the between-game ‘restart effect’, which rises for the first several days and then stabilizes, again around day 7 (see Fig. 2b). Although these measures are less precise than the K–S test applied to the distribution of round of first defection, both yield similar results. We therefore identify day 7 as the end of the unravelling phase (although the precise day on which stabilization occurs is relatively unimportant for our results) and hereafter treat the period spanning days 7–20 as the stable phase.
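The day-to-day comparison can be sketched in two steps: reduce each game to its round of first defection, then run a two-sample K–S test on consecutive days. The version below is a standard-library-only sketch, not the authors’ analysis code; it uses the usual asymptotic Kolmogorov series with a small-sample correction for the p-value. Because r_d takes only eleven values, ties make this asymptotic p-value conservative. In practice one would more likely call `scipy.stats.ks_2samp`, and the exact test settings used in the study are given in its Methods, not here.

```python
import math
from typing import List, Sequence, Tuple

NO_DEFECTION = 11  # code for the 'C' bin: neither player defected in any of the ten rounds


def first_defection_round(a: str, b: str) -> int:
    """Round (1-indexed) in which either player first chose D, or NO_DEFECTION."""
    for t, (x, y) in enumerate(zip(a, b), start=1):
        if x == "D" or y == "D":
            return t
    return NO_DEFECTION


def ks_2samp(s1: Sequence[int], s2: Sequence[int]) -> Tuple[float, float]:
    """Two-sample K-S statistic D with an asymptotic p-value from the Kolmogorov series."""
    n1, n2 = len(s1), len(s2)
    # Largest gap between the two empirical CDFs, checked at every observed value.
    d = max(
        abs(sum(v <= x for v in s1) / n1 - sum(v <= x for v in s2) / n2)
        for x in sorted(set(s1) | set(s2))
    )
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.05:  # the series converges too slowly here, and the p-value is essentially 1
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def successive_day_tests(rd_by_day: List[List[int]]) -> List[Tuple[int, float, float]]:
    """(day, D, p) for each day versus the previous day; rd_by_day[0] holds day 1's r_d values."""
    return [(i + 1, *ks_2samp(rd_by_day[i - 1], rd_by_day[i])) for i in range(1, len(rd_by_day))]
```

Scanning the output of `successive_day_tests` for the last day with a small p-value is one way to locate the onset of the stable phase described above.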

Figure 3: Stabilization of defection. Distribution of round of first defection, r_d, over all games by day. The last bin, C, indicates games where neither player defected. In days 1–6, players appear to converge on one of a number of threshold strategies, in which they cooperate conditionally until some predetermined ‘threshold’ round r_i and then defect unilaterally. During this interval the modal round of first defection also creeps earlier. The red highlighted region denotes the ‘stable’ phase of the experiment, during which the distribution of round of first defection remains sufficiently similar from day to day that a K–S test is non-significant.

Figure 3 also reveals three additional trends of interest. First, during the unravelling phase the left-hand bar (comprising a small group of early defectors) largely disappears, consistent with the assertion10 that players first converge on one of a number of ‘threshold’ strategies: they cooperate conditionally until some predetermined round r_i, after which they defect unconditionally (one player continued to defect in all rounds throughout the experiment). Second, among initially cooperative players there is a drift toward earlier first defection, again consistent with the conjecture that rational players, having settled on a threshold strategy, begin to slowly unravel. Finally, however, Fig. 3 also provides some direct evidence for the existence of a significant minority of players who do not appear to follow the unravelling pattern. Specifically, we observe that fully cooperative games occurred at rates between 15 and 20% for the duration of the experiment. Because players were paired randomly and a game in which neither player defected requires both players to be conditional cooperators, a frequency of 16% of games with no defection implies a frequency of √0.16 = 40% of conditional cooperators.
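The inference in the last sentence is a one-line calculation. Assuming that pairings are effectively independent draws from the population (a large-population approximation) and that conditional cooperators play CC in every game, a fraction p of conditional cooperators yields fully cooperative games with probability p², so the observed 15–20% range corresponds to roughly 39–45% conditional cooperators:

```latex
\Pr(\text{no defection}) = p^{2}
\;\Longrightarrow\;
p = \sqrt{0.16} = 0.40,
\qquad
\sqrt{0.15} \approx 0.39,
\qquad
\sqrt{0.20} \approx 0.45
```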

Identification of resilient cooperators

Summarizing, Figs 1, 2, 3 suggest that, consistent with the rational cooperation hypothesis, a majority of players first converge onto one of a number of threshold rules and then exhibit ‘unravelling’ as their thresholds creep earlier with experience. Strikingly, however, Figs 1, 2, 3 also suggest that a significant minority do not exhibit this pattern, but rather consistently behave like conditional cooperators. To test for these different player types more systematically, we exploit the roughly 3,720 observations per player to identify individual-level strategies as well as their evolution over time. Specifically, we estimate for each player i a unique strategy s_i(j) for each game j from among eleven predefined strategies: ten ‘threshold’ strategies T_x, one for each round x=1, …, 10, under which a player conditionally cooperates up to round x−1 and then defects unilaterally from round x; and CC, for players who conditionally cooperate for the duration of the game (see ‘Methods’ section for details). Figure 4a shows inferred strategies for the 94 players who completed the experiment: each row of 400 cells represents a single player i, where each cell is coloured to indicate i’s inferred strategy for a single game j. Figure 4a reveals three main results. First, consistent with previous work2,3,10, the 11 predefined strategies account for a large fraction of all player-game observations; specifically, the fraction of ‘other’ strategies declines from about 19% on day 1 to <1% by day 7 (see Supplementary Fig. 6). Second, Fig. 4a shows that roughly 60% (n=58) of players exhibited behaviour consistent with the rational cooperation hypothesis: starting out playing CC but then switching to progressively less cooperative threshold strategies (that is, T_10, T_9, T_8, T_7). Third, however, almost 40% of players (n=36) displayed no such systematic unravelling tendency, consistently playing CC throughout the experiment.
Figure 4b, a histogram of the percentage of games in which each player played CC during the stable interval (days 7–20), shows that these 36 players, who occupy the right-hand mode of the histogram, all played CC in at least 80% of games. Finally, Fig. 4c shows the average daily payoffs of the 36 players who played CC (blue line) and of the other players (red line): the two groups had similar payoffs on the first day, when all players were cooperating at similar rates; however, on every subsequent day CC players received lower payoffs than threshold players by a large and significant margin (|t|>5.3, P<10⁻⁶ for each day d≥2).
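The classification and the 80% rule can be sketched as follows. The classifier is a simplified consistency check, not the authors’ estimator: it replays every candidate strategy against the partner’s actual moves and keeps those that reproduce the player’s own moves. Once the partner defects, several thresholds can explain the same play; how the study resolves such ties is described in its Methods and is not reproduced here. An empty result corresponds to the ‘other’ category. The encoding of CC as threshold 11 and the per-player label layout are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

ROUNDS = 10
CC = ROUNDS + 1  # hypothetical encoding: CC acts like a threshold that is never reached


def threshold_action(x: int, t: int, partner: str) -> str:
    """Move of T_x in round t (1-indexed): grim trigger before round x, defect from round x on."""
    if t >= x:
        return "D"
    return "D" if "D" in partner[: t - 1] else "C"


def consistent_strategies(own: str, partner: str) -> Set[int]:
    """All x in 1..11 (11 = CC) whose prescribed moves reproduce `own`, given the partner's moves."""
    return {
        x
        for x in range(1, CC + 1)
        if all(threshold_action(x, t, partner) == own[t - 1] for t in range(1, ROUNDS + 1))
    }


# labels[player] = [(day, strategy or None for 'other'), ...], one entry per game played.
Labels = Dict[str, List[Tuple[int, Optional[int]]]]


def cc_share(games: List[Tuple[int, Optional[int]]], first_day: int = 7, last_day: int = 20) -> float:
    """Fraction of a player's stable-phase games labelled CC (missed games are simply absent)."""
    stable = [label for day, label in games if first_day <= day <= last_day]
    return sum(label == CC for label in stable) / len(stable) if stable else 0.0


def resilient_cooperators(labels: Labels, cutoff: float = 0.8) -> Set[str]:
    """Players labelled CC in at least `cutoff` of their stable-phase games."""
    return {player for player, games in labels.items() if cc_share(games) >= cutoff}


if __name__ == "__main__":
    print(sorted(consistent_strategies("CCCCCCCCCD", "C" * 10)))      # [10]
    print(sorted(consistent_strategies("CCCDDDDDDD", "CCDDDDDDDD")))  # [4, 5, 6, 7, 8, 9, 10, 11]
    print(sorted(consistent_strategies("CDCDCDCDCD", "C" * 10)))      # []
```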

Figure 4: Identification of resilient cooperators. (a) Inferred strategies over time. Each cell represents the strategy for a single player (row) for a single game (column), and is coloured by cooperativeness (more blue ⇒ more cooperative; more red ⇒ less cooperative). T_1–T_10 refer to ‘threshold’ strategies, where a player playing strategy T_x conditionally cooperates up to round x−1 and then defects unilaterally from round x. T_1 corresponds to defection on every round and CC corresponds to full conditional cooperation (also known as ‘grim trigger’). Grey regions refer to play that was not consistent with any of the assumed strategies. White regions refer to missing games. (b) Histogram of % of games classified as CC. The right-hand mode comprises 36 players who play CC in at least 80% of games; these players are identified as resilient cooperators. (c) Average per-round payoffs of players identified as resilient cooperators (blue line) and rational players (red line), respectively (averages are computed over games for each day; error bars are s.e.).

On the basis of this evidence we conclude (a) that roughly 40% of players were ‘resilient cooperators’ who persistently behaved as conditional cooperators even at substantial cost to themselves; and (b) that the remainder were ‘rational’ in that they cooperated only inasmuch as they believed it was in their selfish best interest to do so. We also confirm this behavioural classification of resilient cooperators with self-reported evidence from an exit survey conducted at the completion of the experiment: of the 94 subjects who completed the entire experiment, 38 reported that they had intentionally cooperated as long as their partner did and had resisted the temptation to defect first. Moreover, they reported that they had maintained this strategy throughout the experiment, even after perceiving others to have behaved selfishly (see ‘Methods’ section for more details of self-reported strategies). Importantly, 33 of the individuals whom we identified as conditional cooperators in this manner were also among the 36 individuals in the right-hand mode of Fig. 4b, indicating very high agreement between the quantitative and qualitative classification schemes (see Supplementary Fig. 7 for additional analysis of resilient cooperators by gender and age, and Supplementary Fig. 8 for analysis by experience).
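One way to put a number on this agreement, using only the counts above (94 completers, 38 self-reported conditional cooperators, 36 behaviourally identified, 33 in both), is overall agreement together with Cohen’s kappa. This is an illustrative calculation, not one reported in the study; with these counts it gives 86/94 ≈ 91% agreement and κ ≈ 0.82.

```python
from typing import Dict


def agreement(n_total: int, n_survey: int, n_behaviour: int, n_both: int) -> Dict[str, float]:
    """2x2 agreement between self-reported and behavioural classifications, with Cohen's kappa."""
    survey_only = n_survey - n_both
    behaviour_only = n_behaviour - n_both
    neither = n_total - n_both - survey_only - behaviour_only
    observed = (n_both + neither) / n_total
    # Agreement expected by chance if the two classifications were independent.
    expected = (n_survey * n_behaviour + (n_total - n_survey) * (n_total - n_behaviour)) / n_total**2
    return {
        "neither": float(neither),
        "observed_agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
        "behavioural_confirmed": n_both / n_behaviour,  # share of the 36 who also self-reported
    }


if __name__ == "__main__":
    print(agreement(n_total=94, n_survey=38, n_behaviour=36, n_both=33))
```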

Resilient cooperators permanently stabilize cooperation

The existence of resilient cooperators in turn suggests an explanation for the observed slowdown in unravelling: as the rational players learned the true fraction of conditional cooperators in the population, they converged on a ‘partially unravelled’ state that balanced the risk of exploitation by other rational players against the potential gains from cooperating with CC players. If correct, this explanation would also suggest that the observed slowdown was permanent and that cooperation levels by the end of the experiment were close to their asymptotic limit. To test these related hypotheses, we simulated an agent-based model comprising two types of agents: resilient cooperators, who play CC in every game; and ‘rational’ players, who continually update their beliefs about the distribution of player types in the population and then choose among the available threshold strategies T_x so as to maximize their expected payoff given those beliefs. Specifically, in each game the rational players: (a) form beliefs about the strategies being played by other agents based on their past opponents’ play; (b) conditional on these beliefs, calculate their expected utility for each available strategy; and (c) stochastically update their current strategy in proportion to each potential strategy’s expected utility (see ‘Methods’ section for details). By systematically varying the fraction α of resilient cooperators, we can explore their impact on unravelling.
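The three-step update can be sketched in code. This is a simplified stand-in for the model described in the Methods, and several choices here are assumptions rather than the authors’ specification: each agent observes its opponent’s threshold exactly after a game; beliefs are exponentially discounted counts with a hypothetical `memory` factor and a uniform prior; and step (c) uses a logit (softmax) rule with a hypothetical sensitivity `beta` in place of probabilities strictly proportional to expected utility, because with per-game payoffs between 28 and 52 points a strictly proportional rule would spread choices almost evenly across strategies. CC is encoded as threshold 11, and the mean chosen threshold is used as a rough proxy for r_∞.

```python
import math
import random
from typing import List, Optional

R, S, T, P = 5, 1, 7, 3  # Table 1 payoffs
ROUNDS = 10
CC = ROUNDS + 1  # CC encoded as a threshold that is never reached


def game_payoff(x: int, y: int) -> int:
    """Ten-round payoff to T_x against T_y, both playing grim trigger before their thresholds."""
    m = min(x, y)  # round of the first defection
    if m > ROUNDS:
        return ROUNDS * R
    total = (m - 1) * R
    if x == y:
        return total + (ROUNDS - m + 1) * P
    first = T if x < y else S  # whoever reaches their threshold first exploits the other once
    return total + first + (ROUNDS - m) * P


PAYOFF = [[game_payoff(x, y) for y in range(1, CC + 1)] for x in range(1, CC + 1)]


def simulate(alpha: float, n_agents: int = 100, n_games: int = 400, beta: float = 3.0,
             memory: float = 0.9, seed: int = 0) -> List[List[int]]:
    """Thresholds chosen by the rational agents after each game (11 = CC); n_agents should be even."""
    rng = random.Random(seed)
    n_resilient = round(alpha * n_agents)              # agents 0..n_resilient-1 always play CC
    strategy = [CC] * n_agents                         # everyone starts fully cooperative
    beliefs = [[1.0] * CC for _ in range(n_agents)]    # uniform prior pseudo-counts (assumption)
    order = list(range(n_agents))
    history = []
    for _ in range(n_games):
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):      # random pairing
            for me, opp in ((a, b), (b, a)):
                beliefs[me] = [c * memory for c in beliefs[me]]  # (a) discount old observations
                beliefs[me][strategy[opp] - 1] += 1.0           # opponent's threshold, assumed observable
        for i in range(n_resilient, n_agents):
            w = beliefs[i]
            total = sum(w)
            # (b) expected payoff of each threshold against the believed opponent mix
            eu = [sum(PAYOFF[s][o] * w[o] for o in range(CC)) / total for s in range(CC)]
            top = max(eu)
            # (c) logit choice, swapped in for a strictly proportional rule
            weights = [math.exp(beta * (u - top)) for u in eu]
            strategy[i] = rng.choices(range(1, CC + 1), weights=weights)[0]
        history.append(strategy[n_resilient:])
    return history


def estimate_r_inf(history: List[List[int]], tail: int = 100) -> Optional[float]:
    """Mean rational threshold over the last `tail` games; None if there are no rational agents."""
    values = [x for game in history[-tail:] for x in game]
    return sum(values) / len(values) if values else None
```

For instance, comparing `estimate_r_inf(simulate(0.0))` with `estimate_r_inf(simulate(0.4))` contrasts the two regimes of Fig. 5. Repeating this over a grid of `alpha` values with longer runs (say `n_games=2000`) gives a rough analogue of an r_∞(α) curve, though this simplified model is not guaranteed to reproduce the reported values such as α*≈0.1 or r_∞≈8.2.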

Figures 5a,b show the results of the simulation for α=0 and α=0.4, respectively, for N=100 agents. In the absence of resilient cooperators (Fig. 5a), rational players exhibit exactly the unravelling predicted by the rational cooperation hypothesis1,2,10,31: over the course of 400 games, players unravel almost uniformly from T_10 all the way down to T_1, albeit progressively more slowly for lower thresholds. In contrast, when 40% of players are resilient cooperators (Fig. 5b), matching what we observed in our experiment, unravelling is curtailed, with T_9 emerging as the modal strategy and significant fractions occupying T_10 and T_8. Encouragingly, Fig. 5b bears a close resemblance to Fig. 4a, suggesting that the entire distribution of steady-state strategies of agents in the simulation is similar to that of our experimental subjects.

Figure 5: Resilient cooperators stabilize cooperation in an agent-based model. In both panels, T_1–T_10 refer to threshold strategies, where a player playing strategy T_x cooperates conditionally up to round x−1 and then defects unilaterally from round x. T_1 corresponds to defection on every round and CC corresponds to full conditional cooperation (also known as ‘grim trigger’). (a) Individual strategies for 100 simulated agents over the course of 400 games in the absence of resilient cooperators (that is, all agents are rational cooperators who selfishly best-respond to the inferred distribution of strategies in the population). In this case cooperation unravels completely. (b) Individual strategies for the same model but with 40% resilient cooperators and 60% rational agents. In this case cooperation stabilizes after 100–150 games (equivalent to 5–8 days).

In addition to replicating the high-level results of our experiment, the learning model also makes two predictions. First, as shown in Fig. 6a, cooperation in rounds 8, 9 and 10 for the α=0.4 case remains stable for at least 4,000 games, ten times the length of our experiment. This result suggests that the apparent stabilization of cooperation that we observe in the experiment after 7 days is not simply a slowing down of the unravelling process but an end to it. In other words, the model predicts that with sufficiently many resilient cooperators present in a population of rational cooperators, cooperation can be sustained indefinitely. Second, the model predicts how many resilient cooperators are necessary to sustain cooperation even among rational cooperators. To show this, we first define r_∞ as the average round of first defection r_d for rational players as it approaches its asymptotic limit (in practice we estimate r_∞ by running the simulations for at least 2,000 games). Figure 6b shows the estimated r_∞ as a function of α, along with the values α≈0.4, r_∞≈8.2 obtained from our experiment (averaged over the stable phase, days 7–20). In addition to reinforcing the agreement between experiment and simulation noted above, Fig. 6b also predicts the full functional dependence r_∞(α). Notably, r_∞ appears to undergo a sharp transition, resembling an epidemic threshold43, at a critical value α*≈0.1: for α<α*, unravelling progresses all the way to the beginning of the game (r_∞=1), whereas for α>α*, r_∞ increases sharply and nonlinearly, eventually approaching r_∞=10 (that is, no unravelling) as α approaches 1 (see Supplementary Figs 9 and 10 for robustness checks).