Effective payoffs and evolutionarily stable strategies

In order to determine whether a strategy will succeed in a population, Maynard Smith proposed the concept of an ‘evolutionarily stable strategy’ (or ESS)4. For a game involving arbitrary strategies I and J, the ESS is easily determined by an inspection of the payoff matrix of the game as follows: I is an ESS if the payoff E(I,I) when playing itself is larger than the payoff E(J,I) that any other strategy J obtains against I, that is, I is an ESS if E(I,I) > E(J,I). In case E(I,I) = E(J,I), I is an ESS if at the same time E(I,J) > E(J,J). These conditions teach us a fundamental lesson in evolutionary biology: it is not sufficient for a strategy to outcompete another strategy in direct competition, that is, winning is not everything. Rather, a strategy must also play well against itself. The reason for this is that if a strategy plays well against an opponent but reaps less of a benefit competing against itself, then it will be able to invade a population but will quickly have to compete against its own offspring, and its rate of expansion slows down. This is even more pronounced in populations with a spatial structure, where offspring are placed predominantly close to the progenitor. If the competing strategy in comparison plays very well against itself, then a strategy that only plays well against an opponent may not even be able to invade.
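
The two ESS conditions can be written as a short check. The following is a minimal sketch, not part of the original analysis: the function name `is_ess`, the dictionary layout and the one-shot PD illustration (with the standard payoffs (R, S, T, P)=(3, 0, 5, 1)) are assumptions made here for illustration only.

```python
def is_ess(E, I, others):
    """Return True if strategy I satisfies Maynard Smith's two ESS conditions.

    E[(A, B)] is the payoff to strategy A when it plays against strategy B.
    """
    for J in others:
        if J == I:
            continue
        if E[(I, I)] > E[(J, I)]:
            continue  # strict condition: J does worse against I than I does
        if E[(I, I)] == E[(J, I)] and E[(I, J)] > E[(J, J)]:
            continue  # tie against I, but I does better against J than J itself
        return False
    return True


# Illustration only (hypothetical example): the one-shot PD, where "C" always
# cooperates and "D" always defects, with (R, S, T, P) = (3, 0, 5, 1).
one_shot_pd = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

print("D is an ESS:", is_ess(one_shot_pd, "D", ["C", "D"]))
print("C is an ESS:", is_ess(one_shot_pd, "C", ["C", "D"]))
```

The equality test in the second condition is exact, so payoffs that are equal only up to rounding should be stored as integers or fractions rather than floats.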

If we assume that two opponents play a sufficiently large number of games, their average payoff approaches the payoff of the Markov stationary state1,12. We can use this mean expected payoff as the payoff to be used in the payoff matrix E that will determine the ESS. For ZD strategies playing O (other) strategies, we know that ZD enforces the payoff E(O,ZD) shown in equation (2), while ZD receives E(ZD,O) (equation (3)). But what are the diagonal entries in this matrix? We know that ZD enforces the score (2) regardless of the opponent’s strategy, which implies that it also enforces this on another ZD strategist. Thus, E(ZD,ZD) = E(O,ZD). The payoff of O against itself, E(O,O), depends only on O’s strategy and is the key variable in the game once the ZD strategy is fixed. With rows giving the focal strategy and columns the opponent (ordered ZD, O), the effective payoff matrix then becomes

$$E = \begin{pmatrix} E(\mathrm{ZD},\mathrm{ZD}) & E(\mathrm{ZD},\mathrm{O}) \\ E(\mathrm{O},\mathrm{ZD}) & E(\mathrm{O},\mathrm{O}) \end{pmatrix}, \qquad E(\mathrm{ZD},\mathrm{ZD}) = E(\mathrm{O},\mathrm{ZD}). \tag{5}$$

Note that by writing mean expected payoffs into a matrix such as equation (5), we have effectively defined a new game in which the possible moves are ZD and O. It is in terms of these moves that we can now consider dominance and evolutionary stability of these strategies.

The payoff matrix of any game can be brought into a normalized form with vanishing diagonals without affecting the competitive dynamics of the strategies by subtracting a constant term from each column13, so the effective payoff (5) is equivalent to

$$E = \begin{pmatrix} 0 & E(\mathrm{ZD},\mathrm{O}) - E(\mathrm{O},\mathrm{O}) \\ 0 & 0 \end{pmatrix}. \tag{6}$$

We notice that the fixed payoff E(O,ZD) has disappeared, and the winner of the competition is determined entirely by the sign of E(ZD,O) − E(O,O), as seen by an inspection of the ESS conditions. If E(ZD,O) − E(O,O) > 0, the ZD strategy is a weak ESS (see, for example, Nowak14, page 55). If E(ZD,O) − E(O,O) < 0, the opposing strategy is the ESS. In principle, a mixed strategy (a population mixture of two strategies that are in equilibrium) can be an ESS4, but this is not possible here precisely because ZD enforces the same score on others as it does on itself.
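
The reduction to a single sign can be spelled out by inserting the entries of the effective payoff matrix into the two ESS conditions. The lines below are a short worked version that uses only the E(·,·) notation of the text and the identity E(ZD,ZD) = E(O,ZD); the layout is a sketch rather than a derivation taken from the original.

```latex
\begin{align*}
\text{ZD is an ESS}
  &\iff E(\mathrm{ZD},\mathrm{ZD}) > E(\mathrm{O},\mathrm{ZD})
   \ \text{ or }\ \bigl[\, E(\mathrm{ZD},\mathrm{ZD}) = E(\mathrm{O},\mathrm{ZD})
   \ \text{ and }\ E(\mathrm{ZD},\mathrm{O}) > E(\mathrm{O},\mathrm{O}) \,\bigr] \\
  &\iff E(\mathrm{ZD},\mathrm{O}) - E(\mathrm{O},\mathrm{O}) > 0
   \qquad \text{(the strict case never occurs, the equality always holds)}, \\[4pt]
\text{O is an ESS}
  &\iff E(\mathrm{O},\mathrm{O}) > E(\mathrm{ZD},\mathrm{O})
   \ \text{ or }\ \bigl[\, E(\mathrm{O},\mathrm{O}) = E(\mathrm{ZD},\mathrm{O})
   \ \text{ and }\ E(\mathrm{O},\mathrm{ZD}) > E(\mathrm{ZD},\mathrm{ZD}) \,\bigr] \\
  &\iff E(\mathrm{ZD},\mathrm{O}) - E(\mathrm{O},\mathrm{O}) < 0
   \qquad \text{(the bracket is impossible because } E(\mathrm{O},\mathrm{ZD}) = E(\mathrm{ZD},\mathrm{ZD})\text{)}.
\end{align*}
```

ZD can therefore only ever be a weak ESS (through the second condition), whereas O, when it wins, satisfies the strict condition.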

Evolutionary dynamics of ZD via replicator equations

ZD strategies are defined by setting two of the four probabilities in a strategy to specific values so that the payoff E(O,ZD) depends only on the other two, but not on the strategy O. Press and Dyson1 chose to fix p_2 and p_3, leaving p_1 and p_4 to define a family of ZD strategies. The requirement of a vanishing determinant limits the possible values of p_1 and p_4 to the values given by the grey area in Fig. 1. Let us study an example ZD strategy defined by the values p_1=0.99 and p_4=0.01. The results we present do not depend on the choice of the ZD strategy. An inspection of equation (2) shows that, if we use the standard payoffs of the PD (R, S, T, P)=(3, 0, 5, 1), then E(O,ZD)=2. If we study the strategy ‘All-D’ (always defect, defined by the strategy vector (0, 0, 0, 0)) as opponent, we find that E(ZD,All-D)=0.75, while E(All-D,All-D)=1. As a consequence, E(ZD,All-D)−E(All-D,All-D) is negative and All-D is the ESS, that is, ZD will lose the evolutionary competition with All-D. However, this is not surprising as ZD’s payoff against All-D is in fact lower than what ZD forces All-D to accept, as mentioned earlier.

But let us consider the competition of ZD with the strategy ‘Pavlov’, which is a strategy of ‘win-stay-lose-shift’ that can outperform the well-known ‘tit-for-tat’ strategy in direct competition15. Pavlov is given by the strategy vector (1, 0, 0, 1), which (given the ZD strategy described above and the standard payoffs) returns E(ZD,PAV)=27/11≈2.455 to the ZD player, while Pavlov is forced to receive E(PAV,ZD)=2. Thus, ZD wins every direct competition with Pavlov. Yet, Pavlov is the ESS because it cooperates with itself, so E(PAV,PAV)=3. We show in Fig. 1 that the dominance of Pavlov is not restricted to the example ZD strategy p_1=0.99 and p_4=0.01, but holds for all possible ZD strategies within the allowed space.
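
The mean expected payoffs for this example can be checked by solving for the stationary state of the four-state Markov chain over the joint outcomes (CC, CD, DC, DD). The sketch below is an illustration under stated assumptions: the function names are hypothetical, the example ZD strategy is taken to be (p_1, p_2, p_3, p_4)=(0.99, 0.97, 0.02, 0.01) as listed in the caption of Fig. 4, and exact rational arithmetic replaces whatever numerical method was actually used.

```python
from fractions import Fraction as F

R, S, T, P = 3, 0, 5, 1  # standard PD payoffs used in the text


def transition_matrix(p, q):
    """Markov matrix over the joint states (CC, CD, DC, DD), seen from player X.

    p and q hold each player's probability to cooperate after (CC, CD, DC, DD),
    where the first letter is always that player's own previous move.
    """
    q_seen_by_x = (q[0], q[2], q[1], q[3])  # Y sees CD and DC interchanged
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
            for a, b in zip(p, q_seen_by_x)]


def stationary(M):
    """Solve v M = v with sum(v) = 1 by exact Gauss-Jordan elimination.

    Assumes the stationary distribution is unique (true for the pairings below).
    """
    n = len(M)
    A = [[F(M[j][i]) - (1 if i == j else 0) for j in range(n)] + [F(0)]
         for i in range(n)]
    A[-1] = [F(1)] * n + [F(1)]  # swap one redundant equation for normalization
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def payoff(p, q):
    """Mean expected payoff E(X, Y) to the player using p against the player using q."""
    v = stationary(transition_matrix(p, q))
    return sum(prob * score for prob, score in zip(v, (R, S, T, P)))


ZD = (F("0.99"), F("0.97"), F("0.02"), F("0.01"))
ALL_D = (F(0), F(0), F(0), F(0))
PAV = (F(1), F(0), F(0), F(1))

for name, x, y in [("E(ZD,PAV)", ZD, PAV), ("E(PAV,ZD)", PAV, ZD),
                   ("E(ZD,All-D)", ZD, ALL_D), ("E(All-D,ZD)", ALL_D, ZD),
                   ("E(ZD,ZD)", ZD, ZD), ("E(PAV,PAV)", PAV, PAV)]:
    print(name, "=", payoff(x, y))
```

Under these assumptions the sketch gives E(ZD,PAV)=27/11 and E(PAV,ZD)=2, and the same enforced score of 2 for All-D and for ZD playing itself. Pairings with more than one stationary state (for example two deterministic tit-for-tat players) would need a first-move distribution and are outside the scope of this sketch.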

Figure 1: Mean expected payoff of arbitrary ZD strategies playing ‘Pavlov’. The payoff E(ZD,PAV) (red surface) defined by the allowed set (p_1, p_4) (shaded region) against the strategy Pavlov, given by the probabilities (1, 0, 0, 1). As E(ZD,PAV) is everywhere smaller than E(PAV,PAV)=3 (except on the line p_1=1), it is Pavlov that is the ESS for all allowed values (p_1, p_4), according to equation (6). For p_1=1, ZD and Pavlov are equivalent as the entire payoff matrix (6) vanishes (even though the strategies are not the same).
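
The allowed set of (p_1, p_4) can be explored with a few lines of code. The sketch below assumes the closed forms for p_2, p_3 and the enforced score that follow from Press and Dyson's construction (they are not reproduced in this section), and it takes ‘allowed’ to mean that all four entries are valid probabilities; both points, like the function names, are assumptions made for illustration.

```python
from fractions import Fraction as F

R, S, T, P = 3, 0, 5, 1  # standard PD payoffs used in the text


def zd_strategy(p1, p4):
    """Return (p1, p2, p3, p4) for the ZD strategy fixed by p1 and p4.

    Assumed closed forms, derived from requiring that the strategy pins the
    opponent's long-run payoff independently of the opponent's strategy.
    """
    p2 = (p1 * (T - P) - (1 + p4) * (T - R)) / F(R - P)
    p3 = ((1 - p1) * (P - S) + p4 * (R - S)) / F(R - P)
    return (p1, p2, p3, p4)


def enforced_score(p1, p4):
    """Payoff E(O,ZD) that the ZD strategy imposes on any opponent.

    Undefined at the corner p1 = 1, p4 = 0, where the denominator vanishes.
    """
    return ((1 - p1) * P + p4 * R) / (1 - p1 + p4)


def is_allowed(p1, p4):
    """True if all four entries are probabilities in [0, 1] (assumed criterion)."""
    return all(0 <= x <= 1 for x in zd_strategy(p1, p4))


example = zd_strategy(F("0.99"), F("0.01"))
print("strategy:", [float(x) for x in example])
print("enforced score:", enforced_score(F("0.99"), F("0.01")))
print("(0.5, 0.5) allowed:", is_allowed(F("0.5"), F("0.5")))
```

For the example values p_1=0.99 and p_4=0.01 this reproduces the probabilities p_2=0.97 and p_3=0.02 quoted for the seed strategy in Fig. 4, which is the main reason to trust the assumed closed forms here.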

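We can check that Pavlov is the ESS by following the population fractions as determined by the replicator equations5,13,16, which describe the frequencies of strategies in a population

$$\dot{\pi}_i = \pi_i \left( w_i - \bar{w} \right), \tag{7}$$

where π_i is the population fraction of strategy i, w_i is the fitness of strategy i, and w̄ is the average fitness in the population. In our case, the fitness of strategy i is the mean payoff for this strategy, so

$$w_{\mathrm{ZD}} = \pi_{\mathrm{ZD}}\, E(\mathrm{ZD},\mathrm{ZD}) + \pi_{\mathrm{PAV}}\, E(\mathrm{ZD},\mathrm{PAV}), \qquad w_{\mathrm{PAV}} = \pi_{\mathrm{ZD}}\, E(\mathrm{PAV},\mathrm{ZD}) + \pi_{\mathrm{PAV}}\, E(\mathrm{PAV},\mathrm{PAV}),$$

and w̄ = π_ZD w_ZD + π_PAV w_PAV. We show π_ZD and π_PAV (with π_ZD + π_PAV = 1) in Fig. 2 as a function of time for different initial conditions, and confirm that Pavlov drives ZD to extinction regardless of initial density.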
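
The replicator dynamics for two strategies reduce to a single ordinary differential equation for π_ZD, which can be integrated with a classical fourth-order Runge–Kutta step. The sketch below is illustrative only: the step size, the end time and the payoff entries (the stationary payoffs of the example ZD strategy against Pavlov, with rows and columns ordered ZD, PAV) are assumptions, and the time axis is in arbitrary units.

```python
def replicator_rhs(x, E):
    """d(pi_ZD)/dt for two strategies, with x = pi_ZD and 1 - x = pi_PAV."""
    w_zd = x * E[0][0] + (1 - x) * E[0][1]
    w_pav = x * E[1][0] + (1 - x) * E[1][1]
    w_bar = x * w_zd + (1 - x) * w_pav  # average fitness in the population
    return x * (w_zd - w_bar)


def integrate(x0, E, t_end=100.0, dt=0.05):
    """Fourth-order Runge-Kutta integration; returns the trajectory of pi_ZD."""
    xs = [x0]
    x = x0
    for _ in range(int(round(t_end / dt))):
        k1 = replicator_rhs(x, E)
        k2 = replicator_rhs(x + 0.5 * dt * k1, E)
        k3 = replicator_rhs(x + 0.5 * dt * k2, E)
        k4 = replicator_rhs(x + dt * k3, E)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        xs.append(x)
    return xs


# Assumed effective payoffs for the example ZD strategy against Pavlov.
E_ZD_PAV = [[2.0, 27 / 11],
            [2.0, 3.0]]

for x0 in (0.1, 0.5, 0.9):
    trajectory = integrate(x0, E_ZD_PAV)
    print(f"pi_ZD(0) = {x0:.1f} -> pi_ZD(t_end) = {trajectory[-1]:.3e}")
```

With these entries the right-hand side equals −(6/11) π_ZD (1 − π_ZD)², which is never positive, so π_ZD declines from every interior starting point, slowly at first when ZD is common and exponentially once Pavlov dominates.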

Figure 2: Population fractions of ZD versus Pavlov over time. Population fractions π_ZD (blue) and π_PAV (green) as a function of time for initial ZD concentrations π_ZD(0) between 0.1 and 0.9.

Evolutionary dynamics of ZD in agent-based simulations

It could be argued that an analysis of evolutionary stability within the replicator equations ignores the complex game play that occurs in populations where the payoff is determined in each game, and where two strategies meet by chance and survive based on their accumulated fitness. We can test this by following ZD strategies in contest with Pavlov in an agent-based simulation with a fixed population size of N_pop=1,024 agents, a fixed replacement rate of 0.1% and a fitness-proportional selection scheme (a death-birth Moran process, see Methods).

In Fig. 3, we show the population fractions π_ZD and π_PAV for two different initial conditions (π_ZD(0)=0.4 and 0.6), using a full agent-based simulation (solid lines) or using the replicator equations (dashed lines). Although the trajectories differ in detail (likely because in the agent-based simulations generations overlap, the number of encounters is not infinite but dictated by the replacement rate, and payoffs are accumulated over eight opponents randomly chosen from the population), the dynamics are qualitatively the same. (This can also be shown for any other stochastic strategy playing against a ZD strategy.) Note that in the agent-based simulations, strategies have to play the first move unconditionally. In the results presented in Fig. 3, we have set this ‘first move’ probability to p_C=0.5 for both Pavlov and ZD.
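
A stripped-down version of such a simulation is sketched below. It is not the procedure of the Methods, which are not reproduced in this section. As loudly labelled assumptions, each pairwise game is scored with its mean expected payoff instead of being played move by move (so the first-move probability plays no role), opponents are drawn with replacement and may include the agent itself, and all names and default values are hypothetical.

```python
import random


def simulate(E, n_agents=1024, zd_fraction=0.6, replacement_rate=0.001,
             n_opponents=8, n_updates=1000, seed=0):
    """Simplified death-birth Moran process for two fixed strategies.

    Strategy 0 is ZD and strategy 1 is Pavlov; E[i][j] is the (assumed) mean
    expected payoff of strategy i against strategy j.
    Returns the ZD population fraction after every update.
    """
    rng = random.Random(seed)
    n_zd = round(n_agents * zd_fraction)
    population = [0] * n_zd + [1] * (n_agents - n_zd)
    history = [n_zd / n_agents]
    for _ in range(n_updates):
        # Fitness: payoff accumulated against randomly chosen opponents.
        fitness = [sum(E[s][population[rng.randrange(n_agents)]]
                       for _ in range(n_opponents))
                   for s in population]
        # Death: every agent is removed with probability replacement_rate.
        dying = [i for i in range(n_agents) if rng.random() < replacement_rate]
        # Birth: vacancies are filled by fitness-proportional draws from the
        # population as it was before this update.
        if dying:
            parents = rng.choices(population, weights=fitness, k=len(dying))
            for i, parent in zip(dying, parents):
                population[i] = parent
        history.append(population.count(0) / n_agents)
    return history


E_ZD_PAV = [[2.0, 27 / 11],
            [2.0, 3.0]]

# Small, fast demonstration run; a larger replacement rate than in the text is
# used here only to keep the run short.
history = simulate(E_ZD_PAV, n_agents=256, replacement_rate=0.01, n_updates=500)
print(f"pi_ZD: start {history[0]:.3f}, end {history[-1]:.3f}")
```

Because selection acts on sampled payoffs in a finite population, individual runs fluctuate, which is why trajectories from many runs are averaged before they are compared with the replicator equations.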

Figure 3: Population fractions using agent-based simulations and replicator equations. Population fractions π_ZD (blue tones) and π_PAV (green tones) for two different initial concentrations. The solid lines show the average of the population fraction from 40 agent-based simulations as a function of evolutionary time measured in updates, while the dashed lines show the corresponding solutions of the replicator equations. As time is measured differently in agent-based simulations as opposed to the replicator equations, we applied an overall scale to the time variable of the Runge–Kutta simulation of equation (7) to match the agent-based simulation.

Agent-based simulations thus corroborate what the replicator equations have already told us, namely that ZD strategies have a hard time surviving in populations because they suffer from the same low payoff that they impose on other strategies if faced with their own kind. However, ZD can win some battles, in particular against strategies that cooperate. For example, the stochastic cooperator GC (‘general cooperator’, defined by the probabilities (0.935, 0.229, 0.266, 0.42)) is the evolutionarily dominating strategy (the fixed point) that evolved at low mutation rates, as shown by Iliopoulos et al.8. GC is a cooperator that is very generous, cooperating after mutual defection almost half the time. GC loses out (in the evolutionary sense) against ZD because E(ZD,GC)=2.125 while E(GC,GC)=2.11 (making ZD a weak ESS), and ZD certainly wins (again in the evolutionary sense) against the unconditional deterministic strategy ‘All-C’ that always cooperates (see equation (16) in the Methods). If this is the case, how is it possible that GC is the evolutionary fixed point rather than ZD, when strategies are free to evolve from random ancestors8?

Mutational instability of ZD strategies

To test how ZD fares in a simulation where strategies can evolve (in the previous sections, we only considered the competition between strategies that are fixed), we ran evolutionary (agent-based) simulations in which strategies are encoded genetically. The genome itself evolves via random mutation and fitness-proportional selection. For stochastic strategies, the probabilities are encoded in five genes (one unconditional and four conditional probabilities drawn from a uniform distribution when mutated) and evolved as described in the Methods and by Iliopoulos et al.8. Rather than starting the evolution runs with random strategies, we seeded them with the particular ZD strategy we have discussed here (p_1=0.99 and p_4=0.01). These simulations show that when we use a mutation rate that favours the strategy GC as the fixed point, ZD evolves into it even though ZD outcompetes GC at zero mutation rate, as we saw in the previous section. In Fig. 4, we show the four probabilities that define a strategy over the evolutionary line of descent (LOD), followed over 50,000 updates of the population (with a replacement rate of 1%, this translates on average to 500 generations). The evolutionary LOD is created by taking one of the final genotypes that arose, and following its ancestry backwards mutation by mutation, to arrive at the ZD ancestor used to seed the simulation17. (Because of the competitive exclusion principle18, the individual LODs of all the final genotypes collapse to a single LOD with a fairly recent common ancestor.) The LOD confirms what we had found earlier8, namely that the evolutionary fixed points are independent of the starting strategy and simply reflect the optimal strategy given the amount of uncertainty (here introduced via mutations) in the environment.

We thus conclude that ZD is unstable in another sense (besides not being an ESS): it is genetically or mutationally unstable, as mutants of ZD are unlikely to be ZD themselves, and we have shown earlier that ZD generally does not do well against other strategies that defect but are not ZD themselves.
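
The genetic encoding and the line of descent can be pictured with a small data structure in which every agent keeps a reference to its parent. The sketch below covers only mutation and ancestry tracing (selection is omitted), and the class name, the gene order and the first-move value of 0.5 for the seed are assumptions introduced for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    # Assumed gene order: [first-move probability, p1, p2, p3, p4].
    genome: List[float]
    # Reference to the ancestor, used to trace the line of descent.
    parent: Optional["Agent"] = field(default=None, repr=False, compare=False)


def reproduce(parent: Agent, mu: float, rng: random.Random) -> Agent:
    """Copy the genome; each gene is redrawn uniformly from [0, 1) with probability mu."""
    genome = [rng.random() if rng.random() < mu else gene
              for gene in parent.genome]
    return Agent(genome=genome, parent=parent)


def line_of_descent(agent: Agent) -> List[List[float]]:
    """Follow parent references back to the seed; genomes are returned oldest first."""
    lod = []
    node: Optional[Agent] = agent
    while node is not None:
        lod.append(node.genome)
        node = node.parent
    return lod[::-1]


rng = random.Random(0)
# Seed with the example ZD strategy; the first-move gene 0.5 is an assumed value.
seed_agent = Agent(genome=[0.5, 0.99, 0.97, 0.02, 0.01])
descendant = seed_agent
for _ in range(100):
    descendant = reproduce(descendant, mu=0.01, rng=rng)

lod = line_of_descent(descendant)
print("generations on the LOD:", len(lod) - 1)
print("seed genome:", lod[0])
print("final genome:", lod[-1])
```

Because a mutated gene is redrawn from the whole unit interval, a single mutation almost surely moves p_2 or p_3 away from the specific values that the ZD condition requires, which is the mutational instability described above.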

Figure 4: Evolution of probabilities on the evolutionary line of descent (LOD). Evolution of probabilities p_1 (blue), p_2 (green), p_3 (red) and p_4 (teal) on the evolutionary LOD of a well-mixed population of 1,024 agents, seeded with the ZD strategy (p_1, p_2, p_3, p_4)=(0.99, 0.97, 0.02, 0.01). Lines of descent (see Methods) are averaged over 40 independent runs. Mutation rate per gene μ=1%, replacement rate r=1%.

Stability of extortionate ZD strategies

Extortionate ZD strategies (‘ZDe’ strategies) are those that set the ratio of the ZD strategist’s payoff to that of a non-ZD opponent (both measured relative to the mutual-defection payoff P)1, rather than setting the opponent’s absolute payoff. Against a ZDe strategy, all the opponent can do (in a direct matchup) is to increase their own payoff by optimizing their strategy, but as this increases ZDe’s payoff commensurately, the ratio (set by an extortion factor χ, where χ=1 represents a fair game) remains the same. Press and Dyson1 show that for ZDe strategies with extortion factor χ, the best achievable payoffs for each strategy are (using the conventional iterated PD values (R, S, T, P)=(3, 0, 5, 1))

$$E(\mathrm{ZDe},\mathrm{O}) = \frac{2+13\chi}{2+3\chi}, \qquad E(\mathrm{O},\mathrm{ZDe}) = \frac{12+3\chi}{2+3\chi},$$

which implies that E(ZDe,O) > E(O,ZDe) for all χ>1. However, ZDe plays terribly against other ZDe strategies, which are defined by a set of probabilities given by Press and Dyson1. Notably, ZDe strategies have p_4=0, that is, they never cooperate after both players defect. It is easy to show that for p_4=0, the mean payoff E(ZDe,ZDe)=P in general, that is, the payoff for mutual defection. As a consequence, ZDe can never be an ESS (not even a weak one) as E(O,ZDe) > E(ZDe,ZDe) for all finite χ≥1, except when χ→∞, where ZDe can be an ESS along with an opponent’s strategy that has a mean payoff E(O,O) not larger than P. We note that the celebrated strategy ‘tit-for-tat’ is technically a ZDe strategy, albeit a trivial one as it always plays fair (χ=1).
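
The role of the extortion factor can be checked numerically. The sketch below assumes the closed forms that Press and Dyson give for the best achievable payoffs with the conventional values (R, S, T, P)=(3, 0, 5, 1), namely (2+13χ)/(2+3χ) for the extortioner and (12+3χ)/(2+3χ) for the opponent; the function name is hypothetical, and the formulas should be read as an assumption if they are not available in the surrounding text.

```python
from fractions import Fraction as F

P = 1  # mutual-defection payoff in (R, S, T, P) = (3, 0, 5, 1)


def zde_best_payoffs(chi):
    """Return (E(ZDe,O), E(O,ZDe)) at the opponent's best response.

    Assumed closed forms for the conventional payoff values.
    """
    chi = F(chi)
    return (2 + 13 * chi) / (2 + 3 * chi), (12 + 3 * chi) / (2 + 3 * chi)


for chi in (1, 2, 5, 100):
    e_zde, e_o = zde_best_payoffs(chi)
    surplus_ratio = (e_zde - P) / (e_o - P)  # should equal chi
    print(f"chi = {chi}: E(ZDe,O) = {e_zde}, E(O,ZDe) = {e_o}, "
          f"surplus ratio = {surplus_ratio}, E(O,ZDe) > P: {e_o > P}")
```

For every finite χ the opponent's payoff stays above P, so E(O,ZDe) exceeds E(ZDe,ZDe)=P and the first ESS condition fails for ZDe, in line with the argument above.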

Given that ZD and ZDe are evolutionarily unstable against a large fraction of stochastic strategies, is there no value to this strategy then? We will argue below that strategies that play ZD against non-ZD strategies but a different strategy (for example cooperation) against themselves may very well be highly fit in the evolutionary sense, and emerge in appropriate evolution experiments.

ZD strategies that can recognize other players

Clearly, winning against your opponents is not everything if this impairs the payoff against similar or identical strategies. But what if a strategy could recognize whom it is playing against, and switch strategies depending on the nature of the opponent? For example, such a strategy would play ZD against others, but cooperate with other ZD strategists instead. It is in principle possible to design strategies that use a (public or secret) tag to decide between strategies. Riolo et al.19 designed a game where agents could donate costly resources only to players that were sufficiently similar to them (given a tag). This was later abstracted into a model in which players can use different payoff matrices (such as those for the PD or the ‘Stag Hunt’ game) depending on the tag of the opponent20. Recognizing another player’s identity can in principle be accomplished in two ways: the players can simply record an opponent’s tag and select a strategy accordingly21, or they can try to recognize a strategy by probing the opponent with particular plays. When using tags, it is possible that players cheat by imitating the tag of the opponent22 (in that case, it is necessary for players to agree on a new tag so that they can continue to reliably recognize each other).

A tag-based ZD strategy (‘ZDt’) can cooperate with itself, while playing ZD against non-ZD players. Let us first test whether using tags renders ZDt evolutionarily stable against a strategy that ZD loses to, namely All-D. The effective payoff matrix, with rows and columns ordered (ZDt, All-D), becomes (using the standard payoff values and our example ZD strategy p_1=0.99, p_4=0.01)

$$E = \begin{pmatrix} 3 & 0.75 \\ 2 & 1 \end{pmatrix}, \tag{12}$$

and we note that now both ZDt and All-D can be an ESS. The game described by the matrix in equation (12) belongs to the class of coordination games (a typical example is the Stag Hunt game23), which means that the interior fixed point of the dynamics (π_ZDt, π_All-D)=(0.2, 0.8) is itself unstable13 and the winner of the competition depends on the initial density of strategies. This is a favourable game for ZDt, as it will outcompete All-D as long as its initial density exceeds 20% of the population. What happens if the opposing strategy acquires the capacity to distinguish self from non-self as well? The optimal strategy in that case would defect against ZDt players, but cooperate with itself. The effective payoff matrix, with rows and columns ordered (ZDt, CD), then becomes (‘CD’ is the conditional defector)

$$E = \begin{pmatrix} 3 & 0.75 \\ 2 & 3 \end{pmatrix}.$$

This game is again in the class of coordination games, but the interior unstable fixed point is now (π_ZDt, π_CD)=(9/13, 4/13), which is not at all favourable for ZDt anymore as the strategy now needs to constitute over 69% of the population in order to drive the conditional defector into extinction. We thus find that tag-based play leads to dominance based on numbers (if players cooperate with their own kind), where a ZDt is only favoured if it is the only one that can recognize itself. Indeed, tag-based recognition is used to enhance cooperation among animals via the so-called ‘green-beard’ effect24,25, and can give rise to cycles between mutualism and altruism26. Recognizing a strategy from behaviour rather than from a tag is discussed further below. Note that whether a player’s strategy is identified by a tag or learned from interaction, in both cases it is communication that enables cooperation8.
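
The location of the unstable interior fixed point follows from requiring that both strategies earn the same mean payoff. The sketch below is an illustration under stated assumptions: the payoff entries are reconstructed for the example ZD strategy (mutual cooperation R=3 wherever a strategy recognizes itself, and 0.75 and 2 for ZD against a defecting opponent), and the function names are hypothetical.

```python
from fractions import Fraction as F


def interior_fixed_point(E):
    """Fraction x* of the first strategy at which both strategies are equally fit.

    E = [[a, b], [c, d]] with rows = focal strategy and columns = opponent.
    Solves a*x + b*(1 - x) = c*x + d*(1 - x) for x.
    """
    (a, b), (c, d) = E
    return F(d - b) / F(a - b - c + d)


def is_coordination_game(E):
    """True if each strategy does strictly best against itself (both are ESSs)."""
    (a, b), (c, d) = E
    return a > c and d > b


# Assumed effective payoffs for the example ZD strategy p_1 = 0.99, p_4 = 0.01.
zdt_vs_all_d = [[F(3), F("0.75")], [F(2), F(1)]]
zdt_vs_cd = [[F(3), F("0.75")], [F(2), F(3)]]

for name, E in [("ZDt vs All-D", zdt_vs_all_d), ("ZDt vs CD", zdt_vs_cd)]:
    print(f"{name}: coordination game = {is_coordination_game(E)}, "
          f"critical ZDt fraction = {interior_fixed_point(E)}")
```

With these entries the critical ZDt fractions come out as 1/5 and 9/13, matching the thresholds of 20% and just over 69% quoted in the text; above the threshold ZDt takes over, below it the opponent does.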

Short-memory players cannot set the rules of the game

In order to recognize a player’s strategy via its actions, it is necessary to be able to send complex sequences of plays, and react conditionally on the opponent’s actions. To do this, a strategy must be able to use more than just the previous plays, that is, it needs a longer memory than a memory-one strategy. This appears to contradict the conclusion reached by Press and Dyson1 that the shortest-memory player ‘sets the rules of the game’. This conclusion was reached by correctly noting that in a direct competition of a long-memory player and a short-memory player, the payoff to both players is unchanged if the longer-memory player uses a ‘marginalized’ short-memory strategy. However, as we have seen earlier, in an evolutionary setting it is necessary to take into account not only cross-strategy competitions, but also how the strategies fare when playing against themselves, that is, like-strategies. Then, it is clear that a long-memory strategy will be able to recognize itself (simply by noting that the responses are incompatible with a ‘marginal’ strategy) and therefore distinguish itself from others. Thus, it appears possible that evolutionarily successful ZD strategies can be designed that use longer memories to distinguish self from non-self. Of course, such strategies will be vulnerable to mutated strategies that look sufficiently like a ZD player so that ZD will avoid them (a form of defensive mimicry27), while being subtly exploited by the mimics instead.