Simulative strategies dominate in more complex environments

In our model a mind-reading strategy is determined by the mind-reading strategy-type and by a number of continuous traits, which affect the level of inhibition of coordination. We focus on two main questions: (1) Under which circumstances are as-actor networks recruited for mind-reading? (2) Can emotional contagion evolve as a consequence of mind-reading? The first question is addressed here by tracking the frequency of the different strategy-types, \(F\), \(P\) and \(S\), in an infinite population. We address the second question in the next section, performing an evolutionary invasion analysis on the continuous traits influencing simulative strategies, \({u}_{D}\) and \({u}_{B}\).

First, we compute the expected probabilities of a B response as actor, and of D and C responses as an observer, for each of the three strategies, indicated by the superscripts P, F and S. These determine the fitness of the mind-reading strategies, following Eq. 1. We report here expressions for the simplified case where every stimulus has intensity x = 1 and is sufficient for a correct response given that an individual has enough experience, i.e. \(\alpha(1)=1\) and \(l(1,t)=1-e^{-\lambda_{l}t}\), so that \(\lim_{t\to\infty} l(1,t)=1\). In this case the payoff of an actor is

$${\pi }_{+}={P}_{B}\,b=\frac{\lambda (1-p)}{{\lambda }_{d}+{\lambda }_{e}+\lambda (1-p)}b,$$ (3)

while the expression for the fitness of an observer depends on its strategy, indicated as superscript:

$${\pi }_{-}^{S}={l}_{x}({u}_{D})\,\big((1-\alpha ({u}_{D}))\,{\gamma }_{a}\,{d}^{-}-\alpha ({u}_{D})\,{c}^{-}\big)\,{P}_{B}^{2}$$ (4)

$${\pi }_{-}^{F}=\frac{{\lambda }_{d}}{{\lambda }_{e}+{\lambda }_{d}}{P}_{B}\,{d}^{-},$$ (5)

$${\pi }_{-}^{P}=\frac{p{\lambda }_{l}}{{\lambda }_{d}+{\lambda }_{e}+p{\lambda }_{l}}{P}_{B}\,{d}^{-},$$ (6)

with \({l}_{x}(x)=1-{e}^{-{\rho }_{l}x}\). Notice that the payoffs of all observers depend on the behavior of the observed actors through \(P_B\), but not on other observers. Thus, when \(P_B\) differs between individuals with different mind-reading strategies, the evolutionary dynamics of this system are frequency-dependent. We explore this case in the appendix, where we show that different mind-reading strategies can be bistable. Here, we restrict ourselves to the simpler frequency-independent case, for which all main results hold. In this case, we can write the replicator dynamics of the system as:

$$\frac{d{y}^{j}}{dt}={y}^{j}({\pi }^{j}-\varphi )={y}^{j}({\pi }_{-}^{j}-{\varphi }_{-})$$ (7)

where j indicates one of the strategies P, F or S, \(\varphi ={y}^{F}\,{\pi }^{F}+{y}^{P}\,{\pi }^{P}+{y}^{S}\,{\pi }^{S}\) is the average fitness, and \({\varphi }_{-}={y}^{F}\,{\pi }_{-}^{F}+{y}^{P}\,{\pi }_{-}^{P}+{y}^{S}\,{\pi }_{-}^{S}\) is the average of the as-observer components of the payoff. This system is characterized by transcritical bifurcations: each strategy dominates when its social fitness \(\pi_-\) is higher than those of the other strategies. Thus, we can explore under which conditions the different strategies evolve by simply comparing the expressions for the social fitness in Eqs 4–6 (Fig. 2a). These expressions reflect the different sources of information that the three strategies use to cope with uncertainty, which we explore here in the form of environmental variability, behavioral complexity, and inter-individual differences.
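As a sketch of how these pieces fit together, the following computes the as-observer payoffs of Eqs 4–6 and integrates the replicator equation of Eq. 7. The parameter values are illustrative, not those of Fig. 2, and α is taken as the step-linear activation function introduced later in Eq. 9.

```python
import math

# Illustrative parameters (hypothetical, not fitted to the paper's figures).
lam, lam_d, lam_e, lam_l = 1.0, 1.0, 0.5, 50.0
p, d_m, c_m = 0.5, 5.0, 3.0                  # d_m = d^-, c_m = c^-
gamma_a, rho_l, rho_a, u_B, u_D = 1.0, 5.0, 2.0, 0.3, 0.5

P_B = lam * (1 - p) / (lam_d + lam_e + lam * (1 - p))          # Eq. 3

a = min(1.0, max(0.0, rho_a * (u_D - u_B)))                    # activation
l_x = 1 - math.exp(-rho_l * u_D)                               # learning

pi_minus = {
    "S": l_x * ((1 - a) * gamma_a * d_m - a * c_m) * P_B**2,       # Eq. 4
    "F": lam_d / (lam_e + lam_d) * P_B * d_m,                      # Eq. 5
    "P": p * lam_l / (lam_d + lam_e + p * lam_l) * P_B * d_m,      # Eq. 6
}

def replicator(y, dt=0.01, steps=100000):
    """Euler integration of dy_j/dt = y_j (pi_-^j - phi_-), Eq. 7."""
    for _ in range(steps):
        phi = sum(y[j] * pi_minus[j] for j in y)
        y = {j: y[j] + dt * y[j] * (pi_minus[j] - phi) for j in y}
        s = sum(y.values())                  # renormalize numerical drift
        y = {j: v / s for j, v in y.items()}
    return y

y_end = replicator({"F": 1 / 3, "P": 1 / 3, "S": 1 / 3})
best = max(pi_minus, key=pi_minus.get)       # strategy with highest pi_-
```

Because the dynamics are frequency-independent here, the strategy with the highest as-observer payoff approaches fixation regardless of initial frequencies.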

Figure 2 Evolutionary dynamics of the three mind-reading strategies, F (black), P (gray) and S (white), as a function of the average time spent in a single environmental state (\(\bar{t}=1/({\lambda }_{e}+{\lambda }_{d})\), x-axis) and the fraction of effective as-observer vs as-actor stimuli (p, y-axis). Gray dots indicate the combinations of parameters for which the evolutionary dynamics are shown in the ternary plots, when S (right vertex), P (top) or F (left) dominates. Here we consider a fixed \(u_B\), hence a single optimal S-strategy S* exists, independently of the frequencies of other strategies. In (a) the parameters are \(u_B = 0.3\), \({\rho }_{a}=13.8155/{u}_{B}\), normally distributed x with mean 0.5 and standard deviation 0.1, \(\rho_l = 5\), \(\lambda_d = 1\), \({d}^{-}={d}^{+}=5\), \({c}^{-}={c}^{+}=3\), \({\lambda }_{l}=50\). In (b) the learning rate is \(\lambda_l = 10\), corresponding to the presence of 5 times more stimuli. In (c,d) individuals differ from each other in their responses with average probability ψ = 0.5. In addition, in (d) individuals interact twice as often with individuals with the same behavior.

In Fig. 2a we modify environmental variability through the parameter \(\lambda_e\), the rate of environmental change. Strategy F does not rely on learning, but on a fixed mapping optimized by evolution. Hence, while this strategy is very efficient in predictable environments (low \(\lambda_e\)), it cannot adapt to new environmental states. This is apparent in the term \({\lambda }_{d}/({\lambda }_{d}+{\lambda }_{e})\) in Eq. 5, which coincides with the fraction of time spent in a single predictable environmental state. Conversely, both strategies P and S use learning to adapt to new environments, and dominate when environmental variation increases. However, P learns only during social interactions, and its probability of successful D responses decreases as p decreases. For this reason, S evolves at lower values of p. Remarkably, S outperforms all other strategies as complexity increases: as information about the social context decreases, the experience gained as an actor becomes more valuable. This result is exemplified by the simple analytical expressions obtainable in the case of an innate as-actor behavior. In this case the fitness of strategy S is independent of complexity, whereas the fitness of an observer using the F or P strategy is \({d}^{-}\,{\lambda }_{d}/({\lambda }_{d}+{\lambda }_{e})\) or \({d}^{-}\,p\,{\lambda }_{l}/({\lambda }_{d}+{\lambda }_{e}+p{\lambda }_{l})\), respectively. Both of these decrease with environmental variability \(\lambda_e\), and that of strategy P also decreases with reduced opportunity for direct as-observer learning (low p). Therefore S-strategies dominate when environmental complexity increases. This pattern is consistent across the assumptions and models tested. In general, learning as an actor is faster than learning as an observer (supplementary material section 1.5.4), due to the uncertainty involved in social cues. Information about other individuals is inherently noisy or inaccurate, since an observer may not know precisely which stimulus or internal state characterizes an actor, or may simply misperceive a social cue. Hence, in complex environments (high \(\lambda_e\)) or when social stimuli are noisy and inaccurate (low p), information acquired solely through social interactions may not suffice, requiring a large number of learning instances. Similar results are obtained when individuals can switch from S to P over their lifespan depending on the available information (Appendix).

The effect of behavioral complexity is manipulated by varying the learning rate, which we assume to be inversely proportional to the number of stimuli to be learned. As behavioral complexity increases, S strategies are favored (Fig. 2b).

We then explored the effect of inter-individual differences, which could arise because of differences in genetics, social status, or simply experience between the observer and the actor. These differences are detrimental to S strategies, since an S-observer infers the behavior of other individuals from its own, possibly drawing incorrect inferences if the two individuals differ in their responses to a given environmental stimulus, i.e. to their own environmental state. We implemented these differences through a probability ψ that individuals differ in the response to a particular as-actor stimulus (i.e. two different individuals would respond to the same stimulus with two different optimal actions). S-observers can make correct inferences only when the focal observer and the observed actor share the same behavioral response, and fail in a fraction ψ of the interactions. Thus, inter-individual differences negatively impact the fitness of S-strategies, reducing the range of parameters for which they would evolve (Fig. 2c). However, S-strategies still dominate in variable environments and when p is low (Fig. 2c). This occurs because inter-individual differences affect not only the fitness of S strategies, but also that of P strategies. In fact, only a fraction 1 − ψ of the individuals share the same response, and therefore the probability of encountering a specific stimulus-action pair is reduced by a factor 1 − ψ, requiring longer times to learn the behavior of the different individuals and mimicking the effects of environmental or behavioral complexity.
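The rescaling argument can be sketched numerically: for the P strategy, a probability ψ of a differing response acts like a reduction of p by the factor 1 − ψ. Parameter values below are hypothetical, chosen only for illustration.

```python
# Inter-individual differences act on the P strategy like a reduced p:
# only a fraction (1 - psi) of encounters show the familiar stimulus-action
# pair. Parameter values are illustrative.
lam_d, lam_e, lam_l, p, d_m, P_B = 1.0, 0.5, 50.0, 0.5, 5.0, 0.25

def pi_P(psi=0.0):
    """As-observer payoff of P (Eq. 6) with p rescaled by (1 - psi)."""
    p_eff = (1 - psi) * p
    return p_eff * lam_l / (lam_d + lam_e + p_eff * lam_l) * P_B * d_m

# increasing psi lowers the payoff, mimicking a lower p
```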

Note that all these results can easily be extended to cases in which individuals interact preferentially with certain other individuals in a population, because of group structure, relatedness, or social status and dominance relationships. For example, individuals might interact more frequently with individuals with similar behavior. In this case, inter-individual differences are partially masked and exert a reduced effect on S-strategies, which are in turn favored (Fig. 2d).

In conclusion, as environmental complexity increases and self-information becomes more valuable, S strategies are able to invade even if this entails more coordination. In our simplified model, an upper bound for the probability of coordination is:

$${P}_{C} < \frac{{d}^{-}}{{d}^{-}-{c}^{-}}\,\frac{{\lambda }_{d}+{\lambda }_{e}}{{\lambda }_{d}+{\lambda }_{e}+p\,{\lambda }_{l}}.$$ (8)
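As a quick numeric illustration of this bound (the parameter values are hypothetical, chosen only to show the order of magnitude):

```python
# Upper bound on the tolerable probability of coordination (Eq. 8),
# evaluated for illustrative parameter values.
d_m, c_m = 5.0, 3.0                      # d^- and c^-
lam_d, lam_e, lam_l, p = 1.0, 0.5, 50.0, 0.5

bound = d_m / (d_m - c_m) * (lam_d + lam_e) / (lam_d + lam_e + p * lam_l)
# with these values the bound is ~0.14: a simulative strategy is viable
# only if well under ~14% of interactions end in accidental coordination
```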

Accidental coordination is sustained at the evolutionary equilibrium

In our model coordination is costly for the observer, and Eq. 8 establishes an upper bound on the amount of coordination that can be tolerated by a simulative strategy. We can now ask under what conditions accidental coordination will be present (\(P_C > 0\)) by looking at how inhibition evolves.

With efficient inhibition, an S-observer is able to extract information from its as-actor network without generating the response usually exhibited as an actor, here C. These cases can be seen as true negatives, as inhibition is applied correctly to avoid a costly and inappropriate as-actor response (here C, see Fig. 3a). This inhibition of the as-actor network can be achieved in different ways, which we categorize here on the basis of the types of errors they might incur, i.e. false positives and false negatives, and thus of the different evolutionary trade-offs they face. A first mechanism is to modulate how the as-actor network is recruited as an observer, potentially affecting the accuracy of social inferences: the more the as-actor network is allowed to take over, the more accurate the simulation; however, the chance that an as-actor action is triggered increases, potentially leading to an increased risk of coordination (\(P_C\), false positives, see Fig. 3a,c) together with accurate inferences (\(P_D\), true negatives). A second mechanism is to modify the structure of the as-actor network itself to make it easier to inhibit; however, this would potentially lead to accidental inhibition of actions when the individual is an actor (false negatives, see Fig. 3a,b), decreasing \(P_B\) as well as \(P_C\).

Figure 3 (a) Trade-offs constraining the recruitment of the as-actor network by simulative strategies. The as-actor network can be activated (left) or inhibited (right). For an actor (top), the inhibition of the as-actor network prevents an appropriate response B (false negative), while if activated B can be performed (true positive). When the individual is not an actor but an observer (bottom), the activation of the as-actor network leads the observer to respond as an actor (C, false positive). (b) Direction of the selection gradient for a resident population with traits \(u_D\), the intensity of simulation (x-axis), and \(u_B\), the inhibition threshold (y-axis). (c) Frequencies of D (\(P_D\), blue) and C (\(P_C\), red) responses, for a fixed value of \({u}_{B}={u}_{B}^{\ast }\). In this case \(u_D\) evolves to maximize the as-observer fitness. (d) \(P_D\) (blue), \(P_C\) (red) and \(P_B\) (black) for a fixed value \({u}_{D}={u}_{D}^{\ast }\). A singular strategy occurs where the sum of the as-actor component (proportional to \(P_B\)) and the as-observer component (a payoff-weighted sum of \(P_D\) and \(P_C\)) of the fitness is highest.

In the following, we model these two general inhibition mechanisms as two one-dimensional continuous traits, \(u_D\) and \(u_B\) respectively. The first, \(u_D\), can be seen as the intensity of the recruitment of the as-actor network in the as-observer context, or the extent of shared representation. The second, \(u_B\), can be seen as an activation threshold, inhibiting as-actor responses when the activation of the as-actor network is below the threshold. In the discussion we address the generality of the results of this simple model.

When \(u_B\) is fixed, \(P_B\) is identical for all individuals, and thus the system is frequency-independent. In this case, the fitness of a mutant S strategy with trait value \({u}_{D}^{m}\) is independent of the fitness of the other S strategies present in the population. Therefore, a single S strategy dominates and reaches fixation, leading to an evolutionary equilibrium value \({u}_{D}^{\ast }\) at which the selection gradient \(\frac{d{\pi }_{-}^{S}({u}_{D}^{\ast })}{d{u}_{D}}\) equals 0. This is the global maximum of the function \({\pi }_{-}^{S}({u}_{D})\), and is both an evolutionarily stable strategy (ESS) and convergence stable (CS).

We find that in most cases this equilibrium is internal, i.e. \(u_D^* > 0\), leading to a small level of accidental coordination (0.5–2% of the interactions in the examples tested), even though inhibition is allowed to evolve and other strategies are available (Fig. 3b–d). We perform a general evolutionary invasion analysis in section 2 of the Appendix, where we identify the conditions required for the evolution of an internal equilibrium with accidental coordination. However, to explain intuitively why this occurs, we derive here analytical expressions for \({u}_{D}^{\ast }\) by assuming specific expressions for the cognitive functions \(\alpha ({u}_{D},\,x)\) and \(l({u}_{D},\,x,\,t)\) in Eq. 4. In particular, we explore the simplified case of stimuli with identical intensities equal to 1, with step-linear dependencies of the activation/inhibition and learning functions on the representation intensities:

$$l(x)=\begin{cases}x & 0\le x\le 1\\ 1 & x > 1\end{cases},\qquad \alpha (x,{u}_{B})=\begin{cases}0 & 0\le x\le {u}_{B}\\ {\rho }_{a}(x-{u}_{B}) & {u}_{B} < x\le {u}_{B}+1/{\rho }_{a}\\ 1 & x > {u}_{B}+1/{\rho }_{a}\end{cases}$$ (9)
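For reference, a literal Python transcription of these piecewise functions:

```python
def l(x):
    """Learning function of Eq. 9: linear in x, saturating at 1."""
    return min(1.0, max(0.0, x))

def alpha(x, u_B, rho_a):
    """Activation function of Eq. 9: zero below the threshold u_B,
    then linear with slope rho_a, saturating at 1."""
    return min(1.0, max(0.0, rho_a * (x - u_B)))
```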

In this simplified case, full inhibition of accidental coordination occurs if \({u}_{D}^{\ast }\le {u}_{B}\), whereas accidental coordination occurs if \({u}_{D}^{\ast } > {u}_{B}\): the higher \(u_D\), the more coordination. When \(u_D < u_B\), evolution always leads to higher values of \(u_D\), whereas when \({u}_{D} > (1+{u}_{B}{\rho }_{a})/{\rho }_{a}\) it leads to lower values, i.e. the selection gradient is respectively always positive \((\frac{d{\pi }_{-}^{S}({u}_{D})}{d{u}_{D}}={\gamma }_{a}\,{d}^{-})\) or always negative \((\frac{d{\pi }_{-}^{S}({u}_{D})}{d{u}_{D}}=-\,{c}^{-})\). For intermediate values \(({u}_{B}\le {u}_{D}\le (1+{u}_{B}{\rho }_{a})/{\rho }_{a})\), instead, a singular point satisfies:

$$0=-\,2{\rho }_{a}({\gamma }_{a}\,{d}^{-}+{c}^{-}){u}_{D}^{\ast }+{u}_{B}\,{\rho }_{a}({\gamma }_{a}\,{d}^{-}+{c}^{-})+{\gamma }_{a}\,{d}^{-}.$$ (10)

Thus, \({u}_{D}^{\ast }\) evolves towards the maximum of the as-observer fitness:

$${u}_{D}^{\ast }=\max \left({u}_{B},\ \frac{1}{2}\left({u}_{B}+\frac{{\gamma }_{a}\,{d}^{-}}{{\rho }_{a}({\gamma }_{a}\,{d}^{-}+{c}^{-})}\right)\right).$$ (11)

Since accidental coordination evolves when \({u}_{D}^{\ast } > {u}_{B}\), this indicates that accidental coordination can evolve even if costly for the observer, provided that \({\rho }_{a} < \frac{{\gamma }_{a}\,{d}^{-}}{{u}_{B}({\gamma }_{a}\,{d}^{-}+{c}^{-})}\). This condition indicates that coordination can be fully inhibited only if \(\rho_a\) is high enough, i.e. if the slope of the activation function is steep and discriminative (Appendix, Fig. S1). Thus, this simplified case shows that evolution leads to higher values of \(u_D\), and in turn to more accidental coordination, as long as the benefit of more accurate inferences outweighs the cost of a higher risk of accidental coordination, which here increases proportionally to \(\rho_a\) at increasing values of \(u_D\) (Fig. 3c).
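The singular point can be checked numerically: maximizing \(\pi_-^S(u_D)\) under the step-linear functions of Eq. 9 recovers the closed-form solution of Eq. 10. The sketch below uses illustrative parameter values and drops the constant factor \(P_B^2\), which does not move the maximum.

```python
# Grid-search verification that the maximizer of the as-observer fitness
# matches the solution of Eq. 10. Illustrative parameters; P_B^2 omitted.
u_B, rho_a, gamma_a, d_m, c_m = 0.3, 2.0, 1.0, 5.0, 3.0

def pi_S(u_D):
    l = min(1.0, max(0.0, u_D))                       # learning, Eq. 9
    a = min(1.0, max(0.0, rho_a * (u_D - u_B)))       # activation, Eq. 9
    return l * ((1 - a) * gamma_a * d_m - a * c_m)    # Eq. 4 without P_B^2

grid = [i / 100000 for i in range(100001)]
u_star_numeric = max(grid, key=pi_S)
# Solving Eq. 10 for u_D^*:
u_star_closed = max(u_B, 0.5 * (u_B + gamma_a * d_m /
                                (rho_a * (gamma_a * d_m + c_m))))
# here u_star_closed > u_B: a small amount of accidental coordination persists
```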

Remarkably, accidental coordination can evolve even when the activation threshold itself can evolve, alone (Appendix, section 2.2) or simultaneously with \(u_D\) (Appendix, section 2.3). To see why, note that whenever \(u_B\) evolves to values higher than \(1-{\rho }_{a}{u}_{D}\), actions performed as an actor can also be inhibited (Fig. 3d). This behavior is a specific example of a more general trade-off between true negatives and false negatives: increasing the strength of inhibitory mechanisms can result in the unspecific inhibition of actions that would be advantageous for the observer. In particular, in the simplified case, the equilibrium value of \(u_B\), i.e. how selective the activation threshold is, decreases in proportion to (1 − p)/p, the odds of a potential false negative (Appendix, section 2.2).

An important biological note is that this simplified model is likely conservative. First, other as-observer actions also risk being inhibited. Second, in real-life scenarios, stimuli and the activation of neurophysiological pathways are also subject to noise and variability. This uncertainty increases the chance of false negatives and false positives, and thus likely leads to scenarios better represented by smoother, more continuous activation functions than those of the simplified model presented above. In these cases, which can be represented mathematically with a continuous activation function (e.g. a sigmoid) or normally distributed stimulus intensities, a small probability of accidental coordination always evolves (Appendix, section 2).

Finally, note that these results do not depend on the specific functions used for the simplified model; they reflect the general payoffs underlying the trade-off between increasing the number of true positives and true negatives (appropriate responses B and D, respectively) and the risk of false positives (C) and false negatives (unspecific inhibition of other actions, \(\varnothing \)). We show a generalization of these results in sections 2.1–2.3 of the Appendix. Furthermore, we show that these results hold even for more complex models of inhibition and social responses (sections 2.4 and 4 of the Appendix).

Kin selection and indirect benefits of coordination

We also investigated the interaction between simulative strategies and kin selection. To this aim, we modeled the indirect fitness benefits of kin selection by adopting a payoff structure similar to that of Taylor and Nowak22, using an abstract assortment coefficient r that can represent either relatedness or group structure22. We focus on the evolution of \(u_D\), the intensity of the recruitment of the as-actor network. The transformed payoff structure is shown in Table 3.

Table 3 Payoff matrix in the presence of relatedness.

Notice that as r changes, the optimal strategy for the observer, which we called D, could change. For simplicity we first ignore this effect, assuming that D does not change with r.

Empirical evidence suggests that empathy, contagious yawning and emotional contagion are stronger with kin or in-group members, a phenomenon termed the empathic gradient. In our model, this would correspond to higher levels of recruitment of the as-actor network (\(u_D\)) and coordination (\(P_C\)) at increasing r. We therefore investigate the effect of r on the evolutionary equilibrium \({u}_{D}^{\ast }\), i.e. the sign and magnitude of \(\frac{d{u}_{D}^{\ast }}{dr}\).

This can be obtained by implicit differentiation of the invasion fitness23 (see Appendix). We show that the condition necessary for an empathic gradient analogous to that observed empirically is \(c^+/c^- > d^+/d^-\). Note that the effect of relatedness on coordination is non-trivial: r can either increase (Fig. 4, high values of \(c^+/c^-\)) or even decrease \(u_D\) and \(P_C\) (Fig. 4, low values of \(c^+/c^-\)). An increase is observed provided that more efficient mind-reading, i.e. a higher proportion of D responses, provides a substantial benefit, either direct, through defection (\(d^-\)), or indirect, through cooperation (\(r\,d^+\)). Hence a benefit of mind-reading might explain the observed behavioral patterns of emotional contagion and empathy. The patterns described above are even stronger when \(\gamma_a\) is small (Appendix).
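To make the implicit differentiation step concrete: writing \(g(u_D, r)\) for the selection gradient, the equilibrium condition \(g(u_D^*(r), r) = 0\) can be differentiated with respect to r, a standard adaptive-dynamics identity, sketched here under the assumption that the second derivative is non-zero:

```latex
% g(u_D, r) := \partial \pi_-^S / \partial u_D  (selection gradient)
% Differentiating the equilibrium condition g(u_D^*(r), r) = 0 in r:
\frac{d u_D^*}{d r}
  = -\left.\frac{\partial g/\partial r}{\partial g/\partial u_D}\right|_{u_D=u_D^*}
  = -\left.\frac{\partial^2 \pi_-^S/\partial u_D\,\partial r}
           {\partial^2 \pi_-^S/\partial u_D^2}\right|_{u_D=u_D^*}
```

When \(u_D^*\) is convergence stable the denominator is negative, so the sign of \(du_D^*/dr\) follows the sign of the cross-derivative.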

Figure 4 Effects of relatedness (x-axis) on the intensity of simulation \(u_D\) (level curves, top values) and on the probability of coordination (bottom values, in brackets), at varying payoffs. Payoffs are changed by varying the ratio of the coordination payoffs for actor and observer, \({c}^{+}/{c}^{-}\), while \({d}^{+}/{d}^{-}\) is kept constant. Level curves indicate regions with different values of \(u_D\) at the internal equilibrium (inhibition), when this exists. The shaded area indicates regions of the parameter space where full coordination (\(P_C = 1\)) is evolutionarily stable. This can overlap regions of the parameter space where an internal equilibrium is also stable (inhibition + full inhibition), or not (full coordination, top right corner, in dark gray). In this plot we considered a sigmoid activation function, \({d}^{-}={d}^{+}=5\) and \({\gamma }_{a}=1\).