We identify and explain the mechanisms that account for the emergence of fairness preferences and altruistic punishment in voluntary contribution mechanisms by combining an evolutionary perspective with an expected utility model. We aim to fill a gap between the literature on the theory of evolution applied to cooperation and punishment and the empirical findings from experimental economics. The approach is motivated by previous findings on other-regarding behavior, the co-evolution of culture, genes and social norms, as well as bounded rationality. Our first result reveals the emergence of two distinct evolutionary regimes that force agents to converge either to a defection state or to a state of coordination, depending on the predominant set of self- or other-regarding preferences. Our second result indicates that subjects in laboratory experiments of public goods games with punishment coordinate and punish defectors as a result of an aversion against disadvantageous inequitable outcomes. Our third finding identifies disadvantageous inequity aversion as evolutionarily dominant and stable in a heterogeneous population of agents endowed initially only with purely self-regarding preferences. We validate our model using previously obtained results from three independently conducted experiments of public goods games with punishment.

Copyright: © 2013 Hetzer, Sornette. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

The next section describes the model in detail and explains the interplay of agents who maximize their expected utility under the effects of natural selection and competitive evolutionary dynamics. We then present empirical tests of the theory and establish the evolutionary dominance of a specific other-regarding preference in the form of disadvantageous inequity aversion. The final section concludes.

The analysis of our expected utility model, in combination with the underlying evolutionary dynamics, allows us to identify and explain the origin and the emergence of other-regarding preferences and, ultimately, enables us to quantitatively explain the degree of altruistic punishment that is observed in lab experiments. As a result, our approach complements and extends other utility frameworks, e.g. the Fehr-Schmidt model [1], Bolton/Ockenfels [2] and Rabin [3], as well as existing evolutionary models linked to economics [4]–[10], by combining both perspectives in order to explain prosocial behavior. Unlike other approaches, our model does not assume ex ante the existence of other-regarding preferences, but instead demonstrates their co-evolutionary emergence along with the emergence of altruistic punishment behavior. The design of our model is inspired by previous findings about the co-evolution of culture, norms and genes, the effect of other-regarding behavior, and bounded rationality. We motivate our model by the psychological predisposition of individuals to maximize their expected utility, together with a subliminal disposition to follow social norms [11]–[15]. Both mechanisms are closely related in the process of gene-culture co-evolution.

Why do we maintain moral attitudes, display other-regarding behavior, have a distaste for unfairness, act prosocially and, at times, even behave altruistically towards others? How is this behavior compatible with the predominant theories of rational choice, selfish utility maximization and, in particular, with Darwin's principle of the survival of the fittest? This article presents an evolutionary utility framework of fairness, altruistic punishment and cooperation. It develops quantitative arguments supporting the hypothesis that the key to understanding the ostensibly mysterious patterns of human behavior is deeply rooted in our evolutionary history.

The Model

1 General framework

We take an evolutionary approach as the starting point for constructing our model. The fitness of an agent is considered to be equivalent to her realized cumulative payoff relative to that of other agents, i.e. to the monetary units (MU) that the agent gains over time relative to the average over all other agents. Each agent is characterized by one or multiple traits. The traits of an agent determine her behavior and correspond to a pure strategy denoted by $x$. Traits are passed on as fitness-weighted values to the offspring in the process of evolutionary reproduction. The population is determined by the set of pure strategies of its agents.

In an evolutionary competitive environment, agents are subject to natural selection, which affects their viability and fertility. While viability selection accounts for removing poorly performing agents from the population, fertility selection enables more successful agents to spread and to promote their genetic and cultural heritage in the population. This process corresponds to the standard evolutionary challenge of survival and reproduction. Following the Darwinian principle of the survival of the fittest, both selection mechanisms are defined relative to the environment of an agent. This means that the fitness of an agent is determined relative to the performance of the remaining population that she is exposed to and interacts with. In an evolutionary environment, the success of an agent and of her strategies defines the fitness of the agent and thus determines the proportional change of the strategies (traits) in the population over time.

The set of strategies that characterizes a population of agents is specified by a probability measure $P_t$ that quantifies the frequencies of the single strategies in the population at time $t$. In the two-player case, the payoff of an agent who plays a pure strategy $x$ against another agent who plays the pure strategy $y$ is denoted by $f(x, y)$. Both $x$ and $y$ are defined in the continuous strategy space $S$ ($S$ depends on the game; in the public goods game studied here, $S$ corresponds to the decision on the contribution level and on the chosen propensity to punish for each agent). For the $n$-player case, the average payoff of an agent who plays a strategy $x$ at time $t$ against a population characterized by the probability measure $P_t$ over the strategy space $S$ is defined by

$$ \bar{f}(x, P_t) = \int_S f(x, y) \, P_t(dy) \tag{1} $$

The total average payoff of the entire population at time $t$ is defined by

$$ \bar{f}(P_t) = \int_S \int_S f(x, y) \, P_t(dy) \, P_t(dx) \tag{2} $$

The success of a strategy $x$ is given by the difference of equations (1) and (2), as shown e.g. in [16]–[18]:

$$ \sigma(x, P_t) = \bar{f}(x, P_t) - \bar{f}(P_t) \tag{3} $$

The dynamics of the frequency of the strategies in a measurable set $A \subseteq S$ is defined by the ordinary differential equation

$$ \frac{d P_t(A)}{dt} = \int_A \sigma(x, P_t) \, P_t(dx) \tag{4} $$

The function $f$ in equations (1–3) reflects the underlying payoff structure of the analyzed game. In the context of this paper, $f$ represents the payoff function of a public goods game with punishment.
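The logic of equations (1) to (4) can be illustrated on a finite grid of strategies, where the probability measure reduces to a list of frequencies. The following is a minimal sketch under that discretization assumption; the function name `replicator_step`, the explicit Euler scheme and the two-strategy payoff matrix are illustrative choices and not part of the model itself.

```python
def replicator_step(freqs, payoff, dt=0.01):
    """One explicit Euler step of replicator dynamics on a finite strategy grid.

    freqs  : list of strategy frequencies summing to 1 (discrete stand-in for the measure)
    payoff : payoff[a][b] is the payoff of strategy a played against strategy b
    """
    n = len(freqs)
    # Average payoff of each strategy against the current population (cf. equation (1)).
    avg = [sum(payoff[a][b] * freqs[b] for b in range(n)) for a in range(n)]
    # Average payoff of the whole population (cf. equation (2)).
    total = sum(freqs[a] * avg[a] for a in range(n))
    # Success is the difference of the two (cf. equation (3)); a strategy's
    # frequency changes in proportion to its success (cf. equation (4)).
    new = [freqs[a] + dt * freqs[a] * (avg[a] - total) for a in range(n)]
    norm = sum(new)  # guards against floating-point drift
    return [x / norm for x in new]


# Hypothetical two-strategy game (0 = cooperate, 1 = defect); payoffs are illustrative only.
payoff = [[3.0, 0.0],
          [5.0, 1.0]]
freqs = [0.9, 0.1]
for _ in range(5000):
    freqs = replicator_step(freqs, payoff)
print(freqs)  # the defecting strategy ends up dominating the population
```

In this toy game defection earns more than cooperation against any population mix, so its success is always positive and its frequency grows until it dominates, which is the selection pressure that the public goods game with punishment is meant to counteract.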

2 The public goods game with punishment

In the following, we model the behavior of agents playing a standard one-shot-interaction public goods game with punishment as presented in [19]–[21]. Agents are pooled in groups of size $n$. Each agent $i$ is characterized by a strategy that is defined by two traits. The first trait, $m_i$, corresponds to the amount of MUs an agent contributes to the common group project (the public good) and thus reflects the agent's willingness to cooperate. The second trait, $k_i$, reflects the agent's propensity to punish defectors in the group. The general procedure of the public goods game with punishment is as follows. In the first stage of the game, agent $i$ contributes $m_i$ monetary units (MUs) to a common public good, which yields a return of $g$ MUs per invested MU. The return from the public good is equally redistributed among the group members. Agents then learn about the contributions of the other group members. In a second stage, they are provided with the opportunity to punish other group members. Punishment comes in the form of additional costs for both the punisher and the punished agent: for each MU spent by the punisher, the return that the punished agent obtained from the public goods game is reduced by $r_p$ MUs. Given the one-shot-interaction characteristic of the game, punishment does not result in a direct or indirect material benefit and is often considered in the literature to be an altruistic act.
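The two stages described above can be sketched in a few lines of code. This is only an illustration under explicit assumptions: the punishment expenditure is taken to be proportional to how much less the punished member contributed, and the default values for the return per invested MU (`g`) and for the punishment efficiency (`r_p`), as well as the function name `play_round`, are hypothetical and not taken from this section.

```python
def play_round(contributions, propensities, g=1.6, r_p=3.0):
    """Return the P&L of every group member after both stages of the game.

    Assumption (illustrative): agent i spends k_i * max(m_i - m_j, 0) MUs on
    punishing each group member j who contributed less than she did.
    g and r_p are hypothetical default values.
    """
    n = len(contributions)
    # Stage 1: the project return is shared equally among all group members.
    share = g * sum(contributions) / n
    pnl = []
    for i in range(n):
        m_i, k_i = contributions[i], propensities[i]
        # Stage 2a: what agent i spends on punishing lower contributors.
        spent = sum(k_i * max(m_i - contributions[j], 0.0)
                    for j in range(n) if j != i)
        # Stage 2b: what others spend on punishing agent i.
        received = sum(propensities[j] * max(contributions[j] - m_i, 0.0)
                       for j in range(n) if j != i)
        # Each MU spent on agent i reduces her payoff by r_p MUs.
        pnl.append(-m_i + share - spent - r_p * received)
    return pnl


# Three cooperators and one free rider, without and with punishment.
print(play_round([10, 10, 10, 0], [0.0] * 4))
print(play_round([10, 10, 10, 0], [0.25] * 4))
```

Without punishment the free rider earns the most in the group; with a positive propensity to punish, her advantage is reversed, at a cost to the punishers.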

3 Modeling assumptions

We make the following assumptions about the behavior of agents and the evolutionary environment:

Agents are assumed to be self-interested and to act rationally given their available information and computational capabilities [22]–[25]. In particular, agents are involved in one-shot interactions only and have no ex-ante information about the others' actions at the time they make their decisions. Thus, other agents' past actions (history) do not affect the agent's current or future decisions. This corresponds to the so-called stranger treatment in the experiments that we analyze below, and can also be called the strong mixing regime with rapid memory loss. A perhaps more convincing interpretation of this framework is in terms of a coarse-grained description of the multiple interactions and feedbacks between agents within groups that act over time scales of generations. In this interpretation, the unit time step is roughly commensurate with the agent lifetime. Then, other agents' past actions and history occur essentially within each time step but not beyond, justifying the model assumption.

Agent [19]–[21], [26], [27]. Figure 1 illustrates this behavioral pattern for data obtained in three public goods games [19]–[21].

The factor [28]–[30]. The subjects' psychological predispositions to render these encoded norms effective ultimately result in the focal action that is observed as a direct and immediate harm towards negative deviators, or that acts as a hidden deterrence [11]. Today, lab experiments and field studies such as those of [19]–[21], [31], [32] allow one to sample and observe the statistically stationary characteristics of the common propensity to punish.

The population of agents is subject to evolutionary dynamics in the form of selection, cross-over and mutation. These three mechanisms affect the viability and fertility of an agent. Viability selection induces a minimal survival condition in the form of a fixed lower value of consumption.

Figure 1. Mean expenditure of a given punishing member as a function of the deviation between her contribution and that of the punished member, for all pairs of subjects within a group, as reported empirically [19]–[21]. The error bars indicate the standard error around the mean. The straight line crossing zero shows the average decision rule for punishment. The anomalous punishment of cooperators, corresponding to the positive range along the horizontal axis, is neglected in our model. The inset shows the relative frequency of the pairwise deviations. https://doi.org/10.1371/journal.pone.0077041.g001
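A straight line through zero, such as the average decision rule shown in Figure 1, can be estimated from pairwise observations by least squares without an intercept. The sketch below shows one way to do this; the function name `fit_propensity` and the data points are invented purely for illustration and are not the experimental data of [19]–[21].

```python
def fit_propensity(deviations, expenditures):
    """Least-squares slope of a line through the origin, on negative deviations only.

    deviations   : contribution of the punished member minus that of the punisher
    expenditures : MUs spent by the punisher on that member
    Positive deviations (punishment of cooperators) are ignored, as in the model.
    """
    pairs = [(d, e) for d, e in zip(deviations, expenditures) if d < 0]
    sxx = sum(d * d for d, _ in pairs)
    sxy = sum(d * e for d, e in pairs)
    # The fitted line has a negative slope; its magnitude is returned.
    return -sxy / sxx


# Invented observations, for illustration only.
deviations = [-20, -15, -10, -5, 5, 10]
expenditures = [5.0, 3.75, 2.5, 1.25, 0.4, 0.6]
print(fit_propensity(deviations, expenditures))
```

Forcing the line through zero encodes the assumption that a member who contributes exactly as much as the punisher is not punished at all.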

4 Utility formulation of the public goods game model

We first formulate a fitness model assuming complete information. The profit and loss (P&L) $s_i$ of an agent $i$ that plays a public goods game with punishment is determined by the payoff from the game minus the costs of punishing and being punished and minus the contributed effort. With punishment expenditures proportional to the negative deviation of the punished member's contribution from that of the punisher (the linear decision rule of Figure 1), it reads

$$ s_i = -m_i + \frac{g}{n} \sum_{j=1}^{n} m_j - r_p \sum_{j \neq i} k_j \max(m_j - m_i, 0) - k_i \sum_{j \neq i} \max(m_i - m_j, 0) \tag{5} $$

The first term on the right-hand side of equation (5), i.e. $-m_i$, corresponds to the contribution of agent $i$ to the public good. The second term represents the return from the public good. The third and fourth terms quantify the costs of being punished by others and of punishing others, respectively. The number of agents in the group is denoted by $n$, the return from the public good is $g$ per invested MU, and $r_p$ corresponds to the punishment efficiency factor. Analogously, the P&L of the remaining agents $j \neq i$ can be written as

$$ s_j = -m_j + \frac{g}{n} \sum_{l=1}^{n} m_l - r_p \sum_{l \neq j} k_l \max(m_l - m_j, 0) - k_j \sum_{l \neq j} \max(m_j - m_l, 0) \tag{6} $$

By writing the relative fitness of an agent according to equation (3) in the form of an evolutionary measure of success, we obtain the fitness $f_i$ of agent $i$ as the sum of the experienced payoff differences between her own monetary payoff $s_i$ and the monetary payoff $s_j$ of the remaining individual group members:

$$ f_i = \sum_{j \neq i} \left( s_i - s_j \right) \tag{7} $$

The relative fitness of an agent is thus obtained by putting the payoff of agent $i$ in relation to the payoff of the remaining population. This form of the fitness describes a population of agents that is exposed to evolutionary dynamics as presented in the general framework above: positive values of $f_i$ are desirable, because they are associated with a higher fertility and a lower mortality. Negative values of $f_i$ should be avoided in order to prevent the evolutionary extinction of the agent's own traits. In our attempt to combine preferences, expected utility and evolution, we should be careful to clarify what is meant by fitness and by utility. It is customary to distinguish fitness from utility, the former being proportional to material payoffs as defined above (eventually controlling the number and quality of offspring), while utility refers to preferences.
In full generality, it may be that preferences differ from payoffs, as when utility depends on the presence of salient unchosen alternatives that do not have any impact on the payoffs [9]. This distinction may lead to a fruitful strategy to understand how evolutionary processes may lead to apparently dysfunctional preferences [8] and may help to explain the nature of our utility functions [9]. However, since our purpose is limited to providing a generalization of the utility approaches of [1], [2] and [3] in order to explain the experimental results of three independently conducted experiments of public goods games with punishment, we shall neglect the difference between fitness and preference. Agents in our game as well as in the experiments are not exposed to salient unchosen alternatives, suggesting that we can neglect the distinction between fitness and utility within our framework. The issue of fitness versus utility falls within the bigger question of the so-called “adaptationist program”, which has dominated evolutionary thinking, and whose possible caveats have been insightfully presented by [33]. Henceforth, by substituting equations (5) and (6) into equation (7), we obtain what can be termed the evolutionary utility of an agent, given by the two-term function shown in equation (8) below. The first term of (8) corresponds to agent $i$'s utility gained from the payoff of the public goods game with punishment. The second term of equation (8) represents the payoff for each of the opponents indexed by $j$. The total utility $U_i$ for agent $i$ is defined by the sum of the differences between all combinations of $s_i$ and $s_j$ with $j \neq i$, where $s_i$ and $s_j$ denote the P&L expressions of equations (5) and (6):

$$ U_i = \sum_{j \neq i} \left( s_i - s_j \right) \tag{8} $$

Consistent with utility theory (even in the presence of bounded rationality) and the underlying evolutionary dynamics, we assume that the agents seek to maximize their utility [22]. The maximum of the utility function (8) can only be calculated in the hypothetical case of complete information about the others' contributions.
However, information about the individual contributions is not available ex ante, because agents decide on their contributions simultaneously. It follows that agents are required to make assumptions, i.e. to form their first-order beliefs, about the others' contributions. We model this by transforming the utility model in equation (8) into a subjective expected utility model. To this end, we introduce the subjective probability measure $P_i$ that represents agent $i$'s (first-order) belief about the contributions of the other agents. $P_i(m)$ quantifies the likelihood, as perceived by agent $i$, that another agent will contribute $m$ MUs. In the one-shot game version studied here, all agents are indistinguishable from the point of view of an agent $i$, i.e., agent $i$ has no information on any preference, trait or specific characteristics of the other agents. Using $P_i$, agent $i$ can form her expectation [34] about the average of the other agents' contributions:

$$ \hat{m}_i = \mathrm{E}_i[m] = \int m \, dP_i(m) \tag{9} $$

Similarly to the propensity to punish, $\hat{m}_i$ can be interpreted as the expected norm-conforming behavior that has co-evolved and has been learned and internalized over time in a population of interacting agents. The utility model defined in equation (8) is transformed into an expected utility model using the subjective expectations $\hat{m}_i$. Rewriting the P&L expressions $s_i$ and $s_j$ by replacing each value $m_j$ with agent $i$'s subjective expectation of $m_j$ gives the following equations: (10) (11) Note that, in the formation of the expectation by agent $i$ of the others' utility functions, agent $i$'s own contribution $m_i$ is known to her, hence the term $m_i$ appears without averaging. As in the case of complete information, agents seek to maximize their relative fitness, i.e. the sum of the differences between their own P&L, $s_i$, and the others' P&L. Putting all this together, we obtain the evolutionary expected utility function of agent $i$ as shown in equation (12). (12) We start our analysis with a classical utility optimization problem.
Agents maximize their expected utility (12) with respect to their contribution $m_i$: (13) The first-order condition of problem (13) reads (14) with (15) The second-order condition for a local maximum of (13) holds for any reasonable assignment of the problem parameters. The cumulative distribution function of the contributions of the other agents, as anticipated by agent $i$, is denoted by $F_i(m)$. The term $\bar{F}_i(m_i)$ in equation (15) corresponds to the survival function of the subjective expected distribution of contributions in the population:

$$ \bar{F}_i(m_i) = 1 - F_i(m_i) \tag{16} $$

Substituting $\bar{F}_i(m_i)$ as defined in equation (16) into equation (15) yields: (17) Equation (17) describes a functional relation between the predetermined parameters of the public goods game, i.e. the group size $n$, the project return factor $g$ and the punishment efficiency $r_p$, and the variable traits of agent $i$, i.e. the propensity to punish $k_i$ and her subjective expectation (first-order belief) $\bar{F}_i(m_i)$ about the fraction of her group fellows who contribute more than her own contribution $m_i$. As we are interested in the agents' evolutionarily optimal punishment behavior, we solve equation (17) for $k_i$ and obtain: (18) $k_i$ depends on $m_i$ via agent $i$'s subjective (first-order) belief, embodied in $\bar{F}_i(m_i)$, that the other agents will contribute more than herself. Thus, $k_i$ can be interpreted as the value that makes agent $i$ better off not deviating negatively or positively from her willingness to contribute $m_i$ MUs to the public good, given her belief about how many of her group fellows contribute more than her own contribution $m_i$. In the following subsection, we consider evolutionary dynamics in our model.
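Three of the quantities used in this subsection are simple enough to compute directly: the relative fitness as a sum of payoff differences (cf. equation (7)), the expected contribution under a first-order belief (cf. equation (9)) and the survival function of that belief (cf. equation (16)). The following sketch assumes a discrete belief stored as a dictionary; the function names and all numbers are hypothetical and serve only as an illustration.

```python
def relative_fitness(pnl, i):
    """Sum of payoff differences between agent i and every other group member."""
    return sum(pnl[i] - s_j for j, s_j in enumerate(pnl) if j != i)


def expected_contribution(belief):
    """Mean contribution under a discrete belief {contribution: probability}."""
    return sum(m * p for m, p in belief.items())


def survival(belief, m_i):
    """Believed probability that another agent contributes strictly more than m_i."""
    return sum(p for m, p in belief.items() if m > m_i)


# Hypothetical P&L values of a group of four (three cooperators, one punished free rider).
pnl = [-0.5, -0.5, -0.5, -10.5]
print(relative_fitness(pnl, 0), relative_fitness(pnl, 3))

# Hypothetical first-order belief of an agent about the others' contributions.
belief = {0: 0.2, 10: 0.5, 20: 0.3}
print(expected_contribution(belief), survival(belief, 10))
```

Because the fitness is relative, a cooperator with a negative absolute P&L can still have a positive fitness when the other group members fare worse, which is why costly punishment can be evolutionarily viable in this framework.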

5 Evolutionary dynamics of the level of cooperation

The evolutionary dynamics of agents who face a social dilemma in the form of a public goods game with punishment can be described in terms of the agents' P&L as a function of the deviation $\Delta m$ in the contribution level and of the population's common propensity to punish $k$. If agent $i$ starts to deviate from her current level of cooperation by a value of $\Delta m$, the absolute change of the P&L for the agent as a function of $\Delta m$ and $k$ is defined as follows: (19) These expressions assume that the deviation affects a single agent, who deviates from the norm. Hence, the agent is punished by (resp. punishes) all other agents for $\Delta m < 0$ (resp. $\Delta m > 0$). The deviation of agent $i$ by $\Delta m$ affects not only her own P&L, but also the P&L of the remaining agents $j \neq i$. The absolute change of the P&L of the remaining population as a function of $\Delta m$ and $k$ reads (20) Putting equations (19) and (20) together with (21) yields the relative change of the P&L of agent $i$ with respect to the remaining population: (22) The form of equation (22) is equivalent to the relative measure of success introduced in equations (3) and (7). As introduced above, the realized P&L from the public goods game with punishment can be interpreted as the fitness of an agent in an evolutionary environment. The fitness, in turn, is associated with the rate of fertility, i.e. the fitter an agent becomes, the more genetically related offspring she produces. In this way, traits of agents with a higher realized P&L value tend to spread and to end up dominating the population over time. It thus holds that the traits in the population move with time towards the values of a subpopulation that achieves a higher mean P&L than the mean P&L of the entire population. The corresponding replicator dynamics are (23) where the two state variables are, respectively, the proportion of agents deviating by $\Delta m$ and the proportion of agents with a propensity to punish $k$.
Extending the previous reasoning, expressions (23) now include the occurrence of deviations from the optimal (for the community) propensity to punish, which may result, for instance, from random mutations. The dynamics for the expected group averages are accordingly defined by (24) The sensitivity of the relative P&L change given by (22) with respect to the relative change $\Delta m$ is defined by the partial derivative (25) With the conditions that $n \geq 2$ and $r_p > 1$, i.e. a game always has two or more players and punishment is less costly to the punisher than to the punished agent, it holds that this derivative is always negative in one parameter range and always positive in the other: (26) This reveals the existence of two distinct evolutionary regimes that are separated by the bifurcation point at (27)

Defection: For (28)

Coordination: For [1]. If punishment is efficient, the utility-maximizing strategy is to contribute according to the expected contribution of the remaining group fellows, i.e. to contribute according to the first-order belief. Following Black's theorem, the best estimate for this strategy is the median value [12], [35]–[37]. The median value (29) (30) Figure 2 depicts the structure of equation (22) for a fixed punishment efficiency factor $r_p$ and group size $n$, and for three values of the propensity to punish $k$ (black dashed, grey, and grey dashed curves).

Figure 2. Sensitivity of the relative P&L change of equation (22) as a function of a relative change $\Delta m$ of the contributions, for a fixed group size $n$, a fixed punishment efficiency $r_p$ and three values of the propensity to punish $k$ (black dashed, grey, and grey dashed curves). https://doi.org/10.1371/journal.pone.0077041.g002

The next subsection analyzes the identified evolutionarily stable strategies (ESSs) for a population of agents that is either purely self-regarding and acting selfishly, or that incorporates other-regarding preferences in its decision process.
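The existence of a defection regime and a coordination regime can be explored numerically by letting a single agent deviate from a common contribution norm and measuring her resulting relative fitness. The sketch below does this under explicit assumptions: a linear punishment rule with a common propensity to punish $k$, a group of four, and hypothetical values for the project return (`g`) and the punishment efficiency (`r_p`). The function names and the threshold that this sketch produces are properties of these assumptions and are not a restatement of equations (19) to (28).

```python
def pnl(contributions, k, g=1.6, r_p=3.0):
    """P&L of each group member under a common propensity to punish k.

    Assumption (illustrative): a member spends k * max(own - other, 0) MUs on
    punishing each lower contributor; g and r_p are hypothetical values.
    """
    n = len(contributions)
    share = g * sum(contributions) / n
    result = []
    for i, m_i in enumerate(contributions):
        others = [m for j, m in enumerate(contributions) if j != i]
        spent = sum(k * max(m_i - m_j, 0.0) for m_j in others)
        received = sum(k * max(m_j - m_i, 0.0) for m_j in others)
        result.append(-m_i + share - spent - r_p * received)
    return result


def fitness_after_deviation(dm, k, n=4, norm=10.0):
    """Relative fitness of agent 0 after she deviates by dm from a common norm.

    Before the deviation all members are identical, so the relative fitness is zero
    and the returned value is also the change caused by the deviation.
    """
    m = [norm + dm] + [norm] * (n - 1)
    s = pnl(m, k)
    return sum(s[0] - s_j for s_j in s[1:])


for k in (0.0, 0.125, 0.25):
    print(k, fitness_after_deviation(-1.0, k), fitness_after_deviation(+1.0, k))
```

Under these assumptions, contributing one MU less than the norm raises the relative fitness when nobody punishes and lowers it for the largest propensity shown, while contributing more than the norm never pays. In this sketch the gain from under-contributing changes sign at $k = 0.125$, a number that follows from the assumed rule and parameter values only.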