We begin by laying out the key assumptions underlying both the analytical and simulation models. The predictions of the analytical model are derived using adaptive dynamics. We present the key insights that we are able to derive without the complexities of simulation. The full analytical model can be found in the Supplementary Material. We then build on the analytic solutions to fully explore the mechanisms underlying these insights using an agent-based evolutionary simulation. This simulation also allows us to explicitly track group size and relax some of our assumptions, allowing oblique learning, learning biases, and life history to evolve.

We present the key insights and predictions of our model in three ways. First, we explain the conditions under which we expect relationships between our variables and how the size of these relationships is affected by our parameters. In doing so, we verbally describe the core logic underlying the theory. Second, we compare our predictions to existing data, plotting our simulation results side-by-side with this existing data. If our predictions were inconsistent with existing empirical correlations, this would pose a significant challenge to our theory. Finally, we derive the Cumulative Cultural Brain Hypothesis predictions, laying out the narrow evolutionary regime under which an autocatalytic interaction between cultural and genetic inheritance is most likely to generate a human-like take-off.

The logic that follows from these key assumptions is first formalized using an analytic approach, an adaptive dynamics evolutionary model [47], available in the Supplementary Materials. This model captures the logic and several of the key predictions of the CBH. We then simulate the logic to capture the co-evolutionary dynamics needed to generate the CCBH.

The analytical adaptive dynamics model we present in the Supplementary Materials allows us to understand the evolution of brain size, adaptive knowledge, and reliance on social learning as a function of transmission fidelity, asocial learning efficacy, and survival returns on adaptive knowledge without the complexities of co-evolutionary dynamics and explicit evolution of oblique learning and learning biases. We can derive a set of predictions from the insights gained from this model.

Brain size and reliance on social over asocial learning will depend on factors that affect the availability of adaptive knowledge, which are themselves affected by learning strategies and adaptive knowledge. In other words, there is a range of co-evolutionary dynamics that we have assumed or abstracted away in order to solve this model analytically, but which are crucial for capturing and understanding the full range of evolutionary dynamics. To understand the conditions under which social learning might emerge (and perhaps more interestingly, extreme reliance on social learning as in humans), we need to explore these co-evolutionary dynamics. We explore this full set of variables and these dynamics through an evolutionary simulation. An evolutionary simulation also allows us to properly account for population size, population structure, more sophisticated learning strategies, and life history. This model will bolster and expand on our analytic model and reveal the conditions under which adaptive knowledge and brain size will increase.

Simulation model

To explore the culture-gene co-evolutionary dynamics, we constructed an agent-based evolutionary simulation that extends our analytic model. In our simulation, individuals are born, learn asocially or socially from their parent with some probability, potentially update by asocial learning or by socially learning from more successful members of their group during an extended juvenile period, migrate between demes, and die or survive based on their brain size and adaptive knowledge. Individuals who survive this process give birth to the next generation. We are mainly interested in the effects of natural selection and learning, so we use a haploid model and ignore non-selective forces such as sex, gene recombination, epistasis, and dominance. The lifecycle of the model, as well as all variables and parameters, are shown in Fig 1 below.

This simulation was written in C++ by MM (code in Supplemental Materials). To reduce bugs, two computer science undergraduate research assistants independently reviewed the code and wrote a suite of unit tests using Google’s C++ Testing Framework. The simulation begins with 50 demes, each with a population of 10 individuals. Throughout the simulation, the number of demes was fixed at 50. In early iterations of the model, we explored increasing the number of demes to 100 for some of the parameter space and found no significant impact on the results. Our starting population of 10 individuals is roughly equivalent to a real population of 40 individuals, assuming two sexes and one offspring per parent (4 × 10). As a reference, mean group size in modern primates ranges from 1 to 70 [32].

Each individual i in deme j has a brain of size b ij with a fitness cost that increases with increasing brain size. Adaptive knowledge is represented by a ij , where 0 ≤ a ij ≤ b ij . Increasing adaptive knowledge can mitigate the selection cost of a larger brain, but such knowledge is limited by brain size.

Our simulations begin with individuals who have no adaptive knowledge, but the ability to fill their b ij = 1.0 sized brains with adaptive knowledge through asocial and/or social learning with some probability. To explore the idea that juvenile periods can be extended to lengthen the time permitted for learning, we included two stages of learning. In both learning stages, the probability of using social learning rather than asocial learning is determined by an evolving social learning probability variable (s ij ). We began our simulations with the social learning probability variable set to zero (i.e. at the beginning of the simulation, all individuals are asocial learners). To explore the invasion of asocial learners into a world of social learners, we also ran the simulation with the social learning probability variable set to one (i.e. at the beginning of the simulation, all individuals are social learners). Although social learning is widespread in the animal kingdom [22], a realistic starting point is closer to pure asocial learning. Nevertheless, the simulations starting with social learners were often useful in understanding these dynamics, so, in some cases where it is insightful, we report these results as well.

Asocial learning allows for the acquisition of adaptive knowledge, independent of the adaptive knowledge possessed by other individuals. In contrast, social learning allows for vertical acquisition of adaptive knowledge possessed by the genetic parent in the first learning stage or oblique acquisition from more knowledgeable members of the deme (from the parental generation) in the second learning stage. The tendency to learn from models other than the genetic parent is determined by a genetically evolving oblique learning probability variable (v ij ). Thus, the simulation does not assume oblique learning or a second stage of learning [a misplaced critique of related models in our opinion; 50; but a critique not relevant to the present model, 51]. The probability of engaging in a second round of oblique social learning is a proxy for the length of the juvenile period. In the second stage of learning, if an individual tries to use social learning, but does not use oblique learning, no learning takes place beyond the first stage. This creates an initial advantage for asocial learning and cost for evolution to extend learning into an extended juvenile period. We also allow the ability to select a model with more adaptive knowledge (for oblique learning) to evolve through a payoff-bias ability variable (l ij ).

These simulations result in a series of predicted relationships between brain size, group size, adaptive knowledge, asocial/social learning, mating structure, and the juvenile period. Some of these relationships have already been measured in the empirical literature and thus provide immediate tests of our theory. Specifically, several authors have shown positive relationships (notably in primates) between (1) brain size and social group size [44, 31, 52], (2) brain size and social learning [46, 53], (3) brain size and length of juvenile period [54–57], and (4) group size and the length of the juvenile period [56].

Various hypotheses have been proposed for these relationships. Here we argue that they are all a consequence of a single evolutionary process, the dynamics of which the CBH models reveal. In addition, we find that different rates of evolutionary change and the size of these relationships across taxa [6] may be accounted for by the extent to which adaptive knowledge reduces the death rate (λ in our model). This λ term captures any factor that moderates the relationship between adaptive knowledge and survival. One interpretation, but by no means the only one, is the resource richness of the ecology. For example, richer ecologies offer more ‘bang for the buck’, that is, more calories unlocked for less knowledge, allowing individuals to better offset the costs of their brains. A higher λ suggests a richer ecology or, more specifically, an ecology where smarts have a greater return on survival. Indeed, research among primates has revealed that factors affecting access to a richer ecology (home range size or the diversity of food sources) are associated with brain size [58, 59]. Thus, our model may help explain why both social and ecological variables seem to be variously linked to brain size.

The dynamics of our model also reveal the ecological conditions, social organization and evolved psychology most likely to lead to the realm of cumulative cultural evolution, the pathway to modern humans. These predictions capture the CCBH. Our model indicates the following pathway. Under some conditions, brains will expand to improve asocial learning and thereby create more adaptive knowledge. This pool of adaptive knowledge leads to selection favoring an immense reliance on social learning, with selective oblique transmission, allowing individuals to exploit this pool of growing knowledge. Rogers’ [60] paradox, whereby social learners benefit from exploiting asocial learners’ knowledge, but do not themselves generate adaptive knowledge, is solved by selective oblique social learning transmitting accidental innovations to the next generation. Under some conditions, an interaction between brain size, adaptive knowledge, and sociality (deme size and interconnectedness) emerges, creating an autocatalytic feedback loop that drives all three—the beginning of cumulative cultural evolution.

The lifecycle. Individuals go through four distinct life stages (see Fig 1): Individuals (1) are born with genetic traits similar to their parents, with some mutation, (2A) learn adaptive knowledge socially from their parents or through asocial learning independent of their parents, (2B) go through a second stage of learning adaptive knowledge through asocial learning or oblique social learning, (3) migrate between demes, and (4) die or survive to reproduce the next generation. Fecundity and viability selection (birth and death) are expressed separately, allowing us to disentangle the effect of adaptive knowledge on outcompeting conspecifics and on reducing the risk of dying before reproduction.

Stage 1: The birth stage. In the birth stage, the individuals who survive the selection stage (Stage 4) give birth to the next generation.

Adaptive knowledge and the number of offspring. We assume that demes with greater mean adaptive knowledge can sustain a larger population. We formalized this assumption in Eq 1 by linking k j , which affects the carrying capacity of the deme, to the mean adaptive knowledge of the individuals in the deme (A j ) and some minimum value that we set to our starting group size. The relationship between mean adaptive knowledge and k j is scaled by χ, but adjusting this coefficient resulted in a computationally intractable deme size as adaptive knowledge accumulated. Therefore, we set this coefficient to a constant value (χ = 10) and left exploration of this parameter for a future model. The deme size in the current generation (t) and k j are then used to calculate the total expected number of offspring in the next generation (t + 1) using the discrete logistic growth function in Eq 2, where ρ is the generational growth rate. Initial simulations suggested that ρ only affected the rate of evolution rather than the qualitative outcomes. We selected a reasonable value (ρ = 0.8) based on Pianka [61].

(1) (2)

Eq 2 tells us the Expected Value for the number of offspring based on current deme size and k j (based on deme mean adaptive knowledge). However, this does not tell us which individuals within the deme gave birth to the offspring. We assume that more adaptive knowledge increases an individual’s birth rate. We parameterized the strength of the relationship between adaptive knowledge and birth rate (fecundity selection). A potential parent’s (i j ) probability of giving birth (p ij ) is given by their sigmoid transformed adaptive knowledge value (Eq 3) as a fraction of the sum of all transformed adaptive knowledge values of individuals in the deme (Eq 4).
The transformation is adjusted by φ, allowing us to study the importance of fecundity selection. For example, we can turn off fecundity selection entirely by setting φ = 0, giving a world with no reproductive skew in which all potential parents have the same probability of giving birth. The more we increase φ, the more we have a winner-takes-all world, where to win, one has to acquire adaptive knowledge. This is crucial in thinking about how, for example, our culture-gene co-evolutionary process is influenced by social organization and mating structures that create high reproductive skew. The φ parameter affects reproductive skew by increasing the breeding bias toward those with more adaptive knowledge. Though mating structure and reproductive skew are separable concepts, increased pair-bonding correlates with reduced reproductive skew. Thus mating structure is one mechanism, though not the only mechanism, that may affect reproductive skew. A perfectly monogamous pair-bonded society with no differential selection at the birthing stage would have φ = 0. Increasing φ allows for an increase in polygyny from “monogamish” (mostly pair-bonded) societies at low values of φ to highly polygynous winner-takes-all societies where males with the most adaptive knowledge have significantly more offspring (see Fig 2). Our model suggests that in these high reproductive skew societies, such as more polygynous societies, variation is reduced. This allows for the initial rapid evolution of larger brains, but with little or no variation, populations are unable to use social learning to increase their adaptive knowledge and are more likely to go extinct. At the other extreme, evolutionary forces are quashed when φ = 0. Social learning and the advent of culture-gene coevolution are more likely to occur when reproductive skew is suppressed, such as in monogamish or cooperative/communal breeding societies or where sharing norms result in shared benefits despite skew in ability or success [see 62–64].
Of course, some argue that culture supports, or is responsible for, such mating structures in humans, which would require us to endogenize φ. In the present model, we treat φ as a parameter.
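The effect of φ on reproductive skew can be illustrated with a minimal Python sketch. It assumes a logistic transform centred on the deme's mean adaptive knowledge, which matches the verbal description of Eqs 3 and 4 but is not taken from them; the function name `birth_probabilities` is hypothetical.

```python
import math


def birth_probabilities(knowledge, phi):
    """Return each potential parent's probability of giving birth.

    Assumed form (not confirmed by the text): a logistic transform of
    adaptive knowledge centred on the deme mean, normalised to sum to 1.
    """
    mean_a = sum(knowledge) / len(knowledge)
    # Sigmoid-transformed adaptive knowledge; phi controls the steepness.
    transformed = [1.0 / (1.0 + math.exp(-phi * (a - mean_a))) for a in knowledge]
    total = sum(transformed)
    # Each parent's share of the summed transformed values.
    return [t / total for t in transformed]


deme = [0.2, 0.8, 1.0, 2.0]  # illustrative adaptive knowledge values
print(birth_probabilities(deme, phi=0.0))  # no fecundity selection
print(birth_probabilities(deme, phi=5.0))  # strong skew toward high knowledge
```

Under this assumed form, φ = 0 maps every individual to the same transformed value, so all potential parents are equally likely to give birth, while larger φ concentrates births among the most knowledgeable.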

Fig 2. The effect of φ on transforming adaptive knowledge. Here the mean adaptive knowledge of the deme is 1 (A j = 1). https://doi.org/10.1371/journal.pcbi.1006504.g002

Migration was fixed at 10% and thus, φ also affected the relative strength of individual, within-group selection and between-group selection. Between-group selection dominates when φ = 0 and is reduced as φ ≫ 0.

(3) (4)

We assume that more individual adaptive knowledge (a ij ) is associated with increased relative fertility. Using a binomial distribution, we instantiate the expected number of offspring n ij for each parent. A binomial distribution B(n, p) describes the number of successes in a sequence of n binary experiments (in our model, have offspring vs. don’t have offspring). The probability of success in any particular ‘coin flip’ is given by p. For each parent, we draw a value from a binomial distribution where the number of experiments is the Expected Value for the number of offspring in the deme (Eq 2) and the probability is p ij (Eq 4). Because the p ij values within a deme sum to 1, the sum of the Expected Values for the offspring of all parents equals the Expected Value for the number of offspring in the deme.

Genetic transmission and mutation. The offspring (designated by a prime symbol) born to a parent are endowed with genetic characteristics similar to their parents. These offspring acquire four genetic traits from their parents: their brain size (b′ ij ), social learning probability (s′ ij ), oblique learning probability (v′ ij ), and oblique learning bias (l′ ij ). For each trait, newborn individuals have a 1 − μ probability of having the same value as their parents (b ij , s ij , v ij , l ij ). If a mutation takes place, new values are drawn from a normal distribution with a mean of their parent value and a standard deviation of σ s for s′ ij , σ v for v′ ij , σ l for l′ ij , and σ b b ij for b′ ij . The standard deviations of s′ ij and v′ ij are not scaled by the mean, since these are probabilities and therefore bounded [0, 1] (the normal distribution is truncated at [0, 1]). Although l′ ij is not bounded, we do not scale the standard deviation by the mean, because small changes in l′ ij have a large effect on learning bias, due to the sigmoid function.

Once offspring have been endowed with genetic characteristics, they then acquire adaptive knowledge. Their method and ability to acquire adaptive knowledge is affected by their genetic traits.
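The binomial allocation of offspring and the mutation rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the simulation's C++ code: the function names are hypothetical, the expected number of offspring is assumed to have been rounded to an integer, and truncation at [0, 1] is implemented by rejection sampling, which is one common way to draw from a truncated normal.

```python
import random


def draw_offspring_counts(expected_total, birth_probs, rng):
    """Draw each parent's offspring count from Binomial(expected_total, p)."""
    counts = []
    for p in birth_probs:
        # A binomial draw is the number of successes in n independent trials.
        counts.append(sum(1 for _ in range(expected_total) if rng.random() < p))
    return counts


def mutate(parent_value, mu, sd, rng, bounds=None):
    """Copy a parental trait, mutating it with probability mu.

    A mutated value is drawn from a normal distribution centred on the
    parental value. If bounds are given, out-of-range draws are rejected
    and redrawn (a truncated normal); the parental value is assumed to lie
    within the bounds.
    """
    if rng.random() >= mu:
        return parent_value  # no mutation, with probability 1 - mu
    while True:
        value = rng.gauss(parent_value, sd)
        if bounds is None or bounds[0] <= value <= bounds[1]:
            return value


rng = random.Random(42)
print(draw_offspring_counts(12, [0.5, 0.3, 0.2], rng))
# Probabilities such as s and v are truncated at [0, 1] ...
print(mutate(0.05, mu=1.0, sd=0.2, rng=rng, bounds=(0.0, 1.0)))
# ... while the brain-size standard deviation is scaled by the parental value.
parent_b, sigma_b = 2.0, 0.1  # illustrative values only
print(mutate(parent_b, mu=1.0, sd=sigma_b * parent_b, rng=rng))
```

Because each parent's count is drawn independently, the realized deme size varies around the expected value rather than matching it exactly.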

Stage 2: Learning. Asocially learned adaptive knowledge values are drawn from a normal distribution based on an individual’s brain size scaled by asocial learning efficacy (ζ): N(ζb ij , σ a ζb ij ). Rather than fix the variance and imply that the space of deviation in learning remains the same regardless of what has been learned, we allow the variance to scale with the mean of the distribution, reflecting the idea of a thought space [16], where the space of possible deviations grows as the amount of knowledge grows. Socially learned adaptive knowledge values are drawn from a similar normal distribution, but with a mean of the model’s (t) adaptive knowledge value scaled by transmission fidelity (τ): N(τa tj , σ a τa tj ), with the variance similarly scaled by the mean. Fig 3 below illustrates the distributions from which these values are drawn and the effect of ζ and τ.

Fig 3. Illustration of distributions for how asocial learning and social learning acquire adaptive knowledge. In (a) an asocial learner has a higher probability of drawing a value closer to their brain size if ζ is higher. In (b) a social learner has a higher probability of drawing a value closer to their model’s adaptive knowledge value if τ is high. Note that in both cases, adaptive knowledge cannot exceed brain size (a ij ≤ b ij ). Curves generated using Magnusson (2016) (rpsychologist.com). https://doi.org/10.1371/journal.pcbi.1006504.g003

For both asocial and social learning, an individual’s adaptive knowledge may not exceed their brain size. But, compared to social learning, asocial learning enables the immediate acquisition of adaptive knowledge based on one’s own brain size. Social learning is dependent on the adaptive knowledge possessed by parents, or those in the parents’ generation within the same deme, if selection extends the learning phase through a juvenile period.

In Stage 2A, newborn individuals can socially acquire adaptive knowledge from their parent i with a probability given by their social learning probability (s ij ). If newborns do not learn from their parents (probability 1 − s ij ), they learn asocially instead. In Stage 2B, individuals may update their adaptive knowledge through asocial learning with probability (1 − s ij ) in the same manner as Stage 2A or obliquely from non-parents with probability s ij v ij . Individuals who neither asocially learn nor obliquely learn do no further learning. This allows us to study conditions under which oblique learning emerges during this extended learning period. Crucially, oblique learning has to out-compete a second round of asocial learning.

We adjust the strength of the relationship between a potential model’s (t) adaptive knowledge and their likelihood of being modeled using the learner’s l ij variable in the sigmoid transformation function (Eq 5). A potential model’s (t j ) probability of being selected (p tj ) is given by Eq 6. Notice that these have the same functional form as Eqs 3 and 4, and thus the transformation is similar to Fig 2. Both asocial and social learning only update adaptive knowledge values if these values are larger than those acquired during the first stage of learning, Stage 2A.

(5) (6)

Note, since we are interested in the evolution of social learning, we stacked the deck somewhat against social learning. Individuals have a chance of not doing any learning during Stage 2B. This creates an initial disadvantage for social learning, since any selection for social learning in Stage 2A risks missing out on a second round of asocial learning in Stage 2B.
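The second learning stage can be sketched in Python as below. Several details are assumptions made for illustration and are not confirmed by the text: the second parameter of each normal distribution is treated as a standard deviation, draws are clamped to the interval from 0 to brain size rather than redrawn, the payoff-bias transform is a logistic function centred on the mean knowledge of the candidate models, and all function names are hypothetical.

```python
import math
import random


def draw_knowledge(mean, sigma_a, brain_size, rng):
    """Draw adaptive knowledge with spread scaled by the mean, capped by brain size."""
    value = rng.gauss(mean, sigma_a * mean)
    return min(max(value, 0.0), brain_size)  # enforce 0 <= a <= b (assumed clamp)


def asocial_learning(brain_size, zeta, sigma_a, rng):
    # Mean is the learner's own brain size scaled by asocial efficacy zeta.
    return draw_knowledge(zeta * brain_size, sigma_a, brain_size, rng)


def social_learning(model_knowledge, tau, sigma_a, brain_size, rng):
    # Mean is the model's knowledge scaled by transmission fidelity tau.
    return draw_knowledge(tau * model_knowledge, sigma_a, brain_size, rng)


def choose_model(candidates, bias, rng):
    """Payoff-biased choice among candidate models' knowledge values."""
    mean_a = sum(candidates) / len(candidates)
    weights = [1.0 / (1.0 + math.exp(-bias * (a - mean_a))) for a in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def stage_2b(current, brain_size, s, v, bias, adults, zeta, tau, sigma_a, rng):
    """Second learning stage: asocial, oblique, or no further learning."""
    if rng.random() >= s:
        new = asocial_learning(brain_size, zeta, sigma_a, rng)
    elif rng.random() < v:
        model = choose_model(adults, bias, rng)
        new = social_learning(model, tau, sigma_a, brain_size, rng)
    else:
        return current  # tried social learning without oblique learning
    # Knowledge is only updated if the new value is larger.
    return max(current, new)


rng = random.Random(0)
adults = [1.0, 1.5, 3.0]  # illustrative knowledge of the parental generation
print(stage_2b(0.5, 2.0, s=0.5, v=0.5, bias=1.0, adults=adults,
               zeta=0.7, tau=0.9, sigma_a=0.3, rng=rng))
```

With a bias of 0 every candidate model is equally likely to be chosen, so selective oblique learning only emerges as the bias variable evolves upward.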

Stage 3: Migration. Individuals migrate to a randomly chosen deme (not including their own) with probability m = 0.1, fixed to reduce the number of parameters. All demes have the same probability of immigration. Individuals retain their adaptive knowledge and genetic traits. There is no selection during migration; all individuals survive the journey.
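The migration step is simple enough to sketch directly in Python. The representation of demes as lists of individuals and the function name `migrate` are assumptions made for this illustration.

```python
import random


def migrate(demes, m, rng):
    """Return new demes after each individual migrates with probability m.

    Migrants move to a uniformly chosen deme other than their own, and
    every individual survives the move.
    """
    new_demes = [[] for _ in demes]
    for j, deme in enumerate(demes):
        for individual in deme:
            if len(demes) > 1 and rng.random() < m:
                # Pick among the other demes by skipping the home index j.
                dest = rng.randrange(len(demes) - 1)
                if dest >= j:
                    dest += 1
            else:
                dest = j
            new_demes[dest].append(individual)
    return new_demes


rng = random.Random(3)
demes = [[("ind", j, i) for i in range(5)] for j in range(4)]
after = migrate(demes, m=0.1, rng=rng)
print([len(d) for d in after])  # deme sizes change, total population does not
```

Individuals are carried over unchanged, which corresponds to migrants retaining their adaptive knowledge and genetic traits.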

Stage 4: Selection based on brain size and adaptive knowledge. We formalized the assumption that larger, more complex brains are also more costly using a quadratic function to link brain size to maximum death rate (c max ), capturing the idea that the costs of large brains escalate non-linearly with size. In early simulations, we also tested an exponential function, but our exploration revealed no important qualitative differences between the functions. To formalize the assumption that individuals with more adaptive knowledge are less likely to die ceteris paribus, we use the negative exponential function in Eq 7. The λ parameter in Eq 7 was varied between simulations and was used to determine the extent to which adaptive knowledge can offset the costs of brain size, where λ = 0 indicates no offset. As in our analytical model, the λ parameter can be interpreted as how much adaptive knowledge one requires to unleash fitness-enhancing advantages.

(7)

This function captures the idea that the increasing costs of big brains can be offset by more adaptive knowledge. We set c max = βb² (β = 1/10000 in our simulation). This results in a maximum empty brain size of b = 100, the size at which c max reaches 1. The choice of setting the maximum empty brain size to b = 100 was somewhat arbitrary (it just sets the scaling), but allowed for reasonably sized brains and a range of evolutionary behavior. We illustrate the effect of λ in Fig 4 below.

Fig 4. Reduction in death rate for different values of λ for a given brain size (b = 50 in this example). https://doi.org/10.1371/journal.pcbi.1006504.g004
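The selection stage can be illustrated with a short Python sketch. The quadratic maximum death rate c max = βb² and β = 1/10000 come from the text; the exact negative exponential form used here, with adaptive knowledge expressed as a fraction of brain size, is an assumption for illustration and may differ from Eq 7. The function names are hypothetical.

```python
import math

BETA = 1 / 10000  # value given in the text


def max_death_rate(brain_size):
    """Quadratic cost of brain size: c_max = beta * b^2."""
    return BETA * brain_size ** 2


def death_rate(brain_size, knowledge, lam):
    """Death rate after adaptive knowledge offsets the cost of the brain.

    Assumed form: c_max * exp(-lam * a / b). With lam = 0 there is no
    offset and the death rate equals c_max. Requires brain_size > 0.
    """
    return max_death_rate(brain_size) * math.exp(-lam * knowledge / brain_size)


b = 50  # the brain size used in the Fig 4 example
for lam in (0.0, 1.0, 5.0):
    # Death rate for a half-full brain under increasing returns to knowledge.
    print(lam, death_rate(b, knowledge=25, lam=lam))
```

Under this assumed form, an empty brain of size 100 has a death rate of 1 regardless of λ, which is why b = 100 acts as the maximum empty brain size.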