Our model is an event-based, individual-based model with a spatially-explicit environment and is freely available at https://bitbucket.org/dvanderpost/aapjes_bmc_eb_2016. The key design feature of the model is that we define behavioral decision making and the outcome of behavioral events, including learning, at a local spatio-temporal scale. We then study the meso- and macro-scale consequences of that local behavior to establish the mapping between different mechanisms at a local scale and information processing and payoffs at a larger scale. While the model is formulated ‘keeping primates in mind’, and a large number of parameter values are based on estimates of natural primate systems, we expect our conclusions to generalize to other animal taxa, particularly those with similar movement patterns and repertoire sizes. The model is based on previous models of learning in group foragers [27, 28, 31], but now includes skill learning, observational learning, dynamic populations and group sizes, and evolving parameters. The following model description is limited to those aspects needed to gain a reasonable understanding of the results, with key parameters listed in Table 1. For further details see Section 1 in Additional file 1.

Table 1 List of key parameters

Model overview

We first give a short overview of the model, followed by further details.

Entities: The model is composed of groups of foragers and patches made up of resource items, which are situated in continuous space (Fig. 1 a and Additional file 2).

Fig. 1 Model details. a Simulation snapshot. Each forager is indicated by a SEARCH area (gray semi-circle), REACH (gray circle) and a movement trajectory (red to blue line). When a forager observes another forager, the foragers are connected by an olive-green line. For illustration purposes, the resource items are shown as colored circles, and patches by larger gray circles. Each patch can be assumed to be a distinct patch type, with unique resource types (different colours within a patch). b Illustration of the decision-making algorithm. Rectangles are actions and ellipses are decision-making points. After completing one of the actions at the right hand side, all foragers start the decision-making process at the top left (SAFE?). RAND is a random number between 0 and 1, and ω i is the probability to do OBSERVE. MOVETOFOOD is always followed by EAT. MOVE consists of as many 1-meter steps as needed to complete a distance of δ i . c Illustration of how rewards e ir change with time spent practicing that skill for different resource types (Eq. 5): resources for which not much practice is needed (solid lines, low H) and those for which a lot of practice is needed (dashed lines, high H); and resources for which rewards increase fast immediately (black lines, low S) and those for which they increase slowly initially (gray lines, high S). d Illustration of how selectivity (Eq. 1) affects which subset of resources is chosen: overall resource quality distribution given by N(0.1,0.1) (light gray) and subsets chosen when selectivity is low (dark gray, a ie =0.1) and high (black, a ie =0.3), given σ i =5 and assuming the forager knows all resources perfectly

State variables: Resource items are defined by a position, and a type which is characterized by quality Q r , and two parameters defining how difficult the resource type is to process (H r and S r ), or ‘task difficulty’. H r defines the practice time (or experience) needed to develop half of the maximal skill for that resource type, and S r defines the shape of the function of how skill increases with experience (see ‘Skill learning’ below). Patches are emergent from clumps of resource items in space, and have a type defined by a set of 5 resource types that only occur in patches of that type. Foragers are defined by a position and heading, a current action and a time to its completion, short-term memory about movement and foraging goals, and long-term memory about the rewards associated with resources and resource processing skill. Foragers can differ in their information about resources and skill levels, as well as in their propensity for learning as defined by parameters that can mutate (see Table 1).

Processes and scheduling: The implemented processes in our model can be organized hierarchically as: (i) local decision making and movement of foragers; (ii) learning; (iii) life-history updating and demographics; and (iv) environmental updating.

Local decision making is governed by a decision-making algorithm which encodes sensing, decision making, movement, grouping and the updating of short-term memory. In simulations with grouping, foragers belong to a particular group, and follow behavior rules that ensure that groups move cohesively through the environment. All foragers are placed in a queue according to the time their action ends. The forager with the least time remaining is next to choose an action and is put back in the queue according to the time its new action ends. In this event-based setup, actions of foragers can overlap in time, and some foragers can complete multiple quick actions (e.g. move) while others are engaged in actions that take more time (e.g. searching for food).
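The event-based scheduling described above can be thought of as a priority queue keyed on action completion times. A minimal Python sketch under that assumption (the actual model source may organize this differently; function and variable names are ours):

```python
import heapq

def run_event_queue(durations_per_forager, horizon):
    """Event-based scheduling sketch: each entry is (completion_time, forager_id).
    The forager whose action ends first is popped, 'chooses' its next action
    (here: the same fixed-duration action), and is pushed back into the queue."""
    queue = [(dur, fid) for fid, dur in enumerate(durations_per_forager)]
    heapq.heapify(queue)
    completed = [0] * len(durations_per_forager)
    while queue:
        t, fid = heapq.heappop(queue)
        if t > horizon:
            break  # past the simulated time window
        completed[fid] += 1  # action finished; forager is rescheduled
        heapq.heappush(queue, (t + durations_per_forager[fid], fid))
    return completed
```

This illustrates how actions overlap in time: a forager with 1-minute actions completes many quick actions while a forager engaged in 5-minute actions completes only a few over the same interval.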

The learning algorithms include representations of individual and social learning, and update long term memory about properties of resources that foragers interact with as a consequence of their decisions.

Life-history updating occurs at regular time intervals and includes: (i) metabolism or energy expenditure; (ii) digestion of consumed resources; (iii) deaths and (iv) births of foragers; and (v) splitting of groups. After a forager dies, a forager is selected from the remaining population to reproduce, thus maintaining a fixed population size. Foragers are selected to reproduce in relation to their energy levels, where a doubling in energy leads to an 8-fold increase in the probability to reproduce. Offspring inherit the parameter values of their parents with a chance of mutation (see Table 1). In simulations with grouping, groups grow due to births until they reach a maximum size, and then split randomly into two equally sized daughter groups. Groups shrink due to deaths and disappear when the last group member dies.

Environmental updating occurs at regular intervals and involves the ‘growth’ of all resource items at the beginning of each year and ‘environmental change’ that changes an existing resource type into a new resource type that is unknown to the foragers. ‘Resource consumption’ occurs when foragers consume resources as determined by ‘local decision making’.

Spatio-temporal scaling: The environment is a continuous space of about 40 km2, foragers take steps of a meter at a speed of 0.5 m/s, and patches are 20 meters in diameter (Fig. 1 a and Additional file 2). Foragers can observe resources up to 2 meters away, and can observe which resources their neighbors are interacting with at 20 meters (a best case scenario for social learning, Additional file 3). There are no constraints on observing group members for grouping purposes in order to ensure cohesive groups, but the spread of groups tends to be in the order of 5–40 meters. All movement occurs in continuous space and there are no constraints on direction.

The timescale is defined in terms of the foragers’ behavioral actions that vary in duration from about a few seconds to a minute. In the model a year is defined as 360 days, and a day is 12 h or 720 min, where we focus on daylight time in a day. Thus foragers can complete many hundreds of behavioral actions in a day and learn from them. Energy expenditure (metabolism) occurs every minute. Digestion occurs every 100 min (DIGESTIONTIME). Foragers can live maximally for 20 years, but can die before that at any minute.

Resources

In our default setting, resource items of 250 resource types are distributed in 24500 patches with 1200 items each. There are 50 patch types, and a patch type is characterized by the presence of five resource types that only occur in that patch type (as in trees with fruit, leaves, flowers etc.). In order to generate variation across patches of a given type, each patch of a given type is defined by three resource types which are randomly selected from the five resource types that characterize that patch type.
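The patch-type scheme above can be sketched as follows (a minimal illustration, assuming resource types are simply numbered consecutively per patch type; names are ours, not from the model source):

```python
import random

def build_patch_types(n_patch_types=50, resources_per_type=5):
    """Assign each patch type its own disjoint set of resource types,
    giving n_patch_types * resources_per_type resource types in total."""
    return [list(range(p * resources_per_type, (p + 1) * resources_per_type))
            for p in range(n_patch_types)]

def make_patch(patch_type_resources, rng=random):
    """Each concrete patch holds 3 resource types drawn at random
    from the 5 that characterize its patch type."""
    return rng.sample(patch_type_resources, 3)
```

With the defaults this yields 50 patch types and 250 resource types, and any two patches of the same type share at least one resource type while typically differing in their exact composition.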

While these parameter values typically underestimate the diversity of natural environments, we strike a pragmatic balance between model complexity and simulation environments that are too simple, and where learning hardly plays a role [28]. We compare this ecological context with randomly distributed resources without patches, and pure patches where each patch type has only one resource type.

Resource items disappear when consumed by foragers, and are then unavailable for consumption. Resource ‘growth’ happens once a year, when all resource items that have been consumed by foragers reappear in the exact same position (for computational reasons) and with the same type. Environmental change occurs randomly at any minute with a given probability and changes a randomly selected resource type into another newly generated resource type which is unfamiliar to the foragers. For ease of interpretation we express this as a rate, namely how many resource types change per year (EC). All resource items of the disappearing type change into the new resource type. We vary EC across simulations to determine the effect of environmental change. We compare this kind of environmental change to an alternative in which resource types do not disappear, but instead remain familiar to foragers while changing in quality.

The quality of a resource type Q r is drawn from a normal distribution with mean 0.1 and standard deviation 0.1 (Fig. 1 d, light gray), and all items of a given resource type have the same quality. Thus we generate variation in quality across resource types which enables the learning process to be studied as an optimization process. Quality defines the maximal reward that a forager can obtain from a resource type when it has sufficient experience with processing that resource type. Task difficulty is defined by H r , the practice time (or experience) needed to obtain half of the maximal reward of that resource type, and S r , which defines how the reward increases with experience (see ‘Skill learning’ below). S r varies randomly between 1 and 4 (integer values only) and H r is varied across simulations to determine an overall difficulty of learning in the environment.

Local decision making

Foragers can choose between several local actions, namely, MOVE, SEARCH, MOVETOFOOD, EAT, MOVETOGROUP, OBSERVE and NOTHING, which are selected according to a decision-making algorithm (Fig. 1 b). In the algorithm, individuals start by checking if they are safe (CHECKSAFE), which implies having a sufficient number of neighbors (9) in SAFESPACE (17 meters). During CHECKSAFE, foragers can also observe neighbors within COPYSPACE (20 meters), and can monitor the resources with which those neighbors interact (Fig. 1 a). These observations are relevant for stimulus enhancement (SE) and observational learning (OL).

If not safe, foragers do MOVETOGROUP, which means that a forager moves towards the center of its group, calculated as the mean position of the other members of its group (Fig. 1 b, first line). Once safe, the forager then aligns its own heading with the average direction of other members of its group in ALIGNSPACE (20 meters). This attraction-alignment algorithm ensures that foragers stay together but travel in a relatively efficient manner through the environment.

If safe, foragers do OBSERVE (τ i minutes) with probability ω i , which leads to observational learning (OL, see below; Fig. 1 b, second line). Otherwise, with probability 1−ω i , foragers will select one of the remaining actions. If foragers are not HUNGRY (stomach content is at the maximum capacity of 20 resource items), foragers will do NOTHING (1 minute; Fig. 1 b, third line). Stomach contents are reset to zero at DIGESTIONTIME.

If HUNGRY, and if they have already selected a resource item for consumption (FOODTARGET), foragers will EAT (1 min.), or MOVETOFOOD if the item is beyond reach (0.9 meters) and EAT once the item is within reach (Fig. 1 b, fourth line). If foragers do not yet have a FOODTARGET but their last action was SEARCH, this means they did not find any resource items in view sufficiently attractive, and they will then MOVE forward δ i meters in the direction the forager is facing (Fig. 1 b, fifth line). If they did not yet SEARCH, they will SEARCH (Fig. 1 b, sixth line). During SEARCH up to 20 resource items in view (2 meters) are assessed in sequence (Fig. 1 a, gray semi-circles). The 20 items are randomly selected from those in view. The search terminates as soon as an item is chosen for consumption, or when none of the items is chosen.
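The decision flow of Fig. 1 b can be summarized as a cascade of checks; a minimal sketch (flag and action names follow the text, but the exact control flow in the model source may differ):

```python
def choose_action(safe, rand, omega_i, hungry, has_food_target,
                  target_in_reach, just_searched):
    """Sketch of the decision-making cascade of Fig. 1b.
    `rand` is a uniform random number in [0, 1) (RAND in the figure)."""
    if not safe:
        return "MOVETOGROUP"       # first line: regain group cohesion
    if rand < omega_i:
        return "OBSERVE"           # second line: observational learning
    if not hungry:
        return "NOTHING"           # third line: stomach at capacity
    if has_food_target:
        # fourth line: eat, or approach the item if beyond reach
        return "EAT" if target_in_reach else "MOVETOFOOD"
    if just_searched:
        return "MOVE"              # fifth line: nothing attractive in view
    return "SEARCH"                # sixth line: assess items in view
```

Note that in the model itself each action also has a duration, after which the forager re-enters this cascade at the top (SAFE?).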

Food choice algorithm

During SEARCH, a forager’s decision to EAT a given resource item is determined by its (i) exploration tendency P E (see below), (ii) personal information about the rewards associated with that resource type (a ir ), and (iii) whether the forager has been socially stimulated by seeing another forager eat that resource type P S (see below). During evaluation of a resource item, these three factors come together to determine the probability P F to choose to eat that item as follows:

$$ P_{F} = P(r \mid a_{ir}, a_{ie}, \sigma_{i}, P_{E}, P_{S}) = \min\left[ 1.0, \left(\frac{a_{ir}}{a_{ie}} \right)^{\sigma_{i}} + P_{E} + P_{S} \right] $$ (1)

where a ir is the reward forager i expects from resource type r (personal information based on reinforcement learning), a ie is an assessment of the quality of resources that can be found in the environment (see below), and σ i scales selectivity, i.e. how likely an individual is to select an item when a ir <a ie . Since associations are initially zero (a ir =0), unknown resource types can only be sampled via P E or P S . For solitary foragers this means that P E must be greater than zero. For grouping foragers, P S could in principle replace P E as the means to sample unknown resources. Once a ir >0, \(\left(\frac{a_{ir}}{a_{ie}}\right)^{\sigma_{i}}\) contributes to the probability of choosing a certain resource type, which is maximal when a ir >a ie and less than one if a ir <a ie . If a ir >a ie , the forager is certain to choose the resource item, irrespective of P E and P S . The impact of P E and P S is therefore greatest when resources are relatively unfamiliar (a ir <a ie ).
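Equation 1 translates directly into a one-line function; a sketch (naming is ours):

```python
def p_food_choice(a_ir, a_ie, sigma_i, p_explore, p_social):
    """Eq. 1: probability of choosing to eat an item of resource type r.
    a_ir  : expected reward from type r (personal information)
    a_ie  : expectation about resource quality in the environment
    sigma_i: selectivity exponent; p_explore, p_social: P_E and P_S."""
    return min(1.0, (a_ir / a_ie) ** sigma_i + p_explore + p_social)
```

This makes the regimes described above explicit: with a_ir > a_ie the power term alone exceeds 1 and the item is chosen with certainty, while for an unknown type (a_ir = 0) the choice probability reduces to P E + P S.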

Selectivity is adjusted relative to environmental conditions by adjusting a ie (Fig. 1 d, compare dark gray and black). When a forager’s stomach is not full at DIGESTIONTIME, the forager decreases its environmental expectation: a ie′=(1.0−ϕ i )a ie ; otherwise the expectation is increased: a ie′=(1.0+ϕ i )a ie , where ϕ i determines the rate with which a ie is changed. Each time the forager is too selective, it does not fill its stomach and reduces its selectivity, and vice versa. As a result, a ie is tuned in order to optimise energy intake, within the constraints of the algorithm. Qualitatively, this selection algorithm can give rise to the optimal food choice rule [44] where only resources above a certain perceived quality are eaten and all others are ignored (zero-one rule). Note however that our algorithm works on perceived quality and not actual quality, since the foragers are learning about resource quality and are not omniscient. Moreover, σ i can evolve, so that while the zero-one rule is possible, it need not evolve.

Satiation aversion: foragers develop temporary aversions after becoming satiated (stomach filled) with a given resource type. Satiation aversion causes foragers to completely ignore that resource type for one DIGESTION cycle (100 minutes) after which the aversion disappears. Satiation is common in foragers like primates that consume many secondary ‘toxic’ compounds [45], and/or require a balanced diet [46]. This model specification ensures that foragers consume a diverse set of resource types [31].

Learning

In the absence of any social influences on learning, learning in our model is composed of (i) exploration, (ii) reinforcement learning about rewards associated with resources, and (iii) skill learning. All foragers start life without any knowledge about resources, and so do not have any expectation about energy rewards (a ir =0) nor any resource processing skill. To enable foragers to sample (partially) unfamiliar resource types, and hence to start learning, we implemented exploration. After processing resource items, foragers develop skill, which increases the rewards they can obtain from resources items of that type. After consuming resource items, foragers develop expectations about rewards via reinforcement, and can use those to decide what to eat.

Exploration: The probability that a forager explores an item of resource type r is:

$$ P_{E} = P(r | \varepsilon_{i}, c_{ir}) = \varepsilon_{i} (1 - c_{ir}) $$ (2)

ε i is the exploration rate, and c ir is the certainty with which forager i assesses the reward of resource type r. Certainty was included to ensure that foragers do not continue exploring when already highly familiar with resources. For completely unfamiliar resources c ir =0 and there is no certainty. However, when rewards from resource types no longer change, for instance because skill levels are high, certainty becomes high, and foragers end up with a low tendency to explore that resource type. Certainty c ir is updated as follows:

$$ c_{ir}' = (1-\lambda_{i})\,c_{ir} + \lambda_{i}\left(1 - \min\left(1.0, \left|\frac{e_{ir}-a_{ir}}{e_{ir}}\right|\right)\right) $$ (3)

where e ir is the reward forager i obtains from resource r, and the same learning rate (λ i ) and discrepancy (e ir −a ir ) are used as during updating of expected rewards (see Eq. 6).
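Equations 2 and 3 can be sketched together (names are ours; the model source may structure this differently):

```python
def p_explore(eps_i, c_ir):
    """Eq. 2: exploration probability declines as certainty c_ir rises."""
    return eps_i * (1.0 - c_ir)

def update_certainty(c_ir, e_ir, a_ir, lam_i):
    """Eq. 3: certainty increases when the obtained reward e_ir matches
    the expected reward a_ir, using the same learning rate as Eq. 6."""
    match = 1.0 - min(1.0, abs((e_ir - a_ir) / e_ir))
    return (1.0 - lam_i) * c_ir + lam_i * match
```

When rewards stabilize (e ir ≈ a ir, e.g. because skill has plateaued), `match` approaches 1, certainty rises toward 1, and the exploration tendency for that resource type decays toward zero.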

Skill learning: A forager i’s skill s ir for processing a specific resource r is a function of experience t ir and ‘task difficulty’:

$$ s_{ir} = \frac{t_{ir}^{S_{r}}}{H_{r}^{S_{r}} + t_{ir}^{S_{r}}} $$ (4)

which is 0 when t ir =0 and tends to 1 when t ir becomes very large. t ir is the total time a forager i has spent processing a resource type r in its life, and increases each time the forager processes and consumes a resource item of type r.

Skill s ir determines the reward e ir forager i obtains from resource type r as a function of resource quality Q r :

$$ e_{ir} = Q_{r} s_{ir} + N(0, Z) $$ (5)

where N(0,Z) represents environmental noise, where a value is drawn from a normal distribution with mean 0 and a standard deviation of Z (0.005). Resource types with high H (Fig. 1 c, dashed lines) take longer to learn, while resource types with high S have a shallow increment in rewards during initial learning (Fig. 1 c, gray lines).

Reinforcement learning about expected rewards: The rewards that foragers associate with each resource type r are updated via reinforcement as follows:

$$ a_{ir}' = a_{ir} + \lambda_{i}(e_{ir} - a_{ir}) $$ (6)

where association a ir is the reward that forager i associates with resource type r, e ir is the energy obtained from resource type r, and λ i is the learning rate. This corresponds to a Rescorla-Wagner model [42] where all stimuli have the same salience. Associations are initially non-existent (i.e. zero), and the reward is obtained immediately after consumption of the resource leading to direct reinforcement.
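As a sketch, the Rescorla-Wagner update of Eq. 6 is:

```python
def update_association(a_ir, e_ir, lam_i):
    """Eq. 6: move the expected reward a_ir toward the obtained
    reward e_ir by a fraction lam_i of the prediction error."""
    return a_ir + lam_i * (e_ir - a_ir)
```

Repeated application converges a ir onto the (noisy) mean of e ir, so for high learning rates λ i the association tracks the current reward closely.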

Social influences on learning

Local enhancement (LE): arises spontaneously through grouping behaviour, since individuals are inclined to approach locations in which other members of their group are found, and thereafter to interact with resources in those regions. We therefore do not directly implement local enhancement, but it emerges spontaneously as soon as foragers move in groups [28]. The local enhancement that we consider is coarse grained, and does not direct individuals to particular resources, or to features of those resources.

For the two other social learning mechanisms, during CHECKSAFE a random ‘demonstrator’ is selected from any neighbors in COPYSPACE (see ‘Local decision making’) that are processing and consuming a resource. The impact of the demonstrator depends on the social learning mechanism.

Stimulus enhancement (SE): In addition to selecting resources according to their expected reward and the tendency to explore a given resource type asocially, SE increases a forager’s probability to consume resource type r by:

$$ P_{S} = P(r | \gamma_{i}, d) = d \gamma_{i} $$ (7)

where γ i indicates the strength of SE, and d=1 if forager i observed a neighbor consuming resource r within the last 30 min. and otherwise d=0. Only one resource type r is subject to SE at a time. SE does not directly affect expected rewards or skill.

Observational learning (OL): occurs during the action OBSERVE at rate ω i (see ‘Local decision making’) and allows forager i to increase its processing skill for a specific resource type, in proportion to the time spent observing, where the change in experience Δ t ir is:

$$ \Delta t_{ir} = \max\left[ K\, \frac{o_{ik}}{M}\,(t_{kr} - t_{ir}),\; 0.0 \right] $$ (8)

where K scales the increase, determining how effective skill copying is, and o ik is the effective time forager i observes neighbor k: o ik = min[τ i , p k ], where τ i is the maximum time forager i decides to spend observing its neighbor, and p k is the time left for neighbor k to complete its present action. Greater observation time leads to greater skill acquisition, where maximal observation time is the maximal time it takes to process and consume a resource (M). The increase in the skill level is bound to the skill level of the observed individual, and there is no skill gain if the skill level of the observed individual is lower than, or equal to, the forager’s own skill level. A forager does not know in advance whether a ‘demonstrator’ is highly skilled or not. Observation does not provide information about rewards.
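A sketch of the observational-learning gain of Eq. 8 (names are ours):

```python
def observational_gain(t_ir, t_kr, tau_i, p_k, K, M):
    """Eq. 8: experience gained by forager i observing neighbor k
    process resource type r. Gains are proportional to effective
    observation time and to the skill gap, and never negative."""
    o_ik = min(tau_i, p_k)  # effective observation time
    return max(K * (o_ik / M) * (t_kr - t_ir), 0.0)
```

The `max[..., 0]` clamp encodes the asymmetry noted above: observing a less (or equally) experienced neighbor yields no gain, so copying a poor demonstrator wastes the observation time but does no harm to existing skill.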

Energy budget, population turn-over and selection

The energy budget is determined by (i) energy gain due to rewards from food intake, which depends on learning, at every DIGESTIONTIME; (ii) a per-minute energy metabolism cost (METABOLISM, see Section 1 in Additional file 1); and (iii) an energy cost of 5000 per reproduction event, which represents a substantial part of total energy. Energy accumulates if energy intake from food exceeds metabolism and reproduction costs.

Foragers die of old age (at 20 years), of stochastically determined deaths, or of starvation. Births occur as a function of energy reserves each time a forager dies, keeping the population constant at size N (100), where the probability that forager i reproduces is:

$$ P_{R} = P(i | N) = \frac{{h_{i}^{W}}}{\sum_{j=1}^{N} {h_{j}^{W}}} $$ (9)

where h i is an individual’s energy level, N is the population size, and W (=3) scales the strength of the selection function.

The learning and foraging parameters δ i , ϕ i , σ i , ε i , λ i , γ i , ω i , τ i , are specific to forager i. Parameter combinations that lead to greater energy levels lead to faster rates of reproduction. An offspring inherits its parent’s parameters, with a chance of mutation (0.05). In case of mutation, a new parameter value is drawn from a normal distribution centered on the parent’s parameter value, and with a standard deviation that is one fifth of the maximum value of the parameter (see Table 1). Thus parameters can vary between individuals and can evolve over time via inheritance to offspring, mutation and natural selection. The mutation rate was selected operationally such that parameters evolve consistently within a reasonable time frame.
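A sketch of the selection rule of Eq. 9 and the mutation scheme described above (names are ours, not from the model source):

```python
import random

def pick_parent(energies, W=3, rng=random):
    """Eq. 9: select a parent with probability proportional to energy^W.
    With W = 3, doubling energy gives an 8-fold higher weight."""
    weights = [h ** W for h in energies]
    return rng.choices(range(len(energies)), weights=weights, k=1)[0]

def mutate(value, param_max, rate=0.05, rng=random):
    """With probability `rate`, redraw the offspring's parameter value
    from a normal distribution centered on the parent's value, with
    sd equal to one fifth of the parameter's maximum value."""
    if rng.random() < rate:
        return rng.gauss(value, param_max / 5.0)
    return value
```

Note that in the model, mutated values would additionally be kept within the parameter's allowed range (see Table 1); we omit the clipping here for brevity.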

Foragers are born in their parent’s group. There is no migration between groups. The population is inviable if the average energy level does not rise above the minimum energy needed to give birth.

Emergent dynamics

Since we only define local sensing and behavioral actions of foragers, the development of a forager’s repertoire emerges from its interaction with the environment over time. This environment includes the resources and their distribution, which affects the temporal autocorrelations in encounters with resources. The movement of foragers is characterized by inter-patch travel where no resource items are found, and intra-patch search, assessment and consumption of resource items. Within each patch, a forager has access to the resource types that are present in that patch. Over their lifetime, foragers encounter all patch types and all the resource types they contain, many times, thus there is ample opportunity to consume all resource types repeatedly. On reaching a patch, a forager’s experience with those resource types will depend on previous encounters with those resource types, and if it consumed those resources in the previous digestion cycle it could be satiated with respect to those resource types.

The dynamics of foraging are characterized by learning and food choice [28, 31]. Foragers move through the environment and when they encounter resource items, the food choice algorithm determines whether any are consumed (Eq. 1). Foragers start out exploring various unknown resources (via P E and/or P S ), and as they gain experience about rewards, personal information tends to become more dominant in their food choices. Personal experience is updated after consumption events and includes a ir , the assessment of rewards (Eq. 6) and the increment of skill (Eq. 4) which in turn increases the reward obtained (Eq. 5). Due to consumption of many resources, the expectation of the environment a ie will increase, increasing the fraction of resources for which a ie is greater than a ir . This increases selectivity towards resources with high a ir , and can lead to reduced food intake (i.e. a forager’s stomach is no longer full at digestion). At this point a ie decreases again. Thus the forager’s expectation of the environment a ie tends to equilibrate at a value relative to the values of a ir , such that the intake of resource items is close to the maximum of 20. This ensures that the forager is eating selectively but still eating close to the maximal number of resource items within each digestion cycle (DIGESTIONTIME). The ratio of a ir to a ie is therefore similar across simulation types, irrespective of how fast a ir increases due to differences in skill development time.

The combination of (i) food choice biased to resource types with high a ir (selective foraging), and (ii) learning via updating of a ir and t ir , generates a positive feedback.

This positive feedback generates a familiarity bias and a development process that is contingent on stochastic initial conditions, leading to idiosyncratic learning histories and somewhat arbitrary variation between foragers in their knowledge of the environment. Therefore, while learning is biased towards high quality resources, due to an intrinsic familiarity bias in the process, learning can get ‘stuck’ on a self-stabilizing repertoire as soon as this repertoire fulfills the intake needs of the forager [28]. This familiarity bias becomes strong in environments with pure patches, and when foragers do not become satiated after eating a lot of a given food type [28, 31]. We therefore focus on patches with several resources and satiation as a default case, which stimulates foragers to develop diverse diets.

The familiarity bias implies that foragers have greater t ir for some resources than others, and also a more accurate assessment a ir of rewards e ir . Since λ i typically evolves to high values (see Section 4 in Additional file 1), a ir is generally an accurate estimate of e ir . The main cause of differences in familiarity is therefore variation in t ir , which in turn determines differences in e ir and a ir . The impact of social influences on learning therefore concerns (i) biases on choosing resource types, which indirectly affect t ir in the case of LE and SE, and (ii) direct gains in t ir in the case of OL.

In groups, the actions of neighbors and group-level dynamics can have indirect and direct influences on food choices and learning [28]. Due to the need to stay in a group (imposed in the model), there is a strong ‘consensus’ or ‘conformity’ effect, where the decision of neighbors to stop or not stop in a patch can affect the feeding opportunities of foragers and hence their learning trajectories. Moreover, direct observation of neighbors, and its effects, depend on what neighbors have decided to eat, i.e. on copying opportunities [27]. In turn, the effect of a social stimulus will depend on what an observer already knows, and whether it can find the resource type of interest. If a forager would already choose a resource item of its own accord (a ir >a ie ) then P S would not matter and the social influence would be redundant.

Moreover, P S can increase the rate of food intake and feedback on selectivity via the updating of a ie .

Thus the impact of the evolving parameters, in particular those of exploration and social learning, is not predefined in the model and is the object of study. In previous work, which can be considered a baseline for, and pseudo-replicate of, this study in terms of the foraging and grouping parameters, the evolutionary attractors have been established [24, 39, 41]. Our results here are consistent with those findings (see Section 4 in Additional file 1). Here we go beyond these existing models and study how foraging and (social) learning parameters co-evolve.

Simulations and analysis

To analyze our model we distinguish between different classes of parameters (see Section 2 in Additional file 1). Of the 50 parameters, 22 are independent fixed parameters that are either empirical estimates relevant for primates, or are computationally motivated but still empirically reasonable. 17 additional parameters either follow logically from, or are constrained in some way by, independent fixed parameters, and are also empirically reasonable. This group of 39 fixed parameters sets the empirically motivated spatio-temporal scaling context, including life-history, that is relevant for primates and other small-medium mammals, in which the learning mechanisms that we study are embedded.

Within this context, we focus on the key parameters of interest, namely the 4 evolving parameters that define exploration (ε i ), stimulus enhancement (γ i ) and observational learning (ω i and τ i ), and the 8 fixed parameters that define grouping (social context for local enhancement and other social influences on learning). To do so we ran evolutionary simulations with solitary populations (S, where grouping is switched off) in order to establish an asocial baseline, and then ran three kinds of simulations with grouping: (i) G LE , grouping where only local enhancement occurred; (ii) G SE , grouping where stimulus enhancement (γ i ), but not observational learning, could evolve freely; (iii) G OL , grouping where observational learning (ω i and τ i ), but not stimulus enhancement, could evolve freely. In all cases, the exploration rate (ε i ) could evolve freely. The remaining 4 evolving parameters (δ i , σ i , ϕ i , λ i ) ensure that the foraging and reinforcement learning parameters are not arbitrarily defined, but co-evolve with the main parameters of interest.

To study the effect of the environmental context we vary (i) the task difficulty of resources (H r ) and (ii) the rate of environmental change (EC). As a default we considered patchy environments with multiple resource types in each patch (mixed patches). For additional sensitivity analysis we tested the main qualitative results in environments with (i) patches with a single resource type (pure patches), (ii) randomly distributed resource types (random), and (iii) environmental change where resource types change in quality Q r , but remain known to foragers (comparable to [20, 29]). We do not vary parameters that define life-history characteristics and spatio-temporal scaling as this is beyond the scope of our species of interest.

Our analysis included two main steps. First we ran evolutionary simulations for 1000 years. In simulations with solitary populations evolvable parameters were initialized on randomly selected values. Simulations with grouping parameters were initialized with evolved parameters from solitary simulations, but could continue to evolve. We repeated this process in each kind of environment, and in each case repeated 10 simulations with different random seeds.

We measured the impact of a particular learning mechanism in terms of average energy levels in the population. To determine the evolved values of parameters we analyze parameter values from ancestors (obtained from ancestor traces) at the end of simulations (year 850–950). We used the averages of 10 simulations to represent the ‘evolved parameters’ for a given condition, and compared their consistency across the different environmental settings (see Section 4 and 6 in Additional file 1). In this way we established ‘evolutionary attractors’ for the set of evolving parameters. In our results we focus on exploration and social learning parameters (ε i , γ i , ω i and τ i ). The results of other evolving parameters do not change the interpretation of the results (see Section 4 in Additional file 1).

Second, to determine why parameters evolved to particular values, and to establish the impact of particular learning mechanisms, we conducted additional analysis using two kinds of non-evolutionary simulations without mutations. ‘Parameter sweep’ simulations were used to study the effect of systematically varying the value of a single parameter (local sensitivity analysis), both across and within groups, while keeping other parameters fixed at the average evolved values relevant for a particular social learning mechanism and ecological condition. ‘Switch’ simulations were used to study the effect of introducing a particular learning mechanism into a population of foragers initialized with another mechanism. This was done by initializing the population with the average evolved values of parameters for the initial mechanism, with values of parameters relevant for the second mechanism set to zero. The parameter values were then changed to those of the average evolved values of the second mechanism at the time the switch was desired.

In these shorter simulations (80–140 years) we measure diet repertoire statistics in more detail: (i) total energy intake \(= \sum \limits _{r=1}^{r=R} d_{ir} e_{ir}\), where d ir is the total number of items of resource type r that were consumed by forager i, and e ir is the per item reward obtained; (ii) repertoire quality \(= \sum \limits _{r=1}^{r=R} p_{ir} Q_{r}\), and (iii) average skill \(= \sum \limits _{r=1}^{r=R} p_{ir} s_{ir}\), where p ir is the proportion of resource r in individual i’s diet. Section 3 in Additional file 1 provides further detail about different simulations types.

In sum, while the analysis contains a large number of parameters, the vast majority of these provide a realistic simulation context, and the parameter space for the remaining few is fully explored within realistic bounds.