Participants

A total of 300 Arizona State University students (150 women and 150 men) were randomly selected from a database managed by the university’s Elinor Ostrom Multi-Method Lab and recruited by email. Informed consent was obtained from all subjects before starting the experiment (ethical approval was given by the Arizona State University IRB, codes: STUDY00002137 and STUDY00002273). The subjects ranged in age from 18 to 26 years (mean 19 years, s.d. 1.37 years). Participants received $5 for participating and an additional amount ranging from $5 to $30 depending on their own performance.

Procedure

The experiments took place in a computer room at the Elinor Ostrom Multi-Method Lab at Arizona State University. For each session, a maximum of 20 participants (exclusively male or female) were recruited and randomly assigned to one condition of the experiment. Participants sat at physically separated and networked computers and were randomly assigned to a group or worked alone. Players did not know who belonged to their group and were instructed that communication and note taking were not allowed. Before starting the experiment, participants were requested to enter their age and sex. Participants could read instructions on their screens. The game lasted 45 min, after which subjects received a reward according to their performance ($15 on average).

Game principle

The participants played a computer game (programmed in Object Pascal with Delphi 6) that simulated a real-world innovation process in which the production of complex artefacts depended on the discovery of high-level innovations. Discovering these innovations was contingent on the discovery of lower-level innovations. Both low- and high-level innovations resulted from a specific production process that was initially unknown to participants. Players were initially provided with six basic resources (Fig. 1a) that could be used without limit and combined using a workshop panel containing four slots (Fig. 2). After dropping between one and four resources into this panel, players could trigger an automatic refining process, at no cost and without limit, by clicking on a ‘Try’ button. Innovations arose when players produced a combination that belonged to a list of pre-determined successful combinations (Fig. 1b). A specific slot displayed the result: a red cross when the combination was unsuccessful, a new item otherwise. Once discovered, new items could in turn be combined with other items to produce higher-level innovations. All combinations were allowed, including those involving the repeated use of the same item. The order of the items in the workshop panel had no effect on the result, so that 209 unique combinations could be produced from the six initial resources. The production of new items led to a combinatorial explosion: 1,000 different combinations could be produced after the discovery of four new items/innovations. In total, 27 additional items (all useful) could be generated from the six initial resources. A stock panel allowed the players to store up to 12 items, in addition to the six initial resources (Fig. 2). The accumulation of innovations could result in the production of complex tools (such as axes) that potentially allowed players to obtain logs by cutting trees.
Basic logs required at least eight innovations to be produced and were the minimal element that could be dropped into a three-slot totem pole panel, which provided players with a totem score. Logs could be refined when combined with relevant tools (such as carving tools, pigments, brushes and so on) in the workshop panel. A total of 115 different logs could be produced, so that 142 innovations and 266,915 unique totems could be generated overall.
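The combination counts quoted above (209, 1,000 and 266,915) follow from counting unordered selections with repetition allowed, that is, sums of multiset coefficients. A short check (the function name is our own):

```python
from math import comb

def n_unordered_combinations(n_items: int, max_slots: int) -> int:
    """Count the distinct unordered combinations of 1..max_slots items
    drawn, with repetition allowed, from a pool of n_items: the sum of
    multiset coefficients C(n_items + k - 1, k)."""
    return sum(comb(n_items + k - 1, k) for k in range(1, max_slots + 1))

print(n_unordered_combinations(6, 4))    # six initial resources -> 209
print(n_unordered_combinations(10, 4))   # after four innovations -> 1000
print(n_unordered_combinations(115, 3))  # 115 logs, 3-slot totem pole -> 266915
```

The same counting rule reproduces all three figures, confirming that order was irrelevant and repetition allowed in both the workshop and totem pole panels.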

Tutorial and pre-game information

Before starting, the players had to complete a tutorial during which basic actions, such as dragging and dropping resources into the workshop panel, had to be completed. The tutorial also guided players’ actions until the production of a first innovation (the same one for all the players) to make sure that all players mastered the game interface before starting the experiment. Players were informed that the ultimate aim of the game was to build a totem pole, that innovations had to be produced before logs could be produced and that these logs could be used and refined to make totems. Players were not told which items could be produced during the game. Players were also informed that their score, and hence their monetary reward, depended on the number of new items they were able to produce and the value of their totem. The fitness function that determined the value of a totem was unknown to players.

Score calculation

Each of the 115 different logs was associated with a unique value that was randomly attributed within a range of scores that depended on the log’s complexity. The complexity of a log was defined by the number of innovations required to produce it. Thus, logs with a higher number of underlying innovations were always more rewarding, although two logs with the same number of underlying innovations did not have the same value. The score of a totem, which depended on the value of the logs and their diversity, was calculated as follows:

with α taking the value 0, 1 or 2 depending on whether the totem pole involved one, two or three different logs.

Totem scores ranged from 50 to 7,410 points. The players’ final score was equal to the score of the best totem they built plus 15 points for each new item they produced.
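The scoring rules can be sketched as follows. Because the totem-value formula itself is not reproduced here, the totem score is taken as an input rather than computed; all function names are our own:

```python
def diversity_alpha(logs) -> int:
    """Diversity parameter: α = 0, 1 or 2 for a totem pole that
    involves one, two or three different logs."""
    return len(set(logs)) - 1

def final_score(best_totem_score: int, n_new_items: int) -> int:
    """Final score: the score of the player's best totem plus
    15 points for each new item produced."""
    return best_totem_score + 15 * n_new_items

print(diversity_alpha(["log_a", "log_a", "log_b"]))  # -> 1
# Hypothetical best case: the 7,410-point totem and all 142 innovations.
print(final_score(7410, 142))  # -> 9540
```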

Treatments

All players were provided with an additional panel whose content varied according to the treatment (Fig. 2). Players from the individual learning treatment were only provided with their own score and a record of their own innovations (alongside their best totem, if any). All players could click on innovations from their own record to generate a reminder about how to produce them. Players from the other treatments benefited from additional information and could switch between their own record and others’ records by clicking on an anonymised name (such as ‘player 3’) and associated score (Fig. 2). Other players’ scores were updated every 10 s. Players from the full and partial information treatments were permanently provided with five constant sources of social information. However, participants in the full information treatment were provided with the underlying combination when they clicked on other players’ innovations, whereas participants in the partial information treatment could not see the underlying combination (Fig. 2). Players from the small group and partial connectivity treatments benefited from the same innovation-related social information as in the full information treatment, except that participants in the small group treatment benefited from only two constant sources of information (the other members of their group), while participants in the partial connectivity treatment benefited from one changing source of information (among five). In the partial connectivity treatment, the between-player ties were always reciprocal, so that, at any time, the population structure could be described as a metapopulation of three two-player groups. The between-player ties were randomly regenerated every 3 min, so that, on average, the probability of connecting each possible pair of individuals during the course of the experiment was close to 1. Players provided with social information could observe other group members without limit.
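The rewiring step in the partial connectivity treatment amounts to drawing a fresh random perfect matching of the six players every 3 min, so that ties are always reciprocal. A minimal illustration, assuming a simple shuffle-and-pair scheme (the actual randomization procedure may have differed):

```python
import random

def rewire(players, rng=random):
    """Repartition an even-sized population into reciprocal two-player
    groups by shuffling the players and pairing them off, as a sketch
    of the matching regenerated every 3 min."""
    pool = list(players)
    rng.shuffle(pool)
    return [tuple(pool[i:i + 2]) for i in range(0, len(pool), 2)]

pairs = rewire(["p1", "p2", "p3", "p4", "p5", "p6"])
# Three pairs; each player appears in exactly one pair, so every tie
# is reciprocal and each player has exactly one changing source of
# social information at any time.
```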
All treatments involved 30 men and 30 women in single-sex groups (to facilitate statistical analyses).

Bots

Isolated bots generated combinations through a two-step process: they first chose a random number N of workshop slots (1≤N≤4) before randomly selecting N items from the same initial pool of items as human participants. When they generated a successful combination, the resulting innovation was added to their pool of items, which was used during subsequent trials. The number of attempts performed by bots was parametrized using data generated by humans. Human participants produced an average of 380 attempts, but about half of them (51%) were redundant, which is mainly explained by a limited memory capacity (an average of 45% of unsuccessful combinations were redundant). As we were interested in the effect of the ability to generate guided variation (and not in the effect of memory), we allowed bots to generate 188 unique attempts/combinations (that is, new combinations were generated at no cost when bots randomly produced an already-tried combination), which was the average number of unique combinations that human participants produced during our experiment (the results are not sensitive to variation in the number of attempts around this mean). The final score of isolated bots was based on the number of innovations they discovered, as no isolated bot was able to produce logs (the minimal element required to build a totem). Groups of six bots generated combinations according to the same process, except that they benefited from the innovations of other bots in addition to their own discoveries, which simulated the effect of social learning. Bots could instantly use other bots’ discoveries to generate new combinations and progress further in the fitness landscape (as each innovation was useful). Bots that produced logs were provided with a totem score that equalled the maximum number of points that could be obtained from those logs. The social bots’ final score was based on the number of innovations they produced and their totem score.
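The bots’ two-step sampling process can be sketched as follows. The mapping from successful combinations to new items (`successes`) is a hypothetical stand-in for the pre-determined list, which is not reproduced here; function names are our own:

```python
import random

def bot_attempt(pool, rng=random):
    """Two-step process: pick a random number N of slots (1-4), then
    randomly fill them from the bot's current pool of items. Repetition
    is allowed and order is irrelevant, so a sorted tuple is returned."""
    n = rng.randint(1, 4)
    return tuple(sorted(rng.choices(pool, k=n)))

def run_bot(pool, successes, n_unique_attempts=188, rng=random):
    """Draw combinations until the budget of unique attempts is spent;
    redundant draws cost nothing, per the text. Each success enlarges
    the pool, which is used during subsequent trials."""
    pool = list(pool)
    tried = set()
    while len(tried) < n_unique_attempts:
        combo = bot_attempt(pool, rng)
        if combo in tried:
            continue  # already-tried combination: regenerated at no cost
        tried.add(combo)
        if combo in successes:
            pool.append(successes[combo])  # innovation added to the pool
    return pool
```

Social bots would additionally merge other bots’ discoveries into `pool` after each round, which is the only difference between the isolated and grouped conditions.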

Analyses

All statistical analyses of scores were based on linear mixed models with the log-transformed individuals’ final score as the response variable and group identity as a random effect. Preliminary analyses revealed no effect of sex or age, so these variables were not introduced in the final statistical models.

Reasoning abilities and social learning. The data set was composed of the performance of humans and bots that were either isolated or organized in groups of six. The binary variables ‘reasoning abilities’ (that is, humans or bots) and ‘social learning’, together with their interaction, were evaluated as explanatory variables.

Social learning mechanisms. The data associated with the ‘isolated individuals’, ‘partial information’ and ‘full information’ treatments were considered. ‘Treatment’ was introduced as an explanatory variable.

Population structure. Two analyses were run. In the first, the data set was composed of the data from the ‘isolated individuals’, ‘small group’ (three-player group) and ‘full’ (six-player groups) treatments, and ‘group size’ was introduced in the model as a continuous variable; in the other, the data set was composed of the data from ‘partial connectivity’ and ‘full connectivity’ treatments, and ‘connectivity’ was modelled as a binary variable (low or high).

Rate of innovation. To determine whether humans were able to generalize the function of specific items within the game, we investigated the number of unsuccessful combinations that isolated players had to generate before producing a successful one. This log-transformed value was the response variable. The rank of the innovation (within the player’s own innovation record) was evaluated as an explanatory variable. Theoretically, the response variable should be strongly affected by the number of possible combinations that a player was able to generate from his/her pool of items. For this reason, we introduced the corresponding log-transformed number of possible combinations that the player could produce as a control variable. The player’s identity was introduced as a random effect. Our results indicate that the number of unsuccessful combinations that players had to generate before producing a successful one was negatively affected by the rank of the innovation (LRT: χ2=5.32, d.f.=1, P=0.02, N=431), indicating that players got better at generating successful combinations over time. As expected, the number of possible combinations negatively affected the individuals’ ability to find successful combinations, although this effect was only marginally significant (LRT: χ2=3.02, d.f.=1, P=0.08, N=431).
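The reported P values can be recovered from the LRT statistics: with one degree of freedom, the chi-square survival function reduces to a complementary error function, erfc(√(χ²/2)). A quick check:

```python
from math import erfc, sqrt

def lrt_pvalue_df1(chi2: float) -> float:
    """P value for an LRT statistic with 1 degree of freedom:
    the chi-square(1) survival function, erfc(sqrt(chi2 / 2))."""
    return erfc(sqrt(chi2 / 2))

print(round(lrt_pvalue_df1(5.32), 2))  # rank-of-innovation effect -> 0.02
print(round(lrt_pvalue_df1(3.02), 2))  # possible-combinations effect -> 0.08
```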

All statistical analyses were conducted using R version 3.0.1 (ref. 36). The significance of explanatory variables was assessed by comparing full and restricted models using LRTs and parametric bootstrapping with 1,000 simulations. Both tests yielded qualitatively similar results. Mixed models, LRTs and parametric bootstrapping were performed using the lme4 (ref. 37) and pbkrtest (ref. 38) packages.