Experiment 1

There was a strong positive correlation between individual estimates within groups after, but not before, group discussion (S1 Fig). This convergence of estimates within groups demonstrates social influence once information is shared during the group discussion stage [4]. A collective intelligence effect was found, with an overall reduction in errors after group discussion (Fig 1a; negative binomial Generalised Linear Mixed Model (neg. bin. GLMM): LRT(2,433) = 50.66, p < 0.001), an effect that was independent of the participants’ gender or age (S1 Table). Most participants had a smaller error in their group consensus estimate than in their initial estimate (errors decreased for 96 participants and did not change or increased for 51 participants; Fig 1b). However, when errors in the initial estimates were compared with those in the post-discussion individual estimates, at which point estimates within groups could again vary, the probability that a participant improved their estimate was close to 0.5 (76 improved, 71 did not; Fig 1c). This difference between the group consensus and post-discussion stages was statistically significant (binomial GLMM: LRT(1,288) = 13.49, p < 0.001), suggesting that the group consensus stage was the most beneficial in reducing errors. Whether estimates improved after group discussion was independent of the participants’ ages (S1 Table), although female participants were more likely than male participants to improve their estimate after group discussion (LRT(1,288) = 8.02, p = 0.0046).
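The per-participant comparison above can be sketched as follows. This is a minimal illustration, assuming that error is the absolute difference between an estimate and the true value, and that a participant counts as improved only when their error strictly decreases (unchanged errors are grouped with increases, as in the counts reported above). The class `Participant`, the functions `abs_error` and `count_improved`, and the example data are hypothetical; the reported analysis used GLMMs rather than raw counts.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """One participant's three estimates (hypothetical structure)."""
    initial: float    # pre-discussion individual estimate
    consensus: float  # group consensus estimate
    post: float       # post-discussion individual estimate


def abs_error(estimate: float, true_value: float) -> float:
    """Absolute error of an estimate (assumed error measure)."""
    return abs(estimate - true_value)


def count_improved(participants, true_value, stage):
    """Return (improved, not_improved) for the given stage ('consensus' or 'post').

    A participant improves only if the error at `stage` is strictly lower than
    the error of their initial estimate; unchanged errors count as not improved.
    """
    improved = sum(
        abs_error(getattr(p, stage), true_value) < abs_error(p.initial, true_value)
        for p in participants
    )
    return improved, len(participants) - improved


# Hypothetical group of three with a true value of 100.
group = [
    Participant(initial=60, consensus=90, post=80),
    Participant(initial=150, consensus=90, post=160),
    Participant(initial=100, consensus=90, post=100),
]
print(count_improved(group, 100, "consensus"))  # (2, 1)
print(count_improved(group, 100, "post"))       # (1, 2)
```

The third participant illustrates why the consensus and post-discussion counts can diverge: a member who started accurately is pulled away by the consensus but can return to their original estimate afterwards.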

Fig 1. Collective intelligence in adolescents. In Experiment 1 (a, b, c), the errors in the pre-discussion initial estimates are significantly greater than those in both the group consensus estimates (neg. bin. GLMM: z = 6.74, p < 0.001) and the post-discussion estimates (z = 5.27, p < 0.001). Group estimates tended to have less error on average than individual estimates given after discussion, although this effect was not statistically significant (z = -1.77, p = 0.077). The frequency distributions, per participant, of the error in the initial estimate minus the error in the group consensus estimate (b) or in the post-discussion individual estimate (c) show that while most participants gain only a small reduction in error (b) or show little change in error (c), a minority of individuals vastly reduce the error in their estimates after group discussion. In Experiment 2, the absolute error in the consensus group estimate was lower than in the initial (pre) estimates in all three treatments (proportion of black sweets: d: 48/200, e: 94/190, f: 121/160). The box plots show the median (thick black lines), interquartile range (enclosed by the boxes), 1.5 × the interquartile range beyond the boxes (whiskers) and outliers beyond the whiskers (open circles). https://doi.org/10.1371/journal.pone.0204462.g001

To explore how accuracy was affected by disagreement among the initial estimates within each group, the effect of the range of initial individual estimates on the error in the consensus group estimate was analysed (Fig 2). When the error of the group estimate was expected to be high because the mean of the initial estimates was inaccurate, greater disagreement in initial estimates resulted in better group estimates than expected (neg. bin. GLM: LRT(1,43) = 4.44, p = 0.035). In other words, higher disagreement led to greater accuracy than the mean of the initial estimates predicted, weakening the relationship between the error of the mean initial estimate and the error of the group estimate. This mirrors a trend found in adults [22]. We also find that groups with a greater range of initial estimates shifted further from the mean of their estimates when giving their group estimate (Fig 3a, neg. bin. GLM: LRT(1,45) = 18.31, p < 0.001).
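The two quantities used here, disagreement and the shift away from the mean, can be computed per group as in the sketch below. It assumes that the shift is the absolute difference between the consensus and a reference mean expressed as a percentage of that mean (our reading of the 'absolute percentage difference' in Fig 3); the function names and the example group are hypothetical. Python's `statistics.geometric_mean` (Python 3.8+) supplies the reference used in Fig 3b.

```python
from statistics import fmean, geometric_mean


def disagreement(initial):
    """Disagreement within a group: the range of its initial estimates."""
    return max(initial) - min(initial)


def pct_shift(consensus, reference):
    """Absolute percentage difference of the consensus relative to a reference mean."""
    return 100 * abs(consensus - reference) / reference


# Hypothetical group with one high outlier and a consensus of 70.
initial = [40, 50, 150]
consensus = 70
print(disagreement(initial))                                    # 110
print(pct_shift(consensus, fmean(initial)))                     # 12.5 (arithmetic mean = 80)
print(round(pct_shift(consensus, geometric_mean(initial)), 1))  # 4.6 (geometric mean ≈ 66.9)
```

In this toy group the same consensus sits much closer, in relative terms, to the geometric mean than to the arithmetic mean, which is the pattern contrasted between Fig 3a and Fig 3b.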

Fig 2. The effect of disagreement in initial individual estimates on improving group estimates in Experiment 1. Disagreement is measured as the range of initial estimates in each group. The colours represent this range, binned every twenty units. Coloured lines are fits for each range interval, calculated from the GLM coefficients, which include the significant interaction term between the error of the (arithmetic) mean and the range of the initial estimates. The main effects of gender and mean age are included in the calculation of fitted values, each fixed at its mean value in the data. If the error in the mean initial estimate directly determines the error in the group estimate, there should be a positive linear relationship between the two variables (as occurs for the darker points, i.e. groups with a smaller range of initial estimates). No relationship between the two errors (mean initial and group) is expected if group discussion revises the estimate enough that the error of the mean initial estimate no longer predicts the error of the group estimate, as occurs for the more lightly coloured points. https://doi.org/10.1371/journal.pone.0204462.g002

Fig 3. How initial disagreement shifts the consensus group estimates away from the mean of initial estimates in Experiment 1. Disagreement is the range of initial estimates. (a) shows the absolute percentage difference relative to the arithmetic mean, and (b) relative to the geometric mean. https://doi.org/10.1371/journal.pone.0204462.g003

The wide range of initial estimates in some groups was usually due to a single estimate, i.e. an outlier, that differed from the estimates given by the other two group members. This was evident in the range having no effect on the mean of the two initial estimates closest to one another, in contrast to the positive relationship between the range and the mean of all initial estimates (S3 Fig). The groups’ consensus estimates were, however, intermediate between these two quantities: their relationship with the range was significantly steeper than that of the mean of the two closest estimates (neg. bin. GLMM: z = -3.54, p < 0.001), but less steep than expected from the mean of all initial estimates (z = 2.23, p = 0.026). This shows that outlier estimates were given less weight than under simple arithmetic averaging of all initial estimates, but were not dismissed entirely in the group consensus decision.
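A minimal sketch of the two benchmarks compared above, for a group of three: the mean of the two initial estimates closest to one another, and the mean of all three. The helper `relative_position` is a hypothetical illustration, not an analysis reported here, of where a consensus falls between the two benchmarks: 0 means the outlier was ignored, and 1 means it was weighted as in a simple arithmetic mean.

```python
from itertools import combinations
from statistics import fmean


def closest_pair_mean(initial):
    """Mean of the two initial estimates that are closest to one another."""
    a, b = min(combinations(initial, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


def relative_position(consensus, initial):
    """Position of the consensus between the closest-pair mean (0) and the mean of all estimates (1)."""
    ref_pair = closest_pair_mean(initial)
    ref_all = fmean(initial)
    if ref_all == ref_pair:
        raise ValueError("benchmarks coincide; position is undefined")
    return (consensus - ref_pair) / (ref_all - ref_pair)


initial = [40, 50, 150]  # hypothetical group with one high outlier
print(closest_pair_mean(initial))                # 45.0
print(fmean(initial))                            # 80.0
print(round(relative_position(70, initial), 2))  # 0.71: outlier down-weighted, not ignored
```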

Given the right-skewed distribution of the initial estimates (S1 Fig), one potential mechanism for reducing the influence of outliers within groups would be to use the geometric, rather than arithmetic, mean. It has recently been demonstrated that adults integrate estimates from others using a rule that approximates the geometric mean [23], although it is unknown whether groups (including groups of adults) reaching a consensus also use this rule. When the group consensus estimates were modelled as noisy versions of various cognitively simple aggregation rules (Fig 4, S1 Methods), aggregating initial estimates with the geometric mean gave the best fit to the data (Fig 4, S4 Fig). Furthermore, despite the strong effect of disagreement on how far groups shifted from the arithmetic mean (Fig 3a), disagreement in initial estimates was only marginally, and not statistically significantly, related to the shift from the geometric mean (Fig 3b, neg. bin. GLM: LRT(1,45) = 3.47, p = 0.062). The close match between the groups’ estimates and the geometric mean of the initial estimates is particularly evident when both are plotted against the range of initial estimates (S4 Fig, neg. bin. GLMM: z = 0.18, p = 0.86).
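To make the outlier argument concrete, the sketch below compares the arithmetic and geometric means of a hypothetical group whose highest estimate is a large outlier. It computes the geometric mean as 10 raised to the arithmetic mean of the log10 estimates and checks this against `statistics.geometric_mean`. The numbers are illustrative only.

```python
import math
from statistics import fmean, geometric_mean


def geometric_mean_via_logs(estimates):
    """Geometric mean as 10 ** (arithmetic mean of log10 estimates); estimates must be positive."""
    return 10 ** fmean(math.log10(x) for x in estimates)


without_outlier = [40, 50, 60]
with_outlier = [40, 50, 500]  # hypothetical right-skewed group

for group in (without_outlier, with_outlier):
    print(group, round(fmean(group), 1), round(geometric_mean_via_logs(group), 1))
# [40, 50, 60] 50.0 49.3
# [40, 50, 500] 196.7 100.0
```

Raising the highest estimate from 60 to 500 moves the arithmetic mean by roughly 147 but the geometric mean by only about 51, because on a log scale the outlier lies much closer to the other two estimates.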

Fig 4. Fits of different aggregation rules to the observed group consensus estimates in Experiment 1. The observed log-likelihoods of eight different rules for aggregating initial estimates (circles) are plotted alongside the log-likelihoods obtained when the noise added to each estimate maximizes the log-likelihood (dashes; S4 Fig, S1 Methods). For the median (x₂), the geometric (geom) and arithmetic (arit) means, and the mean of the lowest and highest estimates in each group (x₁₃), the fits to the data are close to maximal. The other rules tested are: the lowest estimate (x₁), the mean of the lowest and median estimates (x₁₂), the mean of the median and highest estimates (x₂₃), and the highest estimate (x₃). The strategies are ordered along the x axis so that their values increase for many of the groups. https://doi.org/10.1371/journal.pone.0204462.g004
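The comparison in Fig 4 can be sketched as follows. The eight rules are written for a group's sorted initial estimates x1 ≤ x2 ≤ x3, with the labels of Fig 4. The noise model is an assumption for illustration and not necessarily the one in S1 Methods: each consensus estimate is treated as the rule's prediction multiplied by log-normal noise, so the log residuals are normal and the noise variance has a closed-form maximum-likelihood estimate. Log-likelihoods are computed for the log-transformed consensus estimates, which omits a Jacobian term that is identical for every rule and so does not change their ranking. `RULES`, `max_log_likelihood` and the example data are hypothetical.

```python
import math
from statistics import fmean, geometric_mean

# The eight aggregation rules of Fig 4, applied to sorted estimates x = (x1, x2, x3).
RULES = {
    "x1": lambda x: x[0],
    "x12": lambda x: (x[0] + x[1]) / 2,
    "x2": lambda x: x[1],
    "x13": lambda x: (x[0] + x[2]) / 2,
    "geom": geometric_mean,
    "arit": fmean,
    "x23": lambda x: (x[1] + x[2]) / 2,
    "x3": lambda x: x[2],
}


def max_log_likelihood(groups, consensus, rule):
    """Maximised log-likelihood of log(consensus) ~ Normal(log(rule(x)), sigma^2).

    Multiplicative log-normal noise is an assumption for illustration; sigma^2
    is replaced by its maximum-likelihood estimate, the mean squared log residual.
    """
    residuals = [math.log(c) - math.log(rule(sorted(g))) for g, c in zip(groups, consensus)]
    n = len(residuals)
    sigma2 = max(sum(r * r for r in residuals) / n, 1e-12)  # floor avoids log(0) for a perfect fit
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


# Hypothetical initial estimates and consensus estimates for three groups.
groups = [[40, 50, 500], [20, 30, 60], [60, 80, 90]]
consensus = [100, 33, 75]

scores = {name: max_log_likelihood(groups, consensus, rule) for name, rule in RULES.items()}
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name:>4}: {scores[name]:8.2f}")
```

Because the maximum-likelihood noise level is fitted separately for each rule, the comparison rewards the rule whose predictions track the consensus estimates most tightly on a log scale.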

To further explore how initial estimates were aggregated in groups, the estimates individuals gave before and after group discussion (when individuals were again free to deviate from the group consensus) were compared. Given the data [24], a model with the log10-transformed initial estimates as an explanatory variable for the post-discussion individual estimates (S1 Table) was more likely than one with untransformed initial estimates (S5 Fig; neg. bin. GLMM: ΔAICc = 0.0 for the log10 model and 9.6 for the untransformed model). The group interaction thus has a logarithmic-like effect on individual estimates, consistent with an approximate geometric-mean aggregation rule, since the logarithm of the geometric mean is the arithmetic mean of the logarithms.
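The model comparison relies on AICc, Akaike's information criterion with the small-sample correction, AICc = −2 ln L + 2k + 2k(k + 1)/(n − k − 1), where L is the maximised likelihood, k the number of estimated parameters and n the number of observations. ΔAICc is each model's AICc minus the lowest AICc in the set. In the sketch below the log-likelihoods, k and n are hypothetical placeholders, not the fitted values behind S5 Fig. When both models have the same number of parameters, as assumed in this sketch because they differ only in how one predictor is transformed, ΔAICc reduces to twice the difference in log-likelihood.

```python
def aicc(log_likelihood, k, n):
    """AICc for a model with k estimated parameters fitted to n observations."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def delta_aicc(models, n):
    """ΔAICc of each model relative to the model with the lowest AICc.

    `models` maps a name to (maximised log-likelihood, number of parameters).
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Hypothetical fits: same parameter count, log-likelihoods differing by 3.
models = {"log10": (-500.0, 5), "untransformed": (-503.0, 5)}
deltas = delta_aicc(models, n=150)
print(deltas)  # log10: 0.0, untransformed: about 6.0 (twice the log-likelihood gap)
```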

Despite the evidence that the geometric mean provides a good fit to the observed consensus estimates overall (Fig 4), it is possible that groups used different methods to reach a collective decision, especially because the level of disagreement may influence how group decisions are made [22]. Although the geometric rule appeared to be used most frequently, the use of the other aggregation rules cannot be excluded: each alternative rule (Fig 5a, S1 Methods) was the closest to the group consensus estimate in at least two groups. The rules we compare are relatively cognitively undemanding, and hence we believe they are feasible for the young participants in our study to use, either consciously, for example by selecting the initial estimate lying between the smallest and largest (i.e. the median), or intuitively via a heuristic, which is more likely for rules such as the arithmetic mean. However, for every alternative rule except the mean of the lowest and highest estimates, the frequency with which it was closest to the consensus fell within the 95% confidence intervals expected if each group had aggregated its initial estimates with the geometric mean heuristic plus a level of noise matching the noise in estimates in the experiment (Fig 5a, S1 Methods). After splitting the data into groups with a low or high range of pre-discussion initial estimates (Fig 5b), several rules could plausibly have been applied in groups with a low range, although it is more difficult to distinguish statistically between rules when the range is small because, by definition, the initial estimates are then more similar to one another (e.g. S6 Fig). At high ranges, the geometric mean was clearly the most common strategy (Fig 5c).
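The closest-rule analysis and its noisy-geometric-mean benchmark can be sketched as below. Several choices are assumptions for illustration rather than the procedure in S1 Methods: closeness is measured as the absolute difference on a log scale, the simulated noise is multiplicative with log-normal standard deviation `sigma`, and the 95% interval is taken from the empirical 2.5% and 97.5% quantiles of the simulated frequencies. All names and the example data are hypothetical.

```python
import math
import random
from statistics import fmean, geometric_mean

# Aggregation rules (labels as in Fig 4) applied to sorted estimates x = (x1, x2, x3).
RULES = {
    "x1": lambda x: x[0], "x12": lambda x: (x[0] + x[1]) / 2,
    "x2": lambda x: x[1], "x13": lambda x: (x[0] + x[2]) / 2,
    "geom": geometric_mean, "arit": fmean,
    "x23": lambda x: (x[1] + x[2]) / 2, "x3": lambda x: x[2],
}


def closest_rule(initial, consensus):
    """Rule whose prediction is nearest the consensus on a log scale (assumed distance)."""
    x = sorted(initial)
    return min(RULES, key=lambda name: abs(math.log(consensus) - math.log(RULES[name](x))))


def rule_frequencies(groups, consensus):
    """Proportion of groups for which each rule is the closest to the consensus estimate."""
    counts = dict.fromkeys(RULES, 0)
    for g, c in zip(groups, consensus):
        counts[closest_rule(g, c)] += 1
    return {name: k / len(groups) for name, k in counts.items()}


def noisy_geometric_intervals(groups, sigma, n_sims=1000, seed=1):
    """Empirical 95% intervals of each rule's frequency if every group used a noisy geometric mean."""
    rng = random.Random(seed)
    simulated = {name: [] for name in RULES}
    for _ in range(n_sims):
        # Simulated consensus: geometric mean times log-normal noise (assumption).
        fake = [geometric_mean(sorted(g)) * math.exp(rng.gauss(0, sigma)) for g in groups]
        for name, freq in rule_frequencies(groups, fake).items():
            simulated[name].append(freq)
    lo, hi = int(0.025 * (n_sims - 1)), int(0.975 * (n_sims - 1))
    intervals = {}
    for name, values in simulated.items():
        values.sort()
        intervals[name] = (values[lo], values[hi])
    return intervals


groups = [[40, 50, 500], [20, 30, 60], [60, 80, 90]]  # hypothetical
print(rule_frequencies(groups, [50, 30, 80]))         # consensus = median, so x2 is closest in every group
print(noisy_geometric_intervals(groups, sigma=0.1)["geom"])
```

An observed frequency outside a rule's simulated interval would suggest that rule was used more (or less) often than a noisy geometric mean alone can explain.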

Fig 5. The use and consequences of different aggregation rules in Experiment 1. (a) Probability of each aggregation rule being the closest to the observed consensus estimates of the groups (filled circles). Also plotted is the probability (mean, black line; 95% confidence interval, shaded region) that the aggregation rule is the closest to the observed consensus estimates when a ‘noisy’ geometric mean simulation is instead used to aggregate the initial estimates (S1 Methods). (b) Range and skew of initial estimates for each group. The range of initial estimates is plotted against the relative distribution of the estimates. Skew is ρ = (x₂ − x₁)/(x₃ − x₁), which is close to zero if the highest estimate (x₃) is a relative outlier and close to one if the lowest estimate (x₁) is a relative outlier. The threshold between groups of low and high range (red line) is an approximate point separating the region in which any configuration occurs (range ≤ 40) from the region in which the two lower estimates are much closer to each other than to the highest (range > 40). (c) As in (a), but separately for groups with a low range of estimates (blue dots, line and shaded area) and a high range of estimates (red dots, line and shaded area). (d) Absolute error if each of the strategies had been followed exactly by groups with a low range (blue) and a high range (red). Strategy notation is as in Fig 4, with a final column added in (d) for the error of the observed group consensus estimates. The threshold used to define groups with low and high ranges did not affect the trends in (c) and (d) (S6 Fig). The box plots show the median (thick black lines), interquartile range (enclosed by the boxes), 1.5 × the interquartile range beyond the boxes (whiskers) and outliers beyond the whiskers (open circles). https://doi.org/10.1371/journal.pone.0204462.g005
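The two descriptors in Fig 5b, and the low/high split at a range of 40, can be computed for a group of three as sketched below. The function names and example groups are hypothetical; the skew is left undefined (NaN) when all three estimates are equal, a case the caption does not address.

```python
import math


def range_and_skew(initial):
    """Return (range, rho) for three initial estimates, where rho = (x2 - x1) / (x3 - x1).

    rho is near 0 when the highest estimate is a relative outlier and near 1 when
    the lowest estimate is; it is NaN if all three estimates are equal.
    """
    x1, x2, x3 = sorted(initial)
    spread = x3 - x1
    rho = (x2 - x1) / spread if spread > 0 else math.nan
    return spread, rho


def range_class(initial, threshold=40):
    """'low' if the range is at most the threshold, otherwise 'high'."""
    spread, _ = range_and_skew(initial)
    return "low" if spread <= threshold else "high"


for group in ([40, 50, 500], [60, 80, 90], [10, 95, 100]):  # hypothetical groups
    spread, rho = range_and_skew(group)
    print(group, spread, round(rho, 2), range_class(group))
# [40, 50, 500] 460 0.02 high
# [60, 80, 90] 30 0.67 low
# [10, 95, 100] 90 0.94 high
```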

We tested the consequences of using different aggregation rules for the accuracy of group decision making. For groups with a low range, only the highest estimate and the average of the highest and median estimates significantly outperformed the geometric mean rule (Fig 5d, S2 Table). In contrast, for groups with a high range, the geometric mean outperformed all alternatives. The rules that outperformed the geometric mean at low ranges, and the rule that was used more often than expected under the noisy geometric rule at low ranges, gave particularly large errors in groups with high ranges (Fig 5d). Thus, the geometric mean provides a robust and generally high-performing aggregation rule, particularly when estimates within a group disagree, and this matches its preferential usage (Fig 5c). There was no difference in accuracy between the geometric mean of the initial estimates and the group consensus estimates, either across all groups or within only the low-range or high-range groups (S2 Table).
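The accuracy comparison in Fig 5d can be sketched by applying each rule exactly and measuring its absolute error against the true value, separately for low- and high-range groups. For brevity the sketch includes only the geometric mean and the two rules that outperformed it at low range (the highest estimate, x3, and the mean of the median and highest estimates, x23). The true value of 100 and the groups are hypothetical, chosen only to show how the comparison is computed; they are not data from the experiment.

```python
from statistics import geometric_mean, median

# Subset of the Fig 4 rules, applied to sorted estimates x = (x1, x2, x3).
RULES = {
    "geom": geometric_mean,
    "x23": lambda x: (x[1] + x[2]) / 2,
    "x3": lambda x: x[2],
}


def median_errors(groups, true_value, threshold=40):
    """Median absolute error of each rule, separately for low- and high-range groups."""
    errors = {"low": {r: [] for r in RULES}, "high": {r: [] for r in RULES}}
    for g in groups:
        x = sorted(g)
        cls = "low" if x[2] - x[0] <= threshold else "high"
        for name, rule in RULES.items():
            errors[cls][name].append(abs(rule(x) - true_value))
    # Classes with no groups are returned as empty dictionaries.
    return {cls: {name: median(v) for name, v in by_rule.items() if v} for cls, by_rule in errors.items()}


groups = [[60, 80, 90], [40, 50, 500]]  # one low-range and one high-range hypothetical group
print(median_errors(groups, true_value=100))
```

In this toy example the highest estimate is the most accurate rule for the low-range group, but it is by far the least accurate for the high-range group, where the geometric mean lands on the true value.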