Study 1 demonstrated that emotion played an important role in the transmission of moral content. It has long been understood that emotion plays a key role in our moral experience and the evolution of our moral sense (Gintis et al., 2008; Graham et al., 2013; Norenzayan, 2014; Pizarro, 2000). As mentioned previously, research in cultural transmission has found an emotional content bias (Eriksson and Coultas, 2014; Heath et al., 2001; Stubbersfield et al., 2017). These studies propose that the mechanism explaining emotional content bias is that more emotional content elicits a greater physiological response, resulting in enhanced selection, recall and transmission. However, their measures of emotional arousal are self-reported ratings of emotional content, rather than any physiological measure of arousal. To our knowledge, only one study has directly examined an interaction between physiological arousal and cultural transmission: Berger (2011) found that a physiological excitatory state (induced through physical exercise) increased the sharing of information despite the arousal being incidental to the material being shared.

The present study seeks to replicate and extend Study 1 by including a measure of physiological arousal: electrodermal activity (EDA). EDA measurement has been extensively used to examine physiological arousal in response to emotional content (Boucsein, 2012). Study 2 also builds on Study 1 by including consequences in the narratives, either a reward for morally good actions or a punishment for morally bad actions. Previous research demonstrates that the consequences of an action play a key role in moral judgement and that moral scenarios involving consequences elicit unique activity in the brain compared to nonmoral scenarios (Schaich Borg et al., 2006). We therefore hypothesise that vignettes which include the consequences of a moral action will be more faithfully transmitted than those which do not. Based on the results of Study 1 and previous studies examining emotion and the transmission of narratives, we will examine four hypotheses:

1. There is a cognitive bias for transmitting morally good information. Consistent with Study 1 and research suggesting a bias for transmitting positive information in experimental settings and on social media, morally good information will be preferentially transmitted over non-moral or morally bad information (H4).

2. There is a bias for transmitting more physiologically arousing content. Consistent with previous research suggesting a bias for more emotional content, content which evokes a stronger physiological response (as measured using EDA) will be more faithfully transmitted along a linear chain (H5).

3. There is a bias for transmitting narratives which feature a consequence for a moral action. Stories which present either a reward for a morally good action or punishment for morally bad action will be more faithfully transmitted than those which do not (H6).

4. Self-reported emotion ratings provide an adequate proxy for actual emotional arousal. In order to provide an assessment of previous research which has used self-report measures, we will compare the EDA measures with self-report measures (H7).

Materials and methods

Participants

Thirty-six participants (27 female, 9 male) aged 18 to 49 years (M = 22.61, SD = 5.40) took part. All participants gave their informed consent.

Materials

Vignettes were 30 to 68 words and contained 3 to 10 propositions (determined through propositional analysis (Kintsch, 1974)). As in Study 1, moral and non-moral versions were created. In addition, versions with consequences and without consequences were included. See below for examples (bolded sections illustrate differences between versions and did not appear as such to participants).

Smoothie-Non-moral version

Jackie’s partner read an interview with a famous actor who drank urine for its supposed health benefits. He decided to try it out on himself. He added some urine to his breakfast smoothie. He didn’t see a problem–he hadn’t noticed any difference in the taste. Jackie felt sickened at the thought.

Smoothie-Non-moral version with consequence

Jackie’s partner read an interview with a famous actor who drank urine for its supposed health benefits. He decided to try it out on himself. He added some urine to his breakfast smoothie. He didn’t see a problem–he hadn’t noticed any difference in the taste. Jackie felt sickened at the thought. Jackie and her partner had a big fight about it.

Smoothie-Moral version

Jackie’s partner read an interview with a famous actor who drank urine for its supposed health benefits. He decided to try it out on the kids. He added some urine to their breakfast smoothie. He didn’t see a problem—they hadn’t noticed any difference in the taste. Jackie felt sickened at the thought.

Smoothie-Moral version with consequences

Jackie’s partner read an interview with a famous actor who drank urine for its supposed health benefits. He decided to try it out on the kids. He added some urine to their breakfast smoothie. He didn’t see a problem—they hadn’t noticed any difference in the taste. Jackie felt sickened at the thought. Jackie and her partner had a big fight about it.

As in Study 1, the results of a pre-test were used to determine the eight vignettes most appropriate for analyses, using the same survey as the Study 1 pre-test (see SM1, SM6, and SM7).

EDA equipment and measurement

EDA was measured using a Biopac MP36R system operating AcqKnowledge 4.4 software, sampling at 50 Hz, with a gain level of ×1000 and a high pass filter of .05. EDA responses for each participant were identified as the maximum phasic skin conductance response measured while reading each vignette (no recordings were taken when typing or not reading). To correct for individual differences, response scores were range corrected; that is, each score was calculated as a proportion of that participant’s total EDA range. Settings and measurement were informed by Braithwaite et al. (2013) and pilot studies conducted by the researchers.
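The range correction described above can be sketched as follows. This is only a minimal illustration, assuming that "total EDA range" means the difference between a participant's maximum and minimum recorded values; the function name `range_correct` and the sample numbers are hypothetical and are not taken from the study.

```python
def range_correct(responses, eda_min, eda_max):
    """Express each response as a proportion of one participant's EDA range.

    Assumption: the range is eda_max - eda_min for that participant.
    """
    eda_range = eda_max - eda_min
    if eda_range <= 0:
        raise ValueError("EDA range must be positive")
    return [r / eda_range for r in responses]


# Hypothetical peak phasic responses (microsiemens) for one participant,
# one value per vignette read.
peaks = [0.5, 1.0, 2.0]
corrected = range_correct(peaks, eda_min=1.0, eda_max=5.0)
print(corrected)
```

Dividing by each participant's own range puts responses from participants with very different baseline conductance on a common scale.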

Design

As in Study 1, a linear transmission chain design was used. As in previous research, three ‘generations’ were used (Barrett and Nyhoff, 2001; Nielson et al., 2012; Stubbersfield et al., 2015; Stubbersfield et al., 2017) across twelve chains. The first participant in each of the twelve chains received all eight vignettes. Participants received an equal mix of moral, non-moral, moral with consequences and non-moral with consequences.

Procedure

The procedure matched that of Study 1, with the addition of EDA measurement. Participants attached pre-gelled electrodes to their own palms with instruction from the researcher, who also checked they were attached appropriately. The researcher remained in the room to monitor the EDA readout and participant activity for actions which could influence the EDA measurement (e.g., coughing, excessive movement).

Coding

Recalled material was coded for the presence of propositions found in the original version. Coding reliability was assessed by having an independent coder, blind to the hypothesis, code 11% of the material. The independent coder and experimenter were highly consistent (r = 0.92, p < .001). In cases of disagreement the first coder’s decision stood. Sensitivity tests were conducted to assess coder reliability (see below and SM10).
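As a rough illustration of the reliability check, the sketch below computes a Pearson correlation between two coders' scores. It assumes the correlation was calculated over the number of propositions each coder marked as present in each recalled vignette; the paper does not state the unit of analysis, and the counts shown are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical proposition counts for six recalled vignettes.
first_coder = [5, 3, 6, 4, 7, 2]
second_coder = [5, 4, 6, 4, 6, 2]
r = pearson_r(first_coder, second_coder)
print(round(r, 2))
```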

Statistical analysis

To test H4 and H5, analyses followed those of Study 1, using the same software and R packages. A GLMM was constructed to predict the proportion of original propositions correctly recalled, including participant age, participant gender, word count, number of propositions, moral good score, moral bad score, survival information score, social information score, male stereotype consistency score, female stereotype consistency score, emotion, EDA score, and generation as fixed effects, with nested random effects of vignette in participant, participant in generation, and generation in vignette set. Predictors were removed if doing so did not impair model fit, determined by AIC. As before, no specific ΔAIC was used to determine predictor removal but all models with ΔAIC < 2 relative to the best-fitting model were included in model averaging (see SM4 for AICs of each model produced). All non-categorical variables were centred on the mean.
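The ΔAIC < 2 selection rule can be sketched as follows. This is an illustration only: the model names and AIC values are hypothetical, and the use of Akaike weights to weight the retained models is an assumption, since the text does not specify the weighting scheme used in the averaging.

```python
import math


def delta_aic(aics):
    """Difference between each model's AIC and the lowest AIC in the set."""
    best = min(aics.values())
    return {name: value - best for name, value in aics.items()}


def akaike_weights(deltas):
    """Relative likelihoods exp(-delta / 2), normalised to sum to 1."""
    rel = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(rel.values())
    return {name: v / total for name, v in rel.items()}


# Hypothetical AIC values for three candidate models.
aics = {"model_1": 100.0, "model_2": 101.2, "model_3": 104.5}
deltas = delta_aic(aics)

# Keep only models within 2 AIC units of the best-fitting model.
top = {name: d for name, d in deltas.items() if d < 2}
weights = akaike_weights(top)
print(sorted(top), weights)
```

Under this scheme a model-averaged coefficient would be the weighted mean of that coefficient across the retained models.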

To test H6 a GLMM was constructed to predict the proportion of original propositions correctly recalled with moral category (categories being moral-no-consequences (M), moral-consequences (MC), non-moral-no-consequences (N), and non-moral-consequences (NC)) as a fixed effect, with the same random effects as the H4 and H5 GLMM. To test H7 a version of the best fitting model was created in which EDA score was replaced with emotional rating score, and the effect on model fit evaluated.

Results

As in Study 1, a higher rating for morally good information was an important predictor of transmission fidelity: moral good score was retained as an effective predictor in all of the five best-fitting models, as determined by AICc. EDA score was less important as a predictor of transmission fidelity, being retained in four of the five best-fitting models (see Table 2). The moral proposition appearing in the original material was recalled in the majority of cases (77.08%), and the moral proposition survived to the end of the majority of chains where it was present in the original material (58.33%). Morally good score was also a better predictor of recall for the key proposition than morally bad score (χ² = 0.59, p < .001).

Table 2 Fixed effects, AICc and ΔAIC of the five best fitting models (ΔAIC < 2) produced in the analysis

Vignette emotion and participant gender were also important predictors of transmission fidelity. Figure 4 shows the odds ratios for the fixed effects of the best fitting model in Study 2 (Model 1 in Table 2). (For details of other models, see SM8).

Fig. 4 Odds ratios with confidence intervals of fixed effects predicting transmission fidelity in the best fitting model in Study 2 (Model 1 in Table 2). Odds ratios are sorted from highest to lowest, with the highest at top. Values to the right of the dashed line indicate a positive effect, values to the left of the line indicate a negative effect. Produced using sjPlot (Lüdecke, 2016). *p < 0.05, **p < 0.01, ***p < 0.001. Reference categories are: anger for emotion, generation 1 for generation, and female for gender

Pairwise comparisons of vignettes featuring different emotions showed that disgust vignettes were transmitted more faithfully than vignettes featuring either gratitude (Tukey’s HSD corrected z = −3.43, p < 0.005) or elevation (z = −3.13, p < 0.01). No other significant differences between emotions were found (zs = −0.57 to 1.28, ps > 0.05). (See SM5).

A multi-model averaging approach (see Study 1) was used to determine appropriate effect estimates (see Fig. 5). Morally good score had a positive effect on transmission fidelity (estimate = 0.38 ± 0.11 SE, z = 3.56) while EDA score had a less consistent but negative effect on transmission fidelity (estimate = −0.70 ± 0.66 SE, z = 1.06). Participant gender had an effect, with women recalling more propositions, on average, than other participants (estimate = −1.13 ± 0.33 SE, z = 3.38). Vignette word count had a small negative effect (estimate = −0.03 ± 0.03 SE, z = 1.12). Higher female stereotype consistency score had a small positive effect on transmission fidelity (estimate = 0.16 ± 0.29 SE, z = 0.54), while higher male stereotype consistency score had a small negative effect on transmission fidelity (estimate = −0.03 ± 0.12 SE, z = 0.27). Relative variable importance measures (see Fig. 5) suggest that, of the predictors, good score, emotion and gender were the most important in determining recall. Each of these variables was also retained in the best-fitting models from Study 1 with different participants and material. The effect of good score was again present in all generations, as it was in Study 1 (see Fig. 6).
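To relate these estimates to the odds-ratio scale, a coefficient can be exponentiated. The sketch below assumes the estimates are on the log-odds scale of a logit-link model and uses a simple Wald interval (estimate ± 1.96 SE); both are assumptions for illustration. Because it is applied here to a model-averaged estimate, the result need not match the best-fitting-model values plotted in Fig. 4.

```python
import math


def odds_ratio(estimate, se, z_crit=1.96):
    """Convert a log-odds estimate and its SE to an odds ratio with a Wald CI."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z_crit * se)
    upper = math.exp(estimate + z_crit * se)
    return point, lower, upper


# Model-averaged estimate for morally good score reported in the text.
point, lower, upper = odds_ratio(0.38, 0.11)
print(round(point, 2), round(lower, 2), round(upper, 2))
```

An odds ratio above 1 with an interval that excludes 1 corresponds to a positive effect on the odds of a proposition being recalled.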

Fig. 5 Predictor effect size indicated by z value and relative variable importance (maximum value = 1) from the average model based on the five best fitting models for Study 2. See SM8 for a more complete report of model-averaged coefficients. (a) Indicates a categorical variable where mean z-value is presented

Fig. 6 Predicted probabilities for the effect of good score (from mean) on recall by generation, derived from the results of the best-fitting model in Study 2 (Model 1 in Table 2). Produced using sjPlot (Lüdecke, 2016)

To test H7, a version of the best fitting model was created using emotional rating score in place of EDA score. In this model emotional rating also had a negative effect on transmission fidelity (estimate = −0.57 ± 0.37 SE, z = −1.52). When these two models were compared, the model with emotional rating score proved to be a better fit to the data than the model using EDA score (χ² = 0.14, p < 0.001). However, the difference in model fit is small (ΔAIC < 2).

Consequences (Moral and Nonmoral vs Moral with Consequences and Nonmoral with Consequences) was a significant predictor of transmission fidelity (vs. generation-only model, χ²(3) = 10.31, p = .02). However, vignettes without consequences (moral and non-moral) were transmitted more faithfully than their with-consequences equivalents (estimate = 0.55, SE = 0.26, z = 2.15, p < 0.05). Multiple pairwise comparisons show that this is primarily driven by the difference between Moral and Nonmoral with Consequences vignettes (z = −3.04, p < 0.05); no other significant differences between vignette types were found (zs = −2.18 to −0.00, ps > 0.05).

Sensitivity tests

Sensitivity tests were conducted using data from the second coder to assess the robustness of results based on data from the original coder. As in the original results, morally good information was an important predictor, being retained in all the best fitting models, while EDA score was not (ΔAIC < 2). In addition, morally good score again had a positive effect on transmission fidelity (model average estimate = 0.36 ± 0.10 SE, z = 3.81) while EDA score had a less consistent but negative effect on transmission fidelity (model average estimate = −0.24 ± 0.50 SE, z = 0.48), suggesting these findings are also robust. Consequences were not a significant predictor of transmission (vs generation-only model, χ²(3) = 6.63, p = .08). Emotional rating score proved to be a better fit to the data than the model using EDA score (χ² = 5.38, p < 0.001) but both were negative predictors of transmission fidelity. For more details on the results of the sensitivity tests see SM10.

Discussion

The aim of this study was to test four hypotheses regarding a transmission bias for morally good content and the influence of physiological arousal on transmission. Consistent with the findings of Study 1, we found evidence to support H4 (there is a cognitive bias for transmitting morally good information). As in Study 1, this was true across all ‘generations’ (see Fig. 6). We found no evidence to support H5 (there is a bias for transmitting more physiologically arousing content); in fact, the opposite was found. We also found no evidence to support H6 (there is a bias for transmitting narratives which feature a consequence for a moral action). We did, however, find evidence in support of H7 (self-reported emotion ratings provide an adequate proxy for actual emotional arousal). Self-reported emotion ratings negatively predict recall, as EDA measures do, suggesting that it is appropriate to use self-report measures for emotional content in future research.

Previous research examining an emotional bias in cultural transmission has found that emotive information (as determined through self-report measures) has a positive influence on transmission fidelity (Eriksson and Coultas, 2014; Heath et al., 2001; Stubbersfield et al., 2017). The explanation given for this finding is that emotive information is physiologically arousing, which makes it more memorable and more likely to be selected for transmission. Our finding that physiological arousal (as measured through EDA) had a negative effect on transmission fidelity is inconsistent with this proposed mechanism and therefore with this previous research. To examine this finding further, in the context of our study, another GLMM was constructed predicting EDA score rather than recall, but otherwise the same as those models constructed to predict recall. A full model found that morally good content had a negative effect on EDA score (estimate = −0.36 ± 0.21 SE, z = 0.08), suggesting the vignettes with a higher morally good score were less physiologically arousing than other vignettes.

We therefore propose that the role of emotion and physiological arousal in cultural transmission is not a direct one, but one that increases transmission by making content more salient when memorability is the primary determinant of transmission success. Clark and Kashima (2007) demonstrated that participants’ knowledge of their recall being transmitted to another person in a transmission chain produced different results from chains where they were not aware. They argued that participants’ awareness of transmission led to communicative intent, producing a different result relative to the recall-only chains. In our experiment, transmission to another participant was also known, likely leading to a combination of recall and communicative intent where memorability (and hence arousal) may not have been the most influential factor. We suggest that less arousing content had a transmission advantage in our study not because it was less arousing, but because the content bias for morally good content was a more important determinant of transmission success than memorability. This proposal is supported by van Leeuwen et al. (2018), who found that participants more frequently chose to transmit positive, low-arousal vignettes over negative, high-arousal ones when transmission was to strangers. Further research is required to examine fully the role of arousal, memory, communicative intent and audience perception in the transmission of moral content, and suggestions for future research are discussed in the general discussion.

We found no support for the hypothesis that vignettes which featured a consequence for moral actions would be more faithfully transmitted. The results instead suggested that vignettes with consequences were less faithfully transmitted than those without consequences. However, pairwise comparison suggests that this finding was driven by the differences in transmission fidelity between Moral vignettes and Nonmoral with Consequences vignettes. A key finding in studies examining the cultural transmission of narratives has been the process of sense-making and stories being altered to fit the transmitters’ schema (Bartlett, 1932; Lyons and Kashima, 2006). The poor transmission of Nonmoral with Consequences vignettes may therefore be better explained by the apparent incongruity of a non-moral, unintentional action incurring some form of punishment or reward than by stories without consequences having some form of transmission advantage. An appropriate interpretation of this result, therefore, is that in this case the inclusion of consequences had no clear direct effect, positive or negative, on the transmission of moral information.

We found that vignettes featuring female-stereotypical content had high transmission, supporting Lyons and Kashima’s (2006) findings. We found, however, that while female stereotype consistency had a positive effect on transmission fidelity, male stereotype consistency did not. This finding is inconsistent with the finding of Study 1. It is again worth noting that, as in Study 1, the effects of gender stereotypes were less important than other predictors in terms of predicting transmission fidelity. It is possible, therefore, that both studies give valid but noisy estimates of the same, small, ‘true’ effect of stereotype consistency. We also found that vignettes featuring disgust had higher transmission fidelity than a number of other emotions, which is consistent with research demonstrating an advantage for disgusting content in transmission (e.g., Eriksson and Coultas, 2014). Unlike Study 1, word count had a negative effect on transmission fidelity, with longer vignettes being more poorly transmitted. However, in Study 2, word count was confounded with the presence of consequences. This is likely to explain the inconsistency between Study 1 and Study 2 with regard to the effects of word count. Because all but nine of our participants were female, gender was not a priori a variable of interest, and there is little existing literature on gender and social transmission in adults, we refrain from drawing conclusions as to the influence of gender. To assess the reliability of the results, sensitivity tests were again conducted (see results section and SM10), and again found that the key findings are robust.