People learn differently from good and bad outcomes. We argue that valence-dependent learning asymmetries are partly driven by beliefs about the causal structure of the environment. If hidden causes can intervene to generate bad (or good) outcomes, then a rational observer will assign blame (or credit) to these hidden causes, rather than to the stable outcome distribution. Thus, a rational observer should learn less from bad outcomes when they are likely to have been generated by a hidden cause, and this pattern should reverse when hidden causes are likely to generate good outcomes. To test this hypothesis, we conducted two experiments (N = 80 and N = 255) in which we explicitly manipulated the behavior of hidden agents. This gave rise to both kinds of learning asymmetries in the same paradigm, as predicted by a novel Bayesian model. These results provide a mechanistic framework for understanding how causal attributions contribute to biased learning.

People are motivated to maximize rewards and minimize punishments, but when updating their beliefs, they often weigh good and bad news differently. The nature of this differential weighting remains puzzling. In some cases, animals and humans attend more to bad events and learn more rapidly from punishments than from rewards (Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001; Taylor, 1991). Similarly, some studies of reinforcement learning have found that learning rates are higher for negative than for positive prediction errors (Christakou et al., 2013; Gershman, 2015a; Niv, Edlund, Dayan, & O’Doherty, 2012). However, other work has demonstrated the opposite pattern of results—greater learning for positive outcomes, not only in reinforcement-learning tasks (Kuzmanovic, Jefferson, & Vogeley, 2016; Lefebvre, Lebreton, Meyniel, Bourgeois-Gironde, & Palminteri, 2017; Moutsiana, Charpentier, Garrett, Cohen, & Sharot, 2015), but also in procedural-learning (Wachter, Lungu, Liu, Willingham, & Ashe, 2009) and declarative-learning tasks (Eil & Rao, 2011; Sharot, Korn, & Dolan, 2011).

Here, we explored the hypothesis that the direction of valence-dependent learning asymmetries depends on beliefs about the causal structure of the environment. To provide some insight, we borrow an example from Abramson, Seligman, and Teasdale (1978): Consider a group of researchers who receive a rejection for a manuscript submission. The researchers’ inferences about the cause of that feedback will influence whether they modify the paper or appeal the decision. If the researchers believe that their submission was rejected because the paper was bad, they will revise the paper and take this new information into consideration for future submissions. However, if they believe that the rejection was due to the opinion of an unfair reviewer, they will be less likely to update their beliefs about the quality of the paper. In other words, they will explain away the rejection, attributing it to a hidden cause (the reviewer’s caustic temperament) rather than to their own ability.

Abramson et al. (1978) argued that “failure means more than merely the occurrence of a bad outcome” (p. 55). Rather, attribution of negative outcomes to oneself is what constitutes failure. According to learned-helplessness theory, individuals with an optimistic explanatory style tend to attribute negative events to external forces, whereas those with a pessimistic explanatory style believe that the causes of negative events are internal. Given this view, optimistic and pessimistic cognitive biases might arise from both (a) differing experiences of reinforcements and (b) beliefs about the causes of those reinforcements. In other words, both the availability of rewards and punishments in the environment and the degree to which these consequences are attributed to oneself determine to what extent positive and negative outcomes influence learning.

Valence-dependent learning asymmetries are important because they may give rise to systematic biases with real-world consequences. On the one hand, learning more from positive outcomes can give rise to unrealistic optimism (Sharot et al., 2011) and risk-seeking behavior (Niv et al., 2012). On the other hand, learning more from negative outcomes can lead to unrealistic pessimism (Maier & Seligman, 1976) and risk aversion (Smoski et al., 2008). Thus, understanding the determinants of these asymmetries may provide insights into a wide range of behavioral phenomena and provide necessary information to curtail their harmful consequences.

One limitation of many past studies examining valence-dependent learning asymmetries is that they do not directly measure or control participants' beliefs about causal structure, and hence they are not ideal for testing our hypothesis. In the present research, we conducted a more direct test by manipulating the causal structure of a reinforcement-learning task to induce both positively and negatively biased learning asymmetries in the same participants. Participants were asked to choose between two options with unknown reward probabilities and were informed that an agent could silently intervene to change the outcome positively (benevolent condition), negatively (adversarial condition), or randomly (neutral condition). Relying on a Bayesian model of causal inference, we expected that participants in the benevolent condition would update their beliefs about the reward probabilities more from negative than from positive outcomes. We expected this because negative outcomes could not have been caused by the benevolent agent and therefore must have been generated by the chosen option (i.e., sampled from that option's reward distribution). Likewise, we expected participants to learn more from positive than from negative outcomes in the adversarial condition. To examine the robustness and flexibility of our model, we explored a more realistic scenario in Experiment 2, in which the probability of intervention by the latent agent was unknown to participants.
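To make the predicted asymmetry concrete, the sketch below implements one trial of the kind of attribution-weighted Bayesian update we have in mind. It is an illustrative simplification, not the fitted model reported here: the function name, the Beta-Bernoulli parameterization, and the agent likelihoods (a benevolent agent always forces a reward, an adversarial agent always forces a nonreward, a neutral agent sets the outcome at random) are our own assumptions.

```python
def causal_update(alpha, beta, r, z, condition):
    """One trial of an attribution-weighted Bayesian update (sketch).

    alpha, beta : Beta-distribution counts over the chosen option's
                  reward probability theta (assumed parameterization).
    r           : observed outcome (1 = reward, 0 = no reward).
    z           : assumed probability that the hidden agent intervened.
    condition   : 'benevolent' (agent forces r = 1),
                  'adversarial' (agent forces r = 0), or
                  'neutral' (agent sets r at random).
    """
    theta_hat = alpha / (alpha + beta)  # current mean belief about theta

    # Likelihood of the outcome given that the agent intervened.
    if condition == 'benevolent':
        lik_int = 1.0 if r == 1 else 0.0
    elif condition == 'adversarial':
        lik_int = 1.0 if r == 0 else 0.0
    else:  # neutral
        lik_int = 0.5

    # Likelihood of the outcome given no intervention (the option's own draw).
    lik_no_int = theta_hat if r == 1 else 1.0 - theta_hat

    # Posterior probability that the outcome came from the option itself.
    w = ((1.0 - z) * lik_no_int) / ((1.0 - z) * lik_no_int + z * lik_int)

    # Credit the outcome to the option only in proportion to w, so outcomes
    # attributable to the hidden agent barely move the belief.
    return alpha + w * r, beta + w * (1.0 - r)
```

Under these assumptions, w = 1 whenever r = 0 in the benevolent condition (a benevolent agent never produces a bad outcome) but w < 1 whenever r = 1, so the effective learning rate is higher for negative outcomes; the adversarial condition reverses this pattern, and the neutral condition dampens learning from both outcome types symmetrically.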

Experiment 2

One of the broader questions motivating this research is how the environment shapes learning-rate asymmetries. We addressed this question in Experiment 2 by introducing a subtle ambiguity into our experimental task: Instead of informing participants of the exact intervention probability, we simply told them that hidden agents occasionally intervene. We reasoned that this ambiguity more directly reflects real life; in the real world, the probabilities of interventions and outcomes are often unknown, and decisions depend on one's prior expectations.

Method

Participants

Two groups of participants (total N = 304) were recruited from Amazon Mechanical Turk (Sample A: n = 110; 49 female, 56 male, 5 unreported; Sample B: n = 194; 90 female, 96 male, 8 unreported). Sample B was collected as part of a preregistered replication, though for the purposes of these analyses we aggregated the two samples. (See the Supplemental Material for further information on the preregistered replication. Registration details can be found at https://osf.io/cx4u9/ on the Open Science Framework.) Participants were excluded from model fitting if they did not choose the stimulus with the higher reward probability on more than 60% of trials; of all participants, 85.3% met this accuracy criterion (86.4% of Sample A, 84.7% of Sample B). Participants were also excluded if they did not respond correctly to an attention-check question (n = 6). We included data from 255 participants in the model fits (n = 95 for Sample A, n = 160 for Sample B). Participants gave informed consent, and the Harvard University Committee on the Use of Human Subjects approved the experiment.

Procedure

Behavioral-task procedures were identical to those in Experiment 1, except that participants were told that the hidden agents would intervene "sometimes." The actual intervention rate remained fixed at 30% (15 of 50 trials per block).

Computational model

Because participants were not told the intervention probability, we explored models that either estimated the probability from experience (the adaptive Bayesian model) or treated it as a free parameter (the fixed Bayesian model). In addition, we fitted a model in which the intervention probability was derived empirically by averaging each participant's binary intervention judgments; we refer to this model as the empirical Bayesian model. (A schematic comparison of the three variants appears at the end of this section.)

Results

Behavioral analyses

Results of Experiment 2 replicated those of Experiment 1: Participants believed that the hidden agent caused negative outcomes more often than positive outcomes across all conditions, t(254) = 6.26, p < .0001, d = −0.06, 95% CI = [−0.18, 0.06], and belief in hidden-agent intervention differed significantly between the benevolent and neutral conditions for good outcomes and between the adversarial and neutral conditions for bad outcomes, t(254) = 7.03, p < .0001, d = 0.29, 95% CI = [0.17, 0.41].

Computational modeling

Model comparison overwhelmingly supported the empirical Bayesian model (in which the intervention probability was derived from the binary intervention judgments) over the more sophisticated adaptive Bayesian model (which estimated the intervention probability from experience) and the fixed Bayesian model (which treated the intervention probability as a free parameter), protected exceedance probability (PXP) > .999. Once again, we found that participants had significantly higher learning rates for positive outcomes than for negative outcomes, t(254) = 4.73, p < .0001, d = −0.82, 95% CI = [−0.95, −0.69].
In a further replication of our results from Experiment 1, we also found that learning rates were significantly lower on trials in which participants believed that the hidden agent intervened than on trials in which they believed that it did not, t(252) = 16.77, p < .0001, d = −0.80, 95% CI = [−0.93, −0.67] (Fig. 4b). A signed-ranks test showed that the median point-biserial correlation between participants' actual guesses about intervention and the interventions predicted by the model was significant (median r_pb = .55, p < .0001), demonstrating that intervention judgments can be accurately predicted by the adaptive Bayesian model.

Discussion

By modifying the behavioral task and the computational model to include an unknown probability of hidden-agent intervention, we were able to gain insight into the individual differences in prior expectations that govern valence-dependent learning asymmetries. Among the models we considered, model selection favored the version of the Bayesian model that derived the intervention probability from the average of participants' binary judgments. We conjecture that our task taps into prior expectations about the nature and frequency of hidden agents, possibly formed over a lifetime of learning.
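As a schematic illustration of how the three model variants differ, the sketch below shows the quantity each one substitutes for the intervention probability z used in the trial-level update. The function and parameter names, and the Beta(a0, b0) prior in the adaptive case, are hypothetical stand-ins for the fitted implementations.

```python
def intervention_probability(variant, judgments=(), fitted_z=None,
                             n_inferred=0, n_trials=0, a0=1.0, b0=1.0):
    """Source of the intervention probability z in each model variant
    (illustrative sketch; names and the Beta(a0, b0) prior are assumed).
    """
    if variant == 'empirical':
        # Average of the participant's binary intervention judgments.
        return sum(judgments) / len(judgments)
    if variant == 'fixed':
        # A free parameter estimated by fitting the model to choices.
        return fitted_z
    if variant == 'adaptive':
        # Posterior mean of a Beta belief over the intervention rate,
        # updated from the interventions the learner itself infers.
        return (a0 + n_inferred) / (a0 + b0 + n_trials)
    raise ValueError(f"unknown variant: {variant}")
```

On this rendering, the variants differ only in where z comes from; the model comparison reported above indicates that participants' own explicit judgments (the 'empirical' source) best accounted for choice behavior.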

Conclusion

In sum, we provide evidence that valence-dependent learning asymmetries arise from causal inference over hidden agents. This idea, formalized in a simple Bayesian model, quantitatively and qualitatively accounted for both choices and intervention judgments. An important task for future researchers will be to understand the limits of this framework: To what extent can we understand self-serving biases, learned helplessness, and other related behavioral phenomena in terms of a common computational mechanism? More generally, the real world is typically less well behaved than the idealized experimental scenarios studied in the present research; people constantly face causally complex and ambiguous inference problems, in which simple attributions to "good" and "bad" hidden agents may not be applicable. We foresee an exciting challenge in extending the Bayesian framework to tackle these more realistic settings.

Action Editor

Marc J. Buehner served as action editor for this article.

Author Contributions

All authors developed the study concept. H. M. Dorfman conducted the experiments. S. J. Gershman, H. M. Dorfman, and R. Bhui performed the data analyses and computational modeling. All authors contributed to the writing of the manuscript.

ORCID iD

Hayley M. Dorfman https://orcid.org/0000-0001-9865-8158

Declaration of Conflicting Interests

The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

Funding

H. M. Dorfman was supported by the Sackler Scholars Programme in Psychobiology. S. J. Gershman was supported by the National Institutes of Health (CRCNS R01-1207833) and the Office of Naval Research (N00014-17-1-2984). R. Bhui was supported by the Harvard Mind Brain Behavior Initiative.

Supplemental Material

Additional supporting information can be found at http://journals.sagepub.com/doi/suppl/10.1177/0956797619828724

Open Practices

All data and materials have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/3htpj/. The design and analysis plans for Sample B in Experiment 2 were preregistered at https://osf.io/cx4u9/. The complete Open Practices Disclosure for this article can be found at http://journals.sagepub.com/doi/suppl/10.1177/0956797619828724. This article has received badges for Open Data, Open Materials, and Preregistration. More information about the Open Practices badges can be found at http://www.psychologicalscience.org/publications/badges.