Abstract To make good judgments people gather information. An important problem an agent needs to solve is when to continue sampling data and when to stop gathering evidence. We examine whether and how the desire to hold a certain belief influences the amount of information participants require to form that belief. Participants completed a sequential sampling task in which they were incentivized to accurately judge whether they were in a desirable state, which was associated with greater rewards than losses, or an undesirable state, which was associated with greater losses than rewards. While one state was better than the other, participants had no control over which they were in, and to maximize rewards they had to maximize accuracy. Results show that participants’ judgments were biased towards believing they were in the desirable state. They required a smaller proportion of supporting evidence to reach that conclusion and ceased gathering samples earlier when reaching the desirable conclusion. The findings were replicated in an additional sample of participants. To examine how this behavior was generated we modeled the data using a drift-diffusion model. This enabled us to assess two potential mechanisms which could underlie the behavior: (i) a valence-dependent response bias and/or (ii) a valence-dependent process bias. We found that a valence-dependent model, with both a response bias and a process bias, fit the data better than a range of alternatives, including valence-independent models and models with only a response or process bias. Moreover, the valence-dependent model provided better out-of-sample prediction accuracy than the valence-independent model. Our results provide an account of how the motivation to hold a certain belief decreases the need for supporting evidence. The findings also highlight the advantage of incorporating valence into evidence accumulation models to better explain and predict behavior.

Author summary People tend to gather information before making judgments. As information is often unlimited, a decision has to be made as to when the data is sufficient to reach a conclusion. Here, we show that the decision to stop gathering data is influenced by whether the data points towards the desired conclusion. Importantly, we characterize the factors that generate this behavior using a valence-dependent evidence accumulation model. In a sequential sampling task participants sampled less evidence before reaching a desirable than an undesirable conclusion. Despite being incentivized for accuracy, participants’ judgments were biased towards believing they were in a desirable state. Fitting the data to an evidence accumulation model revealed this behavior was due to both the starting point and the rate of evidence accumulation being biased towards desirable beliefs. Our results show that evidence accumulation is altered by what people want to believe and provide an account of how this modulation is generated.

Citation: Gesiarz F, Cahill D, Sharot T (2019) Evidence accumulation is biased by motivation: A computational account. PLoS Comput Biol 15(6): e1007089. https://doi.org/10.1371/journal.pcbi.1007089
Editor: Ross Otto, McGill, CANADA
Received: July 25, 2018; Accepted: May 10, 2019; Published: June 27, 2019
Copyright: © 2019 Gesiarz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All files are available from Github: https://github.com/affective-brain-lab/Gesiarz_Evidence_Motivation.
Funding: Funded by a Wellcome Trust Fellowship 214268/Z/18/Z to TS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Authors FG and TS have no conflict of interests. DC declares to have been employed by Google LLC, Mountain View, at the time of preparing the article for publication.

Introduction Judgments are formed over time as information is accumulated [1–3]. When given an opportunity to sample unlimited data an individual can decide to continue gathering evidence until a certain threshold is reached [4,5]. This decision involves a trade-off between time and accuracy–an exchange that has been well-studied [6–8]. It seems probable, however, that the decision to stop gathering evidence would also be influenced by the desire to hold one belief over another [9, 10]. For example, people are less likely to seek a second medical opinion when the first physician delivers good news than when she delivers bad news [11]. The problem with such observations is that they often confound desirability with probability–a patient might seek a second opinion after receiving a dire diagnosis simply because the diagnosis is rare (and thus seems unlikely), not because it is undesirable. Here, we set out to empirically examine in a controlled laboratory setting whether and how the desire to hold a belief influences the amount of information required to reach it, when all else is held equal. Presently, we have limited understanding of whether and how motivation alters evidence accumulation, despite the potential for such effects to dramatically impact people’s decisions in domains ranging from finance to politics and health [9–11]. To gain insight into the underlying process we tease apart the computational elements that may be influenced by motivation. Specifically, we hypothesized that the desire to hold one judgment over another could alter information accumulation in at least two ways. First, people may be predisposed towards desired judgments before observing any evidence at all (for example, one may believe it will be a nice day before checking the weather or glancing outside) [12].
A second, not mutually exclusive possibility is that a desirable piece of evidence (e.g., a ray of sunlight) drives beliefs towards a desirable judgment (‘it will be a nice day’) more so than an undesirable piece of evidence (e.g., the sound of rain) drives beliefs towards an undesirable judgment (‘it will be a grey day’) [13]. These two distinct mechanisms would result in the same observable behavior. In particular, less information would be gathered to support desirable judgments than undesirable ones, such that the former would be reached faster. To dissociate these mechanisms, we use a computational approach. We adopt a sequential sampling model that describes noisy evidence accumulation towards either of two decision thresholds [1,14,15]. The model allows estimating both (i) the starting point and (ii) the rate of evidence accumulation, the latter reflecting the quality of information processing [14]. This enables us to ask if either of these factors, or both, are influenced by motivation. In our task participants witnessed various events that were contingent upon which of two hidden states they were in. One state was associated with greater rewards than losses (desirable state) and the other with greater losses than rewards (undesirable state). The participants had no control over which state they were in; their task was simply to judge the state, gaining additional rewards for accurate judgments and losing rewards for inaccurate judgments. Thus, it was in participants’ best interest to be as accurate as possible, and they were allowed to accumulate as much evidence as they wished before making a judgment. We examine whether and how the accumulation process is sensitive to participants’ motivation to believe that they are in one state and not the other.

Discussion The findings show that motivation has a profound effect on the process by which evidence is accumulated. On trials in which participants indicated they believed the state was desirable, they ceased gathering data earlier and required a smaller proportion of samples to be consistent with that conclusion. We used a computational model to characterize the underlying factors that may generate this behavior. The model revealed two factors: first, participants began the process of evidence accumulation with a starting point biased towards the desired belief. Thus, they required less evidence to reach that boundary. Second, the drift rate–the rate of information accumulation [14]–was greater on trials in which participants were in the desirable state than in the undesirable state. If only a biased starting point were observed, this would have indicated that people might make fast errors, but with time/evidence would correct their initial biases. The existence of a process bias, however, makes correction more difficult. While participants incorporate both desirable and undesirable evidence into judgments, the larger weight assigned to desirable evidence means that biases could increase over time with more evidence accumulation. These results indicate that the temporal evolution of beliefs is influenced by what people wish to be true and that evidence accumulation is valence dependent. That is, the rules of accumulation depend on whether the data is favorable or unfavorable. Most learning models [17–19] assume that agents learn from information they encounter, but that the learning process itself is not influenced by whether the evidence supports a desired or undesired conclusion. This study suggests this assumption is likely false. By allowing the parameters of a standard evidence accumulation model to vary as a function of the desirability of the evidence, we were able to better explain and predict participants’ behavior.
We chose to model the data with a drift-diffusion model because its components mapped onto the two candidate forms of desirability bias in judgment. These components have been increasingly validated through targeted manipulations [20] and associated with specific neural and physiological correlates [21–25]. The good fit of the model to our data, as well as the alignment of the model results with the behavioral analyses, vindicates this choice. We speculate that incorporating valence into other classes of learning models will also increase their predictive accuracy. Our findings are in accord with previous suggestions that people hold positively biased priors [12] and update their beliefs more in response to good than bad news [13,26–29]. We speculate that biased evidence accumulation could be due to biases in perception [30, 31], attention [32, 33] and/or working memory [34, 35]. For example, participants may have attended to desirable stimuli to a greater extent than undesirable stimuli, such that the former were assigned greater weight when forming beliefs. Such stimuli could also be maintained in working memory longer. These biases are thought to be automatic and do not require substantial cognitive resources [31, 36]. Here, we show such biases manifest in differential patterns of evidence sampling and accumulation. Our results also support a previous demonstration that people need less evidence to reach desirable conclusions in the domains of health and social interaction [9]. We go further by demonstrating this in a situation where (i) participants are incentivized for accuracy, (ii) the desirable and undesirable conditions differ only in desirability and (iii) we provide insight into the underlying computations. In sum, the current study describes how the motivation to hold one belief over another can decrease the need for supporting evidence.
The implication is that people may be quick to respond to signs of prosperity (such as rising financial markets)–forming desirable beliefs even when evidence is relatively weak–but slow to respond to indicators of decline (such as political instability)–forming undesirable beliefs only when negative evidence can no longer be disregarded. Indeed, in our study participants were more likely to hold positive false beliefs (falsely believing they were in the desirable factory when in fact they were in the undesirable factory) than negative false beliefs (falsely believing they were in the undesirable factory when in fact they were in the desirable factory). While both positive and negative false beliefs resulted in a material cost, we speculate that positive false beliefs may have non-monetary benefits. In particular, it has been hypothesized that beliefs, just like material goods and services, have utility in and of themselves [30–36]. In certain circumstances it is possible that the increase in utility from false beliefs themselves may be greater than the material utility lost, resulting in a net benefit.

Methods Participants We recruited 100 participants (M age = 34.48, 44% female) from Amazon Mechanical Turk (www.mturk.com). To qualify for participation, participants had to be residents of the United States. Participants were paid $4.50 for their participation and were promised an unspecified performance-related bonus for a task that was expected to take 30 minutes. The study was approved by the ethics committee at University College London. Informed written consent was obtained from participants. Procedure Factory game task. Participants played 80 trials of the “Factory Game”. They began each trial by pressing the space bar, after which they witnessed an animated sequence of televisions and telephones passing along a conveyor belt. Each object took 400 ms to traverse the belt, with a 150 ms lag between stimuli. There were two types of trials: Telephone Factory trials and Television Factory trials. In Telephone Factory trials the probability of each item in the animated sequence being a telephone was 0.6 and of being a television was 0.4. For Television Factory trials the proportions were reversed. The current trial type was randomly determined with replacement on every trial, with equal probability for each trial type. Participants were tasked with judging whether they were in a Telephone Factory trial or a Television Factory trial. Since the trial type was not directly observable, they had to infer it from the sequence of objects they were seeing. Participants were free to respond as soon as they wished after initiating the trial, and the sequence would continue until they made their choice. Participants began the game with an endowment of 5000 points. Each 100 points was worth 1 cent. One of the two factory types was randomly assigned per participant to be the desirable factory type and the other to be the undesirable type.
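To make the generative structure of a trial concrete, the following is a hypothetical re-implementation of the trial-generation process in Python (the function name and the cap on sequence length are our own illustrative assumptions, not the original task code):

```python
import random

def generate_trial(p_majority=0.6, n_max=50, seed=None):
    """Simulate one Factory Game trial (illustrative sketch).

    The hidden trial type is drawn with equal probability; each
    item on the belt is the majority product with p = 0.6 and the
    minority product with p = 0.4."""
    rng = random.Random(seed)
    state = rng.choice(["telephone", "television"])
    samples = []
    for _ in range(n_max):
        if rng.random() < p_majority:
            samples.append(state)  # item matches the hidden factory type
        else:
            samples.append("television" if state == "telephone" else "telephone")
    return state, samples

state, samples = generate_trial(seed=1)
```

In the actual task the sequence continued until the participant responded; the fixed cap here simply bounds the sketch.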
Participants were informed that each time they visited the desirable factory, they would win an unspecified number of points, and each time they visited the undesirable factory, they would lose an unspecified number of points. Crucially, this bonus was entirely outside of the participant’s control, i.e. it was not affected by the judgments the participant made. Separately, participants were informed that they would earn an unspecified number of points for making a correct judgment and lose an unspecified number of points for making an incorrect judgment. The magnitudes of the unspecified bonuses/losses were independent of each other, potentially unequal, and varied randomly on each trial. We dropped trials in which the participant made their judgment before seeing a second item. In cases where a participant did this in over half of their trials, we assumed that participant was not appropriately engaging with the task and excluded all of their trials. We dropped 10 participants for this reason, as well as a further 123 responses made before the second item was seen. We additionally excluded 3 participants whose average accuracy in the task was two standard deviations below the mean of the sample (i.e. for whom accuracy was below 53.28%; mean accuracy of the sample was 71.24%), assuming that these participants were guessing rather than basing their answers on the presented evidence. Finally, 3 participants were excluded as possible bots. These included "participants" who showed at least two of the following indicators: nonsense answers to open-ended questions, IPs originating outside the region targeted by MTurk, reaction times at regular intervals (i.e. button presses at exactly the same millisecond after the start of the trial) in more than 10% of trials, and/or comprehension-question performance at chance level. After the above exclusions, we performed the analysis on 84 participants and a total of 6597 trials.
The same exclusion criteria were applied in the replication and control studies. Training. Participants received extensive instructions prior to playing the game, and were required to answer multiple-choice comprehension check questions on the key points of the task, with each question repeated until they either answered correctly or made three attempts, after which the correct answer was displayed to them. The comprehension check questions addressed the following key points of how the game worked: that telephone factories mostly produced telephones, but sometimes produced televisions; that the bonus was independent of the judgments they made; which factory was their desirable factory; and that trial types were randomly determined, so it was not guaranteed that they would see exactly the same number of each type of factory. Participants then played a practice session of 20 trials, in which the trial type was visibly displayed to them, so that they could gain prior experience of the outcome contingencies and the trial type distribution. Data analysis Psychometric function. To relate participants’ judgments to the strength of evidence they observed, we fitted a psychometric function, using a generalized mixed-effects equivalent of a logistic regression, with fixed and random effects for all independent variables: P(TV) = 1 / (1 + e^-(β 0 + β 1 X)). We fitted these functions separately for participants for whom the TV factory was desirable and for whom the TV factory was undesirable. P(TV) is the probability of a participant indicating they are in a TV factory; X is the proportion of TV stimuli out of all stimuli observed on a trial. This variable was centred, thus ranging from 0.5 when all samples were TVs to -0.5 when all samples were phones; β 0 is the intercept, which determines the indifference point–the proportion of TVs required to respond TV 50% of the time. If β 0 = 0, participants would indicate they are in a TV factory half the time when half the samples were TVs.
A higher β 0 shifts the function to the left (fewer TV samples are needed before participants respond TV), and a lower β 0 shifts it to the right; β 1 is the slope, reflecting by how much the probability of a participant indicating they are in a TV factory increases when the proportion of TVs increases by one unit. RT and number of samples. As stimuli were presented at a steady pace, the number of samples drawn was highly correlated with reaction times (R = 0.99, p < 0.00001), and thus these two measures can be thought of as interchangeable. As the number of samples drawn before making a judgment was non-normally distributed with a heavy positive skew, we log-transformed this variable [37]. Speed-accuracy trade-off. To examine the speed-accuracy trade-off we divided the trials into fast and slow, based on each participant's median reaction time, and then calculated the average accuracy of desirable and undesirable responses within these categories. We performed a 2x2 ANOVA, with average accuracy as the dependent variable, and response (desirable/undesirable) and speed (fast/slow) as independent factors. Drift-diffusion modelling. Our aim in modeling our task within the drift-diffusion framework was to assess the contribution of both the starting point and the drift rate to the desirability bias we saw in our data. To that end, we implemented and compared six different specifications of a drift-diffusion model (DDM; see Table 2).
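The psychometric function described above can be illustrated with a minimal, non-hierarchical sketch. The generating parameter values and the simulated data are our own assumptions for illustration; the actual analysis used a generalized mixed-effects model with random effects per participant:

```python
import numpy as np
from scipy.optimize import minimize

# Simulate judgments from P(TV) = 1 / (1 + exp(-(b0 + b1 * X))),
# where X is the centred proportion of TV samples (-0.5..0.5).
rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=2000)      # centred evidence per trial
true_b0, true_b1 = 0.4, 8.0                # assumed, for illustration only
p = 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * X)))
y = (rng.random(2000) < p).astype(float)   # simulated TV judgments

def neg_log_lik(params):
    """Negative log-likelihood of the logistic psychometric function."""
    b0, b1 = params
    q = 1.0 / (1.0 + np.exp(-(b0 + b1 * X)))
    q = np.clip(q, 1e-9, 1 - 1e-9)
    return -np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))

fit = minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")
b0_hat, b1_hat = fit.x  # recovered intercept and slope
```

A positive fitted β 0 corresponds to the leftward shift described above: the participant responds TV more than half the time when evidence is balanced.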

Table 2. Variants of drift-diffusion model. https://doi.org/10.1371/journal.pcbi.1007089.t002

In particular, in models with a valence-independent starting point, its value was fixed at 0.5. In models with a valence-dependent starting point, its value could vary between 0 and 1. In models with an unbiased drift rate the parameter was symmetric for desirable and undesirable factories (v and -v). In models with a biased drift rate, the model additionally included a term reflecting the difference between drift rates for desirable and undesirable factories (β 2 × factory desirability). “Factory desirability” is the true factory visited, coded as 1 for desirable factories and 0 for undesirable factories. Moreover, following an approach used previously [18, 19], in all cases the drift rate was allowed to vary on each trial as a function of the proportion of observed samples that were consistent with the true state (β 1 × evidence), such that the trial-wise drift rate was: v = β 0 + β 1 × evidence + β 2 × factory desirability + β 3 × (evidence × factory desirability). The evidence variable was centred, ranging from 0.5 when all samples were consistent with the true state to -0.5 when all samples were inconsistent with the true state. All models also included parameters for the decision threshold (α) and non-decision time (t0). β 0 is a constant. β 1 is the weight by which the evidence alters the drift rate. β 2 is a bias term reflecting an additional weight added to the drift rate as a function of factory desirability; positive values indicate a bias towards desirable judgements, and negative values indicate a bias towards undesirable judgements. β 3 is the weight on the interaction term, allowing the evidence to alter the drift rate differently in desirable and undesirable factories. We used the HDDM software toolbox [38] to estimate the parameters of our models. The HDDM package employs hierarchical Bayesian parameter estimation, using Markov chain Monte Carlo (MCMC) methods to sample the posterior probability density distributions for the estimated parameter values.
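To illustrate how the valence-dependent model composes its parameters, the following sketch simulates single trials with a biased starting point and a drift rate built from the regression terms described above. All parameter values are illustrative assumptions; the actual fitting was done with HDDM, not with this simulator:

```python
import numpy as np

def simulate_ddm_trial(evidence, desirable, rng,
                       a=2.0, z=0.6, t0=0.3,
                       b0=0.0, b1=3.0, b2=0.5, b3=0.5, dt=0.001):
    """Euler simulation of one DDM trial (illustrative sketch).

    evidence: centred proportion of state-consistent samples (-0.5..0.5);
    desirable: 1 if the true factory is the desirable one, else 0;
    z: relative starting point (0.5 = unbiased; here biased towards the
    upper boundary, taken as the correct-state judgment).
    Returns (choice, rt), with choice 1 = upper boundary."""
    v = b0 + b1 * evidence + b2 * desirable + b3 * evidence * desirable
    x = z * a          # biased starting point in absolute units
    t = 0.0
    while 0.0 < x < a:
        x += v * dt + rng.normal(0.0, np.sqrt(dt))  # drift + diffusion noise
        t += dt
    return (1 if x >= a else 0), t0 + t

rng = np.random.default_rng(2)
choices = [simulate_ddm_trial(0.1, 1, rng)[0] for _ in range(200)]
```

With a starting point above 0.5 and a desirability term added to the drift, the simulated accumulator reaches the upper boundary more often and sooner, mirroring the behavioral pattern reported above.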
We estimated both group-level parameters and parameters for each individual participant. Parameters for individual participants were assumed to be randomly drawn from a group-level distribution; participants’ parameters both contributed to and were constrained by the estimates of the group-level parameters. In fitting the models, we used priors that assigned equal probability to all possible values of the parameters. Also, since our “error” RT distribution included relatively fast errors, we included an inter-trial starting point variability parameter (sz) for both models to improve model fit [39]. We sampled 20000 times from the posteriors, discarding the first 5000 as burn-in. MCMC methods are guaranteed to reliably approximate the target posterior density as the number of samples approaches infinity. To test whether the MCMC chains converged within the allotted time, we computed the Gelman-Rubin statistic on 5 iterations of our sampling procedure. The Gelman-Rubin diagnostic evaluates MCMC convergence by analyzing the difference between multiple Markov chains: convergence is assessed by comparing the estimated between-chains and within-chain variances for each model parameter. In each case, the Gelman-Rubin statistic was close to one (<1.1), suggesting that the MCMC chains converged. To assess whether the parameters describing the bias in the starting point and drift rate differed significantly from a valence-independent specification of the model, we compared the 95% credible intervals of the parameter values against the theoretically unbiased values. In addition, model fits were compared using the Deviance Information Criterion (DIC), a generalization of the Akaike Information Criterion (AIC) for hierarchical models. The DIC is commonly used when the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation; it allows one to assess goodness of fit while penalizing for model complexity [40]. Cross-validation.
To further validate the model and check its predictive accuracy, we refitted the valence-dependent and valence-independent models using data from even-numbered trials only. We then used the parameter estimates to predict log RTs, judgments and their accuracy for odd-numbered trials for each participant. The simulation was repeated 1000 times with normally distributed random noise added to the drift rate, averaging the predicted responses for each trial. We then calculated the mean absolute error between predicted and observed responses (RTs, judgments and judgment accuracy) and compared the average mean absolute errors between the models using a paired t-test. We also fitted a psychometric function to the simulated data. Collapsing boundaries. Decision boundaries may collapse over time rather than remain fixed, reflecting increasing impatience or urgency of decisions [41, 42]. To investigate whether such a model fits our data, we fitted a pure diffusion model with a fixed decision threshold and a diffusion model with a collapsing boundary, modeled as a Weibull cumulative distribution function [41]: u t = a - (1 - e^-(t/λ)^k) × (a - a'), where u t is the threshold at time t, a is the initial value of the boundary, a' is the asymptotic value of the boundary (i.e. the extent to which the boundary collapses), and λ and k are the scale and shape parameters of the Weibull function, influencing the stage at which the boundary starts to collapse and the steepness of the collapse, respectively. The shape parameter k was fixed to 3, corresponding to a “late collapse” decision strategy, following other studies showing that it is a typical strategy implemented by participants [41]. A judgment is made when the accumulated difference between the number of samples supporting one type of factory over the other exceeds one of two symmetric boundaries, ±u t .
The accumulated difference was computed as: d t = d t-1 + s t + ε t , where d t is the difference between the number of evidence points at time t, s t is +1 when the sample at time t supports one factory type and -1 when it supports the other, and ε t is random noise sampled from a normal distribution with a mean of 0 and variance of σ2. The starting value of the accumulator, X 1 , denoted a bias in the starting point. Model parameters were fitted to each participant’s data for desirable and undesirable responses separately using the maximum likelihood estimation method. For each trial, we simulated the models 1000 times for a given set of proposal parameters and calculated the proportion of trials in which the model RT matched the empirical data. Denoting this proportion by p i , we maximized the likelihood function L(D|θ) of the data (D) given a set of proposal parameters (θ): L(D|θ) = Π i p i . To find the best set of proposal parameters we first used an adaptive grid search algorithm and then used the five best sets of proposal parameters as starting points for a Simplex minimization routine [43]. To evaluate the quantitative fits of the models, we used the Akaike Information Criterion.
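The collapsing-boundary accumulator described above can be sketched as follows. The boundary parameters, starting-point bias, and the ±1 coding of each sample are illustrative assumptions chosen so the two defining properties hold: the boundary starts at a and decays towards a':

```python
import math
import random

def boundary(t, a=8.0, a_prime=2.0, lam=15.0, k=3.0):
    """Weibull collapsing boundary: equals a at t = 0, approaches a'."""
    return a - (1.0 - math.exp(-((t / lam) ** k))) * (a - a_prime)

def simulate_trial(p_consistent=0.6, x1=0.5, sigma=1.0, n_max=200, seed=None):
    """Noisy accumulation of sample differences until |d| crosses u_t.

    x1 is the starting-point bias X1; each sample adds +1 or -1 plus
    Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    d = x1
    for t in range(1, n_max + 1):
        step = 1.0 if rng.random() < p_consistent else -1.0
        d += step + rng.gauss(0.0, sigma)
        if abs(d) >= boundary(t):
            return (d > 0), t   # (judgment, number of samples drawn)
    return (d > 0), n_max       # cap for the sketch only

judgment, n_samples = simulate_trial(seed=3)
```

With k = 3 the boundary stays near its initial value for early samples and then collapses steeply, which is the "late collapse" strategy referred to above.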

Acknowledgments We thank members of the Affective Brain Lab for comments on previous versions of this manuscript, Amiti Shenhav and Brad Love for helpful discussion, and Marius Usher and Moshe Glickman for providing us with analysis scripts for the DDM with collapsing boundaries.