Abstract Delayed comparison tasks are widely used in the study of working memory and perception in psychology and neuroscience. It has long been known, however, that decisions in these tasks are biased. When the two stimuli in a delayed comparison trial are small in magnitude, subjects tend to report that the first stimulus is larger than the second stimulus. In contrast, subjects tend to report that the second stimulus is larger than the first when the stimuli are relatively large. Here we study the computational principles underlying this bias, also known as the contraction bias. We propose that the contraction bias results from a Bayesian computation in which a noisy representation of a magnitude is combined with a-priori information about the distribution of magnitudes to optimize performance. We test our hypothesis on choice behavior in a visual delayed comparison experiment by studying the effect of (i) changing the prior distribution and (ii) changing the uncertainty in the memorized stimulus. We show that choice behavior in both manipulations is consistent with the performance of an observer who uses a Bayesian inference in order to improve performance. Moreover, our results suggest that the contraction bias arises during memory retrieval/decision making and not during memory encoding. These results support the notion that the contraction bias illusion can be understood as resulting from optimality considerations.

Citation: Ashourian P, Loewenstein Y (2011) Bayesian Inference Underlies the Contraction Bias in Delayed Comparison Tasks. PLoS ONE 6(5): e19551. https://doi.org/10.1371/journal.pone.0019551 Editor: Adrian G. Dyer, Monash University, Australia Received: December 9, 2010; Accepted: April 5, 2011; Published: May 12, 2011 Copyright: © 2011 Ashourian, Loewenstein. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This research was supported by grants from the National Institute for Psychobiology in Israel - founded by The Charles E. Smith Family - and from the Israel Science Foundation (grant No. 868/08). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction Comparing magnitudes of two temporally separated stimuli is one of the fundamental tools of experimental psychology and neuroscience. Interestingly, choice behavior in these experiments reveals a fundamental bias: when the first stimulus is small, subjects tend to overestimate it, whereas when it is large, they tend to underestimate it. The first account of this bias, known as the contraction bias, was published a century ago by Harry Levi Hollingworth, who later became one of the pioneers of applied psychology. Hollingworth presented subjects with square cards of various sizes for a brief period of time and asked them to memorize their sizes [1]. Each card presentation was followed by a short delay, after which the subjects selected a matching card from a set of probe cards. Surprisingly, Hollingworth observed that subjects tended to choose a probe card that was too large when the memorized card was small compared to the other cards used in the experiment, whereas the opposite behavior, i.e. picking too small a probe card, was observed when the memorized card was relatively large. This illusion has been demonstrated numerous times since Hollingworth's publication for a variety of analog magnitudes in the visual, auditory, and somatosensory modalities [1]–[7], [for review see 8]. The customary explanation for the contraction bias is that the perceived magnitude of a stimulus is a weighted combination of its veridical magnitude and a reference magnitude, such as an average of all contextually relevant stimuli, that serves as an anchor [3], [9], [but see 10]. Thus in Hollingworth's experiments and others [1]–[5] the anchor is thought to make a larger contribution to the subjective magnitude of the memorized stimulus than to the subjective magnitude of the probe stimulus. 
As a result, the memorized stimulus is biased towards the anchor more than the probe stimulus, which results in the overestimation of small memorized stimuli and the underestimation of large memorized stimuli. This explanation, however, is at best partial since there is no consensus on the choice of the contextually relevant stimuli that comprise the anchor, or on the relative weights of the physical and reference magnitudes. Moreover, it is not clear why the weight applied to the memorized stimulus should be different from the weight applied to the probe stimulus. Finally, the computational principles underlying this bias remain unknown. In order to address these questions we explored whether the contraction bias can be understood as resulting from optimality considerations. There is a growing body of literature suggesting that the brain utilizes Bayes' rule to optimally combine information from different sources [11]–[18]. In particular, the application of Bayes' rule has been demonstrated in slant perception [19], sensorimotor learning [20], speed estimation [18], time estimation and interval timing [21], motion perception [22], and integration of information from different sensory modalities [12], [23]. In addition, it has been suggested that Bayesian inference underlies the effect of categories on behavior in reconstruction tasks [24]. Therefore, we hypothesized that the contraction bias in delayed comparison tasks results from a Bayesian inference in which noisy representations of stimuli are combined with knowledge about the a-priori distribution of magnitudes in order to optimize performance. Intuitively, such an inference should lead to the contraction bias because the perception of extreme magnitudes of the first stimulus, which are unlikely given unimodal prior distributions, will be biased toward the ‘center’ of the prior distribution. 
In order to test this hypothesis, we conducted an experiment in which we instructed subjects to memorize the length of a bar presented on a computer screen and then compare this memorized length to the length of a probe bar. We show that the contraction bias depends on the prior distribution of bar lengths and that increasing the uncertainty in the memory of bar lengths enhances the contraction bias; both findings are consistent with the Bayesian hypothesis. When within a trial does the Bayesian computation take place? Is the encoded memory biased, or does the prior information bias the result of the length comparison? By manipulating uncertainty in the memory of bar lengths after memory encoding and measuring the magnitude of the contraction bias, we demonstrate that prior information is introduced during memory retrieval/decision making rather than when the first stimulus is encoded in memory. Some of the findings presented here have appeared previously in abstract form [25].

Discussion We examined the hypothesis that the contraction bias in delayed comparison tasks results from a Bayesian inference in which information about the prior distribution is combined with noisy measurement in order to optimize performance. This hypothesis makes two predictions: a translational shift in the prior distribution is expected to result in a similar translational shift in the bias curve, and increasing noise in memory is expected to increase reliance on prior knowledge and thus increase the bias. Our results are consistent with both predictions, suggesting that the contraction bias results from a Bayesian inference. Within a single trial, when does information about the prior distribution combine with the sensory measurement? One possibility is that it takes place during the encoding of $L_1$. In this case, the encoded memory of $L_1$ is already biased in the direction of the prior distribution. Another possibility is that the memory of $L_1$ is unbiased and the Bayesian computation takes place at the comparison stage, when the encoded $L_1$ is compared with $L_2$. To address this question we again considered the choice behavior of subjects in the experiment with the interfering task. We found that in this experiment, the slope of the response curve was more negative in trials with interference from the secondary task, compared to the standard trials (Figure 4C). In other words, more weight was given to the prior distribution in trials interrupted by the secondary task. Recall that trials containing this task were randomly intermixed with trials that did not contain interference. Therefore, at the time of encoding of $L_1$ (up to 0.5 sec after the end of the presentation of $L_1$) the subjects could not know whether they would be presented with an interfering task and therefore could not know what weight to give to the prior distribution. 
Thus, if the computation had taken place at the time of the encoding of $L_1$, we would have observed no difference in the slope of the response curve between the two conditions. We conclude that the Bayesian computation necessarily took place after the interfering task, at the time of $L_1$ retrieval or later, when $L_1$ and $L_2$ were compared. How do subjects learn the prior distribution? In order to address this question, we compared the level of contraction bias, as measured by the slope of the response curve, in the first 20 impossible trials to the slope in the last 20 impossible trials for subjects who completed the experiment in Figure 1A where the bar lengths were drawn from the 150–600 and 50–200 ranges. We found no statistically significant difference in these slopes (−0.29 for the first 20 trials; −0.28 for the last 20 trials; average difference = −0.01; 95% bootstrap CI for the difference in slopes, [−0.29, 0.27]). These results indicate that the contraction bias emerges within a small number of trials, suggesting that the prior distribution of bar lengths in the experiment is estimated using a small number of trials. In this study we examined the effect of a translational shift in the prior, but we did not alter the shape of the prior distribution. Previous studies have shown that subjects are sensitive to the shape of the prior distribution in category and sensorimotor learning [20], [24]. Consistent with these results, changing the shape of the prior distribution in our model changes the shape of the response curve. The extent to which the shape of the prior distribution can be learned and utilized in Bayesian reasoning, however, awaits future studies. Contraction bias in delayed comparison tasks is a common cognitive illusion observed in many different modalities and under different experimental conditions [1]–[8]. In this paper we provide a normative interpretation of this bias, supported by an experiment in the visual domain. 
Our results are consistent with a growing body of literature showing that the brain utilizes close-to-optimal computational strategies.

Materials and Methods

Ethics Statement All subjects gave written informed consent using methods approved by the Massachusetts Institute of Technology Committee on the Use of Humans as Experimental Subjects.

Subjects Subjects were undergraduate and graduate students from the Massachusetts Institute of Technology. All subjects had normal or corrected-to-normal vision and no subjects took part in more than one of the experiments. Each subject received $10 plus 1 cent for every correct trial in the experiment for a session lasting less than an hour.

Stimuli Stimuli were white horizontal bars on a black background displayed on a 17″ computer screen with a resolution of 1024×768. All bars were 3 pixels wide.

Procedure Subjects sat approximately 60 cm from a computer screen in a dimly lit room. Each subject completed 400 to 600 trials in one hour and received feedback on their overall performance after every 20 trials. No other feedback was provided; in particular, subjects did not receive feedback on individual trials. In the standard task, each trial started with the presentation of the first bar, L 1, at a random location on the screen for 1 sec. After a delay period of 1 sec, during which the screen remained blank, the second bar, L 2, appeared at another random location on the screen. L 2 remained visible until the subjects pressed one of two keys indicating which bar was longer. The difference in length between L 1 and L 2 varied between −30% and +30%. Unbeknownst to the subjects, in roughly 50% of the trials, the lengths of the first and second bars were equal (L 1 = L 2). Each trial was followed by a 2 sec intertrial interval during which the screen remained blank. Two distinct groups of subjects completed the standard task. One group (n = 9) saw L 1 bars chosen uniformly in the logarithmic scale from the [50, 200] pixel interval, while the other group (n = 10) saw bars chosen from the [150, 600] pixel interval. 
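The sampling scheme of the standard task can be sketched in a few lines of Python. This is an illustration only, not the authors' stimulus code: the function name `sample_trials` and its parameters are hypothetical, and the sketch assumes that on non-equal trials the difference is expressed relative to the first bar and is drawn uniformly between −30% and +30%, a detail the text does not specify.

```python
import numpy as np

def sample_trials(n_trials, lo=50.0, hi=200.0, max_diff=0.30, p_equal=0.5, seed=0):
    """Draw (first bar, second bar) lengths in pixels for the standard task (illustrative)."""
    rng = np.random.default_rng(seed)
    # First bar: uniform on a logarithmic scale over [lo, hi] pixels.
    l1 = np.exp(rng.uniform(np.log(lo), np.log(hi), size=n_trials))
    # Assumption: relative difference is uniform in [-max_diff, +max_diff].
    rel_diff = rng.uniform(-max_diff, max_diff, size=n_trials)
    # "Impossible" trials, in which both bars have the same length.
    equal = rng.random(n_trials) < p_equal
    l2 = np.where(equal, l1, l1 * (1.0 + rel_diff))
    return l1, l2, equal

l1, l2, equal = sample_trials(10_000)
```

Passing `lo=150.0, hi=600.0` would correspond to the second group of subjects under the same assumptions.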
The modified task was identical to the standard task with two exceptions: (1) $L_1$ bars were chosen uniformly in the logarithmic scale from the [100, 400] pixel interval; (2) subjects completed a distracting memory task between the presentation of $L_1$ and $L_2$ in a randomly selected 50% of the trials: 500 msec after $L_1$ disappeared, a random sequence of four colors (red, blue, white, and green) was displayed on the screen, each color for 400 msec. 400 msec after the disappearance of the last color, a number from 1 to 4 appeared in yellow on the screen. Subjects were instructed to recall the color that corresponded to the number and press one of four dedicated keys to indicate this color. $L_2$ appeared 500 msec after subjects made their color choice.

A Bayesian Model of Contraction Bias According to our Bayesian hypothesis, the contraction bias emerges because subjects use Bayes' law to combine noisy information about the lengths of the bars with knowledge about the prior distribution in order to optimize performance. In this section we formalize this intuition. In accordance with Weber's law, the lengths of the bars are measured in logarithmic scale. Let $L_i$ and $R_i$ be the logarithm of the length of bar $i$ and its neural representation, respectively. We assume that this representation is noisy such that $R_i = L_i + z_i$, where $z_i$ is drawn from a zero-mean Gaussian distribution with variance $\sigma_i^2$, $z_i \sim N(0, \sigma_i^2)$. This is illustrated in Figure 5A where we plot the probability of a neural representation $R_i$ for a given bar length $L_i$, also known as a likelihood function and denoted as $\Pr(R_i \mid L_i)$. We assume that the prior distribution of bar lengths, $\Pr(L_i)$, is uniform (Figure 5B). Bayes' rule provides a method for combining information about the prior distribution with the noisy neural representation, in order to compute the posterior distribution, $\Pr(L_i \mid R_i)$ (Figure 5C). According to Bayes' rule

$$\Pr(L_i \mid R_i) = \frac{\Pr(R_i \mid L_i)\,\Pr(L_i)}{\Pr(R_i)} \tag{1}$$

where $\Pr(R_i) = \int \Pr(R_i \mid L_i)\,\Pr(L_i)\,dL_i$.
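As a minimal sketch of the posterior computation in Eq. (1), the following Python snippet evaluates the posterior on a discrete grid for a uniform prior and Gaussian noise. The function name `posterior_on_grid` and the grid discretization are assumptions made for illustration; the values 0.85 and 0.24 are those quoted in the caption of Figure 5, and the log-length range is taken to be normalized to [0, 1].

```python
import numpy as np

def posterior_on_grid(r, sigma, grid):
    """Posterior over log-length on `grid`, given representation r (illustrative sketch)."""
    prior = np.full(grid.size, 1.0 / grid.size)             # uniform prior over grid points
    likelihood = np.exp(-0.5 * ((r - grid) / sigma) ** 2)   # Gaussian likelihood, up to a constant
    unnormalized = likelihood * prior                       # numerator of Bayes' rule
    return unnormalized / unnormalized.sum()                # normalization plays the role of Pr(R)

grid = np.linspace(0.0, 1.0, 501)    # assumed normalized log-length range [0, 1]
post = posterior_on_grid(r=0.85, sigma=0.24, grid=grid)
post_mean = float(np.sum(grid * post))
```

Because the uniform prior has no mass above 1, the posterior is a truncated Gaussian and its mean falls below the measured value of 0.85, that is, toward the center of the prior. This is the contraction that the model is meant to capture.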


Figure 5. Ideal decision maker solution to the task in Figure 1A. A, The likelihood of a representation $R_i$ given a particular length (here $L_i = 0.85$, $\sigma_i = 0.24$) assuming $z_i \sim N(0, \sigma_i^2)$. B, The prior distribution of bar lengths. C, The posterior distribution of $L_i$ given a particular measurement (here $R_i = 0.85$), calculated using Bayes' rule. D, The probability that $L_1 > L_2$ for different values of $R_1$ and $R_2$, computed using the posteriors. The black line corresponds to the values of $R_1$ and $R_2$ such that $\Pr(L_1 > L_2 \mid R_1, R_2) = 0.5$. E, Response curve of the model on the impossible trials in which $L_1 = L_2$. https://doi.org/10.1371/journal.pone.0019551.g005

Given a pair of neural representations, $(R_1, R_2)$, of the lengths of the first and second bars, the probability that the first bar is longer than the second bar is given by

$$\Pr(L_1 > L_2 \mid R_1, R_2) = \iint_{L_1 > L_2} \Pr(L_1 \mid R_1)\,\Pr(L_2 \mid R_2)\,dL_1\,dL_2 \tag{2}$$

This is illustrated in Figure 5D where we use a color scale to plot $\Pr(L_1 > L_2 \mid R_1, R_2)$ for different values of $R_1$ and $R_2$. The black line corresponds to values of $(R_1, R_2)$ such that $\Pr(L_1 > L_2 \mid R_1, R_2) = 0.5$. Note that the slope of this curve is smaller than 1. This results from the assumption that $\sigma_1 > \sigma_2$, reflecting the fact that $L_1$ has to be stored in memory, a process that may contribute additional noise to the representation of $L_1$. An ideal Bayesian observer, who has access to $R_1$ and $R_2$, would report ‘$L_1 > L_2$’ in trials in which $\Pr(L_1 > L_2 \mid R_1, R_2) > 0.5$ and ‘$L_1 < L_2$’ in trials in which $\Pr(L_1 > L_2 \mid R_1, R_2) < 0.5$. Therefore, the probability that a model would report ‘$L_1 > L_2$’ in a trial in which $L_1$ and $L_2$ are presented is given by

$$\Pr(\text{‘}L_1 > L_2\text{’} \mid L_1, L_2) = \iint \Pr(R_1 \mid L_1)\,\Pr(R_2 \mid L_2)\,\Theta\big(\Pr(L_1 > L_2 \mid R_1, R_2) - 0.5\big)\,dR_1\,dR_2 \tag{3}$$

where $\Theta$ is the Heaviside step function. In order to construct the response curve we compute $\Pr(\text{‘}L_1 > L_2\text{’} \mid L_1 = L_2)$ (Figure 5E). For further insights into the Bayesian computation, we consider the simple example in which the level of uncertainty in the representation of $L_1$ is infinite, whereas there is no uncertainty in the representation of $L_2$. In other words, $\sigma_1 \to \infty$ and $\sigma_2 = 0$. In this case, Eq. (1) becomes $\Pr(L_1 \mid R_1) = \Pr(L_1)$ and $\Pr(L_2 \mid R_2) = \delta(L_2 - R_2)$, Eq.
(2) becomes $\Pr(L_1 > L_2 \mid R_1, R_2) = \int_{R_2}^{\infty} \Pr(L_1)\,dL_1$, which exceeds 0.5 only when $R_2$ lies below the median of the prior. Therefore the subject would report ‘$L_1 > L_2$’ if $R_2$ is smaller than the median of $L_1$. In trials in which $L_1 = L_2$, Eq. (3) dictates that the subject would report ‘$L_1 > L_2$’ in trials in which $L_2$ is smaller than the median and ‘$L_1 < L_2$’ otherwise, in line with the contraction bias.

Data analysis Slope of line fitted to response curve. All slopes were computed after normalizing the range of lengths to lie between 0 and 1 in the logarithmic space.

Bootstrap confidence intervals. We used a pairs bootstrap resampling procedure [27] in order to calculate confidence intervals for the slope of the regression lines. The bootstrap algorithm is as follows: in each of 5,000 repetitions, we sampled (with replacement) from each subject's impossible trials in order to obtain a bootstrap dataset and fitted a regression line to the averaged response curve of that bootstrap dataset. This procedure resulted in 5,000 bootstrap slopes that could be used for calculating a CI for the slope of the regression line fitted to the experimentally obtained data points. The CIs reported in the text are 95% basic bootstrap intervals [27]. In order to compare the response curve slopes between subjects who saw 50–200 pixel lines and those who saw 150–600 pixel lines, we sampled from each group independently using the algorithm above, and then constructed a 95% confidence interval on the difference between the bootstrap slopes of the two groups. In order to compare trials with and without the interference task, we calculated the difference in the bootstrap slope of each subject's standard and interfered trials, and found the 95% confidence interval of this difference. The same method was also used to compare the slope of the response curve in the first 20 impossible trials of the experiment to the slope of the response curve in the last 20 impossible trials of the experiment.

Bayesian model fit.
In order to compare behavioral performance to that predicted by the model, we used the model presented above to generate a set of response curves of ideal observers characterized by different values of $\sigma_1$ and $\sigma_2$. These curves were compared to the experimentally measured response curves as described below. Note that subjects exhibited a small bias in favor of reporting ‘$L_2 > L_1$’ in the 50–200 and 150–600 standard experiments: subjects reported ‘$L_1 > L_2$’ in 41% and 46% of the impossible trials, respectively. This tendency has been reported previously [28], [29]. In principle, such a bias can be explained in our Bayesian framework by claiming that the prior distribution that the subjects use in their Bayesian computation is biased in favor of small magnitudes, as was observed for speed perception [18]. In this framework, it is predicted that in the modified experiment (Figure 4A), the global bias should be larger in the trials interfered by the color task than in the standard trials. In fact we found that the global bias was larger in the modified trials (42% vs. 38%). However, this effect was not statistically significant (p = 0.58, two-tailed t-test). More importantly, this explanation is circular because a bias in the opposite direction could equally well have been explained by arguing that the prior distribution is biased in favor of large magnitudes. Therefore we did not attempt to account for the global bias and subtracted it before fitting, assuming that it is generated by a different mechanism. Thus, for the purpose of finding the parameters we added a constant to each of the response curves to normalize them such that mean(Pr[‘$L_1 > L_2$’]) = 0.5. For purposes of comparison, the range of the logarithm of bar lengths was normalized to lie between 0 and 1 and we used a least-squares fit to find the parameters that best fit the population-average experimental data. 
We found that the best fit model parameters for the groups who saw 50–200 and 150–600 pixel-long bars were given by , ; The best fits for trials not interfered by the distracting task and those that had the distracting task were , , and , , respectively.
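The response curve of an ideal observer, as described by the model above, can be approximated numerically on a grid. The sketch below is one possible discretization, not the authors' implementation: the function names, grid sizes, the padding of the representation grid, and the noise values 0.24 and 0.12 in the usage line are all illustrative assumptions and are not the fitted parameters.

```python
import numpy as np

def gaussian_weights(centers, points, sigma):
    """One row per center: Gaussian weights on `points`, normalized to sum to 1."""
    z = -0.5 * ((points[None, :] - centers[:, None]) / sigma) ** 2
    w = np.exp(z - z.max(axis=1, keepdims=True))    # subtract row maximum for numerical stability
    return w / w.sum(axis=1, keepdims=True)

def model_response_curve(sigma1, sigma2, n_l=101, n_r=301):
    """Probability that the observer reports the first bar as longer on impossible trials."""
    l_grid = np.linspace(0.0, 1.0, n_l)             # support of the uniform prior (normalized log-length)
    pad = 4.0 * max(sigma1, sigma2)
    r_grid = np.linspace(-pad, 1.0 + pad, n_r)      # noisy representations can fall outside [0, 1]

    # Posterior over length for every representation; the uniform prior cancels in the normalization.
    post1 = gaussian_weights(r_grid, l_grid, sigma1)    # shape (n_r, n_l)
    post2 = gaussian_weights(r_grid, l_grid, sigma2)

    # Probability that bar 1 is longer given (R1, R2); ties on the discrete grid count as one half.
    cdf2 = np.cumsum(post2, axis=1) - 0.5 * post2
    p_first_longer = post1 @ cdf2.T                     # rows index R1, columns index R2
    report_first = (p_first_longer > 0.5).astype(float) # decision of the ideal observer

    # Average the decision over the noise in R1 and R2 when both bars have the same length.
    lik1 = gaussian_weights(l_grid, r_grid, sigma1)     # shape (n_l, n_r)
    lik2 = gaussian_weights(l_grid, r_grid, sigma2)
    curve = ((lik1 @ report_first) * lik2).sum(axis=1)
    return l_grid, curve

lengths, curve = model_response_curve(sigma1=0.24, sigma2=0.12)
slope = np.polyfit(lengths, curve, 1)[0]    # slope of the fitted regression line
```

With more noise in the first representation than in the second, the curve lies above 0.5 for short bars and below 0.5 for long bars, so the fitted slope is negative. A least-squares fit of the kind described above could then be sketched by evaluating this function over a grid of candidate noise values and keeping the pair that minimizes the squared distance to the measured response curve.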

Acknowledgments We thank Sebastian Seung for his encouragement and support and Merav Ahissar, Konrad Körding, Ofri Raviv, and Hanan Shteingart for fruitful discussions.

Author Contributions Conceived and designed the experiments: YL PA. Performed the experiments: YL PA. Analyzed the data: YL PA. Contributed reagents/materials/analysis tools: YL PA. Wrote the paper: YL PA.