Participants

59 younger and 63 older adults took part in the study. The target sample size was determined in advance on the basis of age effect sizes in previous studies of learning and decision-making. Six older and two younger adults were excluded because of insufficient data (<300 completed trials of the predictive inference task). Thus, the effective sample consisted of 57 younger adults (mean age: 24.5 years, age range: 20–30 years, 29 female) and 57 older adults (mean age: 69.2 years, age range: 56–80 years, 26 female). Participants gave written informed consent, and the Institutional Review Board of the Max Planck Institute for Human Development approved the study. In addition to the experimental task, participants completed a biographical and a personality questionnaire and several psychometric tests: (1) the Identical Pictures test; (2) Raven's Progressive Matrices (ref. 66); (3) the Spot-a-Word test; and (4) the Operation Span task (OSPAN; ref. 67). As shown in Table 1, older adults had lower scores than younger adults on the Identical Pictures test, Raven's matrices and the OSPAN task (P values < 0.001, η_G² values > 0.21). In contrast, older adults obtained higher scores than younger adults on the Spot-a-Word test (P < 0.001, η_G² = 0.20). Consistent with previous findings from larger population-based samples, these results suggest age-related reductions in fluid intelligence and age-related improvements in crystallized intelligence (ref. 68).

Table 1 Psychometric profiles of younger and older adults.

Procedure

Participants performed two sessions, separated by a minimum of 1 week and a maximum of 3 weeks. In the first session, participants completed a biographical questionnaire, the BIS/BAS personality questionnaire, Raven's Progressive Matrices (ref. 66) and a two-state Markov decision task (ref. 69), the data of which are presented in ref. 44. In the second session, participants performed the predictive inference (helicopter) task (ref. 15), the OSPAN task, the Spot-a-Word test, the Identical Pictures test and a version of the two-state Markov task.

Predictive inference task

Participants completed two blocks (200 trials each) of a computerized predictive inference task programmed in Matlab (The MathWorks, Natick, MA) using the MGL (http://justingardner.net/mgl) and snowDots (http://code.google.com/p/snow-dots) extensions. The predictive inference task required inferring the mean of a noisy variable that underwent occasional change points (ref. 15). The problem was embedded in a cover story involving a virtual helicopter (the mean) that moved occasionally (change points) and dropped a bag from the sky on each trial (the noisy variable).

On each trial, the participant moved a bucket to the most likely position of the helicopter using a keyboard (Fig. 1a). After the bucket position was confirmed with a key press, the participant observed a bag fall from the top of the screen, followed by an explosion that revealed the contents of the bag (200 gold coins or silver rocks; randomized across trials) and the extent to which those contents were collected in the bucket (ranging from 0 to 200, depending on the distance between the bucket and the bag). Gold coins (but not rocks) collected in the bucket were translated into incentive payments at the end of the task. The horizontal position of the bag was denoted with a grey tick mark on the screen, and the distance between the bag and the bucket (the prediction error) was denoted by a red line. These markings served to eliminate working memory requirements and gave participants access to all relevant information when choosing how much to adjust the bucket position for the subsequent trial.

The horizontal position of each bag (represented on a numerical scale from 0 to 300 for convenience) was drawn from a normal distribution with a mean corresponding to the position of a virtual helicopter hovering in the sky and a s.d. that was manipulated blockwise (10 or 25; counterbalanced for order). On most trials the helicopter would remain stationary, but on a small fraction of trials (ground truth hazard rate; 1/10) it would relocate to a new screen position. On the vast majority of trials the helicopter was ‘hidden’ by clouds. Occasionally, the helicopter was revealed visually (catch trials; 1/10). In principle, the visible helicopter could provide perfect information about the mean of the distribution, but in practice the centre of the helicopter was not obvious due to asymmetry in the cartoon helicopter image and the vertical distance between this image and that of the bucket (Supplementary Fig. 1). Participants were instructed to infer the location of the helicopter based on previous observations (bag and helicopter positions) and to place the bucket directly underneath it.
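The generative process behind the task can be sketched in a few lines (a minimal Python sketch, not the authors' Matlab implementation; the function name and the clipping of bags to the screen edges are our assumptions):

```python
import random

def generate_block(n_trials=200, hazard=0.1, noise_sd=25.0,
                   low=0.0, high=300.0, seed=0):
    """Simulate the task's generative process: a hidden helicopter (the mean)
    relocates with probability `hazard` on each trial, and each bag is a
    Gaussian sample around it, clipped to the screen range."""
    rng = random.Random(seed)
    helicopter = rng.uniform(low, high)
    means, bags = [], []
    for _ in range(n_trials):
        if rng.random() < hazard:                 # change point: new position
            helicopter = rng.uniform(low, high)
        bag = min(max(rng.gauss(helicopter, noise_sd), low), high)
        means.append(helicopter)
        bags.append(bag)
    return means, bags
```

Setting `noise_sd` to 10 or 25 reproduces the two blockwise noise conditions described above.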

Training

Before completing the two blocks of the predictive inference task, participants went through a series of training tasks that gradually built the complex task from simpler elements. As in the experimental session, every training task consisted of a low and a high standard deviation (noise) block (counterbalanced for order). In the first training task the helicopter was completely visible, so bag locations were not needed to guide behaviour. To ensure that participants understood that the helicopter is the best predictor of outcomes, we used a response criterion that required participants to place the bucket exactly underneath the visible helicopter ten times. Each noise block stopped after the criterion was reached or after a maximum of 80 trials. In the second training task, which comprised two 50-trial runs, clouds covered the helicopter and disappeared only occasionally during catch trials. This version of the task was the same as the experimental task, except that participants did not earn money for their collected coins. Overall performance, in terms of coins collected, did not differ across age groups (Supplementary Fig. 2).

Computational modelling

To dissociate surprise-driven updating from uncertainty-driven updating, we extended an existing normative model of learning in a dynamic environment that has been described in detail previously (refs 15, 16, 18). In brief, this model approximates optimal inference by tracking two factors that should drive learning: change-point probability (the probability that the helicopter location has changed) and uncertainty (the reliability with which an outcome reflects the true location of the helicopter). Here we extend this model in four ways. First, we develop a new method for estimating change-point probability and uncertainty that captures subjective differences in experienced surprise. Second, we extend the generative framework and corresponding inference equations of the model to incorporate catch trials. Third, we allow the normative model to deviate from normativity in specific ways, including surprise insensitivity, incorrect hazard rate assumptions and uncertainty underestimation. Finally, we consider more complex versions of the model that allow for subjective differences in the representation of noise.

The first extension of the previously described computational methods allowed for subjective estimates of change-point probability and uncertainty. Previous studies have run the normative model over trial outcomes to obtain trial-by-trial estimates of these quantities (ref. 16); however, because participant and model predictions do not always match, an outcome that constitutes a small and unsurprising error for the model might actually be a large and rather surprising one for the participant. To avoid this problem, we obtained subjective measures of change-point probability and uncertainty by running the normative model across the prediction errors experienced by participants, rather than the outcomes that generated them. Model variables were computed recursively by first determining the uncertainty about the current helicopter location according to the relative uncertainty, change-point probability and prediction error from the previous trial:

σ²_{μ,t+1} = Ω_t σ²_N + (1 − Ω_t) τ_t σ²_N + Ω_t (1 − Ω_t) (δ_t (1 − τ_t))²   (1)

where σ²_{μ,t+1} is the variance on the predictive distribution over possible helicopter locations, σ²_N is the variance on the distribution over bag locations (noise), Ω_t is the probability of a change point on the previous trial (that is, the probability with which the helicopter has relocated between trials), τ_t is the relative uncertainty from the previous trial and δ_t is the prediction error from the previous trial. Relative uncertainty was computed by expressing uncertainty about the helicopter location as a fraction of the total uncertainty about the location of the next bag:

τ_{t+1} = σ²_{μ,t+1} / (σ²_{μ,t+1} + σ²_N)   (2)

where τ_{t+1} is the relative uncertainty for trial t+1. This relative uncertainty estimate, along with the variance on the bag distribution (noise; σ²_N), was used to calibrate the change-point probability associated with each new prediction error:

Ω_{t+1} = H U(δ_{t+1}) / [H U(δ_{t+1}) + (1 − H) N(δ_{t+1}; 0, σ²_{μ,t+1} + σ²_N)]   (3)

where H is the hazard rate of a change point (0.1), δ_{t+1} is the new prediction error, U(δ_{t+1}) is a uniform density over the range of possible outcomes (1/300) and N(δ_{t+1}; 0, σ²_{μ,t+1} + σ²_N) is the likelihood of the new prediction error under a zero-mean normal distribution with variance equal to the total uncertainty about the next bag location. Subjective estimates of change-point probability and relative uncertainty were computed by evaluating these equations according to the trial-by-trial prediction errors made by each individual subject.
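The recursion from one trial's prediction error, relative uncertainty and change-point probability to the next trial's values can be sketched as follows (a Python sketch using the stated hazard rate of 0.1 and an assumed uniform outcome density of 1/300; function and variable names are ours, not the authors'):

```python
import math

def gauss_pdf(x, var):
    """Density of a zero-mean normal with variance `var` at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def step(delta_prev, tau_prev, omega_prev, delta_new, noise_var,
         hazard=0.1, outcome_range=300.0):
    """One step of the recursive computation: previous-trial quantities
    determine relative uncertainty, which calibrates the change-point
    probability of the new prediction error. Returns (tau_next, omega_next)."""
    # Variance on the predictive distribution over helicopter locations
    sigma_mu = (omega_prev * noise_var
                + (1.0 - omega_prev) * tau_prev * noise_var
                + omega_prev * (1.0 - omega_prev)
                * (delta_prev * (1.0 - tau_prev)) ** 2)
    # Relative uncertainty: helicopter uncertainty / total outcome uncertainty
    tau = sigma_mu / (sigma_mu + noise_var)
    # Change-point probability of the new prediction error
    p_change = hazard / outcome_range            # uniform over the screen
    p_stay = (1.0 - hazard) * gauss_pdf(delta_new, sigma_mu + noise_var)
    omega = p_change / (p_change + p_stay)
    return tau, omega
```

Running this step over a participant's own sequence of prediction errors yields the subjective surprise and uncertainty estimates described above.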

The second extension of the model was necessary to account for the additional information provided on catch trials, in which the helicopter is visible to participants. To maintain the deterministic nature of the model while also accounting for the perceptual ambiguity associated with the helicopter image, we treat the visible helicopter as a cue indicating a Gaussian likelihood function centred on the ground truth (the mean of the bag distribution). We allow the variance of this Gaussian to be adjusted to account for behaviours ranging from completely trusting the helicopter information to completely ignoring it. Combining this additional cue with the information provided by the bag itself led to the following additional equations, which were implemented at the end of each helicopter-visible trial to update position estimates:

B_t ← w_t B_t + (1 − w_t) μ,  with  w_t = σ²_H / (σ²_H + σ²_μ)   (4)

where B_t is the model's belief about the true mean of the distribution and w_t reflects the weight of the current belief in a weighted mixture of the current belief and the true mean (μ) as indicated by the helicopter. w_t is determined according to the relative variances on the current predictive distribution (σ²_μ) and the helicopter-centred likelihood distribution (σ²_H).

In addition, the following equations were implemented to reduce the relative uncertainty estimates on trials where the helicopter was observable:

τ_{t+1} = σ̃²_μ / (σ̃²_μ + σ²_N)   (5)

where σ̃²_μ is the variance on the predictive distribution over possible helicopter locations after correcting for the additional information provided by the visible helicopter:

σ̃²_μ = σ²_μ σ²_H / (σ²_μ + σ²_H)   (6)

where σ²_μ and σ²_H are the variances associated with the internal prediction and the perceptual information provided by the visible helicopter, respectively.
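A precision-weighted combination of the current belief with the helicopter cue, as described here, can be sketched as follows (an illustrative Python sketch; names are ours, and the authors' Matlab implementation may differ in details):

```python
def catch_trial_update(belief, mu, sigma_mu_var, sigma_h_var, noise_var):
    """Combine the current belief with a Gaussian helicopter cue centred on
    the true mean `mu`. Returns the revised belief and the reduced relative
    uncertainty. `sigma_h_var` controls trust in the cue: near zero means
    fully trusting the helicopter, very large means ignoring it."""
    # Weight on the current belief grows with the cue's (helicopter) variance
    w = sigma_h_var / (sigma_h_var + sigma_mu_var)
    belief = w * belief + (1.0 - w) * mu
    # Precision-weighted combination of prediction and helicopter cue
    corrected_var = (sigma_mu_var * sigma_h_var) / (sigma_mu_var + sigma_h_var)
    # Relative uncertainty after the correction
    tau = corrected_var / (corrected_var + noise_var)
    return belief, tau
```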

The third extension of the normative model allowed for specific deviations from optimal behaviour. We simulated behaviour from four versions of the normative model: (1) a version using the update equations described previously (refs 15, 16) with the modifications described above; (2) a surprise-insensitive model created by raising the change-point likelihood to a power between 0 and 1 (0.2 for figures), as described previously (ref. 18); (3) a low hazard rate model expecting change points to be rare (H was set to 0.001); and (4) an uncertainty underestimation model in which uncertainty was reduced after each observed bag drop by dividing the estimated variance on the predictive distribution over possible helicopter locations (σ²_μ) by a constant on each trial (10 for simulations).
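Two of these deviations, surprise insensitivity and uncertainty underestimation, amount to simple transformations (an illustrative sketch of ours; in the fitted model the exponent is applied to the change-point likelihood, so applying it directly to the change-point probability here is a simplification):

```python
def distort(omega, sigma_mu_var, surprise_power=1.0, uncertainty_divisor=1.0):
    """Illustrate two non-normative distortions: an exponent in (0, 1]
    compresses change-point probabilities towards 1 (blunting differences
    between surprising and unsurprising outcomes), and dividing the
    predictive variance by a constant underestimates uncertainty."""
    return omega ** surprise_power, sigma_mu_var / uncertainty_divisor
```

With `surprise_power = 1` and `uncertainty_divisor = 1` the normative quantities are returned unchanged.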

Flexible versions of the normative model were fit directly to behaviour and used to infer maximum likelihood estimates of (1) hazard rate, (2) surprise sensitivity and (3) uncertainty underestimation, which were then used to identify age-related differences in these computational factors. For the purposes of model fitting, participant updates were assumed to be sampled from a normal distribution with a mean equal to the model-predicted update and an s.d. that was a linear function of the absolute prediction error magnitude. The intercept and slope of this linear function were fit as free parameters and can be thought of as variability in the motor update and learning rate selection processes, respectively. Thus, the minimally complex model contained five free parameters, three related to learning and two related to response variability. This model fit better than several more constrained models in which parameters were fixed to their normative values (Supplementary Fig. 4).
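The per-trial likelihood used for fitting can be written compactly (a sketch of the assumed response distribution; parameter names are ours):

```python
import math

def update_log_likelihood(observed_update, model_update, abs_pe,
                          intercept, slope):
    """Log-likelihood of one observed update, assuming updates are normally
    distributed around the model-predicted update with an s.d. that is a
    linear function of the absolute prediction error magnitude."""
    sd = intercept + slope * abs_pe
    z = (observed_update - model_update) / sd
    return -0.5 * (math.log(2.0 * math.pi) + 2.0 * math.log(sd) + z * z)
```

Summing this quantity over trials (plus the log prior) gives the objective that a constrained optimizer such as fmincon would maximize.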

In addition, more complex models were constructed to consider potential sources of variability related to the perception of noise. These models included all of the parameters of the basic model as well as one or more of the following free parameters: (1) a multiplicative scaling term allowing for scaled perceptions of noise, (2) an additive offset term allowing for subjective biases in overall levels of noise perception and (3) a noise variability term allowing individual subjects to represent a distribution across possible noise values. Since there were only two noise conditions, including both the additive and multiplicative terms amounted to fitting the noise for each block type as a free parameter. In the model that accounted for noise variability, the likelihood of observations was not drawn from a single normal distribution (as described in equation 3), but from a weighted mixture of normal distributions, where each component of the mixture had a mean of zero and an s.d. equal to a scaled version of the total uncertainty. Scale values were represented as uniformly spaced points on a grid (ranging from 0.1 to 100) with associated probabilities drawn from an inverse gamma distribution. The shape parameter of this distribution was fit as a free parameter and can be thought of as conveying the amount of evidence for the expected noise distribution, with lower values indicating more uncertainty over possible noise values.

All models were fit using a constrained search algorithm (fmincon in Matlab) that maximized the total log posterior probability of participant updates given participant prediction errors and parameter estimates. Weak priors favouring normative learning parameters were used to regularize parameter estimates. Uncertainty underestimation estimates were positively skewed and thus reported and analysed in log units. All model-fitting code will be made available on request.

Data analysis

Participant bucket placements and trial outcomes were used to compute trial-by-trial prediction errors (δ):

δ_t = χ_t − B_t   (7)

where χ_t and B_t are the locations of the dropped bag and the placed bucket on trial t, respectively. The corresponding updates made by the participant in response to each prediction error were computed as:

UP_t = B_{t+1} − B_t   (8)
The first and last trials of each block were omitted from further analysis, as updates on these trials were likely to be influenced by block changes. Trials on which the prediction error equalled zero were also omitted, as they provide no information about error-driven learning. In addition, trials on which the bucket placement fell more than 15 screen units away from any possible delta-rule update towards the previous bag or helicopter position were omitted, as they were considered to be governed by a process other than error-driven learning. Overall, 1.1% of trials were removed in this way.
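These exclusion rules can be expressed as a simple filter (a Python sketch; the helicopter-position variant of the delta-rule criterion is omitted for brevity, and names are ours):

```python
def is_usable_trial(index, n_trials, delta, update, slack=15.0):
    """Trial-exclusion rules: drop the first and last trial of a block,
    trials with zero prediction error, and trials whose update falls more
    than `slack` screen units outside the range of delta-rule updates
    towards the previous bag (learning rates between 0 and 1)."""
    if index == 0 or index == n_trials - 1:   # block-boundary trials
        return False
    if delta == 0:                            # no error, no learning signal
        return False
    lo, hi = min(0.0, delta), max(0.0, delta)
    return lo - slack <= update <= hi + slack
```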

Trial-by-trial updates were analysed with a regression model that included trial-by-trial prediction errors to account for the overall learning rate, as well as the interaction of prediction error with five mean-centred factors: (1) surprise (change-point probability, computed as above), (2) uncertainty (relative uncertainty, computed as above), (3) noise (s.d. of the bag distribution), (4) trial value (gold versus rocks) and (5) helicopter visibility (a binary variable indicating whether the helicopter cue was provided). To allow for updates towards the visible helicopter on catch trials, the model also included the interaction between the true mean of the distribution and the helicopter visibility variable. An additional nuisance term accounted for a slight bias in bucket placements towards the centre of the screen. One shortcoming of this regression model is that the residuals are heteroscedastic; specifically, absolute residuals are larger on trials where participants made larger absolute prediction errors. To account for this, we ran an initial regression for each participant, pooled the residuals, and computed the variance of the residuals across sliding windows of absolute prediction error magnitude. These variance estimates were used to weight the errors in a weighted regression that also included a ridge penalty to regularize the coefficient estimates:

β̂ = (AᵀPA + R)⁻¹ AᵀP UP   (9)

where A is the explanatory (design) matrix, P is the inverse-variance weighting matrix, UP is the vector of trial-by-trial updates, and R is a regularization matrix constructed with the ridge parameter equal to 0.1.
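For a single predictor with a diagonal weight matrix, the penalized weighted least-squares estimator reduces to a one-line computation (an illustrative sketch, not the analysis code):

```python
def weighted_ridge_1d(x, y, weights, ridge=0.1):
    """Scalar instance of beta = (A'PA + R)^(-1) A'P y with one predictor
    and a diagonal weight matrix P (one inverse-variance weight per trial)."""
    num = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    den = sum(w * xi * xi for w, xi in zip(weights, x)) + ridge
    return num / den
```

Down-weighting high-variance trials and adding the ridge term both shrink the influence of noisy observations on the coefficient estimate.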

To identify specific learning differences predicted from the normative model (Fig. 2), we applied the penalized weighted regression model to data that were binned in sliding windows according to the size of the absolute prediction error made by the participant divided by the standard deviation of the bag distribution, which served as a proxy for surprise (Fig. 3a). Each bin contained 10% of the total data, and successive bins had lower and upper bounds that were incremented by a single percentile, resulting in 90 total bins.
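The sliding-window binning can be sketched as follows (a Python sketch; the exact boundary convention, and hence whether 90 or 91 windows result, is our assumption):

```python
def sliding_percentile_bins(values, width_pct=10, step_pct=1):
    """Sort the data and return overlapping bins: each bin spans `width_pct`
    percent of the trials, and successive bins advance by `step_pct`
    percentile."""
    xs = sorted(values)
    n = len(xs)
    width = max(1, (n * width_pct) // 100)
    step = max(1, (n * step_pct) // 100)
    bins = []
    start = 0
    while start + width <= n:
        bins.append(xs[start:start + width])
        start += step
    return bins
```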

Regression coefficients were smoothed across bins, and t-tests were used to identify 'clusters' of contiguous bins for which the P value was smaller than 0.05. This procedure was repeated for three separate tests: (1) whether coefficients from older participants differed from zero, (2) whether coefficients from younger participants differed from zero and (3) whether coefficients differed between younger and older participants. For each cluster, we computed the cluster mass as the size of the cluster (number of bins) multiplied by the average absolute t-statistic within that cluster. For each test statistic, a null distribution over cluster mass was generated from 10,000 permutations of the data (sign-flipping for the single-group tests and label-flipping for the group comparison). Cluster-corrected permutation tests were conducted by comparing the observed cluster mass for each test statistic against this null distribution. See Supplementary Fig. 3 for estimates of parameters that did not differ across the age groups.
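The cluster-mass permutation logic for the single-group (sign-flipping) case can be sketched as follows (a Python sketch with our own threshold and variable names; note that cluster size times mean absolute t equals the sum of |t| over the cluster):

```python
import math
import random

def t_stat(col):
    """One-sample t statistic with a zero-variance guard."""
    n = len(col)
    m = sum(col) / n
    var = sum((x - m) ** 2 for x in col) / (n - 1)
    return 0.0 if var == 0.0 else m / math.sqrt(var / n)

def max_cluster_mass(tvals, threshold):
    """Mass of the largest cluster of contiguous supra-threshold bins:
    size x mean |t|, i.e. the sum of |t| over the contiguous run."""
    best = run = 0.0
    for t in tvals:
        run = run + abs(t) if abs(t) > threshold else 0.0
        best = max(best, run)
    return best

def sign_flip_test(data, threshold=2.0, n_perm=1000, seed=0):
    """Cluster-corrected one-sample permutation test. `data` holds one list
    of bin coefficients per subject; returns (observed mass, permutation p)."""
    rng = random.Random(seed)
    n_bins = len(data[0])
    tvals = [t_stat([subj[b] for subj in data]) for b in range(n_bins)]
    observed = max_cluster_mass(tvals, threshold)
    exceed = 0
    for _ in range(n_perm):
        flips = [rng.choice((-1.0, 1.0)) for _ in data]
        tv = [t_stat([f * subj[b] for f, subj in zip(flips, data)])
              for b in range(n_bins)]
        if max_cluster_mass(tv, threshold) >= observed:
            exceed += 1
    return observed, exceed / n_perm
```

The group comparison works the same way, with group labels shuffled instead of signs flipped.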

Single-participant coefficients were extracted for each coefficient that differed significantly across age groups using a leave-one-subject-out (LOSO) procedure: coefficients for each participant were extracted from the error bin corresponding to the maximum absolute t-statistic of a between-groups t-test computed across all bins for all other participants. These LOSO coefficient estimates were included as explanatory variables in a regression on participant age. Specifically, we created nested explanatory models containing (1) only an intercept term, (2) LOSO coefficients and an intercept, and (3) LOSO coefficients, Raven's scores, OSPAN scores and an intercept. Nested F-tests were used to compare the fits of these models while accounting for differences in complexity.
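The LOSO extraction can be sketched generically (an illustrative Python sketch; the between-groups t-statistic is abstracted as a caller-supplied function, and names are ours):

```python
def loso_extract(coeffs, bin_stat):
    """Leave-one-subject-out coefficient extraction: for each subject, choose
    the bin with the largest |statistic| computed from all OTHER subjects'
    coefficient vectors, then read out the held-out subject's coefficient at
    that bin. `bin_stat` maps a list of per-subject bin vectors to one value
    per bin (in the text, a between-groups t-statistic)."""
    out = []
    for i in range(len(coeffs)):
        others = [c for j, c in enumerate(coeffs) if j != i]
        stats = bin_stat(others)                      # leave-one-out statistic
        best_bin = max(range(len(stats)), key=lambda b: abs(stats[b]))
        out.append(coeffs[i][best_bin])
    return out
```

Because the bin is selected without the held-out subject's data, the extracted coefficients avoid the circularity of selecting and testing on the same data.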