Participants

All experiments were approved by the University College London Ethics Review Board. Participants (n=45, 25 females, aged 19–35 years) were recruited via the UCL Institute of Cognitive Neuroscience recruitment mailing list, and gave their written informed consent before beginning any experiments. All participants were healthy, with no history of neurological or psychiatric disorders, and no family history of epilepsy. Sample size was based on recent experiments involving stress33.

Task

Participants initially underwent a shock thresholding procedure (see below). Participants then received thorough instruction (see below) that made explicit the structure of the task, and completed a practice session of 30 trials of the probabilistic learning task used in the main experiment. They were also familiarized with the use of the subjective stress rating scale.

Our learning task was closely modelled on that used in a recent study leveraging the same computational framework in a non-stressful context15. Timings on each trial were jittered using a uniform distribution, allowing us to dissociate, as far as possible, the physiological responses to different events.

Each participant completed a set of 320 trials. On each trial, participants were presented with one of two stimuli (in our case, rocks). These stimuli remained on screen for 300 ms (±50 ms) before participants were asked to make a prediction, signalled with a button press, as to which outcome (snake or no snake) was likely to follow (Fig. 1a,b). This decision was made under time pressure, with a timeout period averaging 1,000 ms (±200 ms).

Once the decision had been made, the prediction was displayed for an average of 1,200 ms (±200 ms), before the outcome was revealed. Outcomes remained on screen for 1,000 ms (±200 ms). In the case of the snake stimulus, outcome presentation was coincident with the delivery of a shock. This was followed by an intertrial interval of 2,000 ms (±500 ms), during which a fixation cross was displayed.

The probabilistic mapping from stimulus to outcome shifted over the course of the experiment (Fig. 1c), requiring participants to constantly track the relationship over time. This resulted in fluctuations in the level of various forms of uncertainty. Each session of 320 trials was divided into 10 blocks of different stimulus–outcome probabilities, of lengths that varied between 26 and 38 trials. The transitions between these blocks were not made explicit to the subject. The probabilities governing each block varied from heavily biased (90/10), through moderately biased (70/30) to unbiased (50/50), allowing us to examine the effect of predictability on stress responses. We used four repeats each of the biased probability blocks (2 for each bias direction, that is, 70/30 and 30/70) and two repeats of the 50/50 to generate 10 blocks in total.
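The block structure described above can be sketched as follows. This is an illustrative reconstruction (random block order; the exact balancing that fixes the total at 320 trials is omitted), not the generation code used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten blocks: four 90/10 (two per bias direction), four 70/30 (two per
# direction) and two 50/50, with lengths drawn between 26 and 38 trials.
biases = [0.9, 0.9, 0.1, 0.1, 0.7, 0.7, 0.3, 0.3, 0.5, 0.5]
rng.shuffle(biases)

schedule = []                                  # per-trial p(snake | rock A)
for p in biases:
    length = rng.integers(26, 39)              # 26-38 trials inclusive
    schedule.extend([p] * length)

# The paper fixed the total at 320 trials; balancing is not modelled here.
outcomes = rng.random(len(schedule)) < np.array(schedule)   # True = snake
```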

Participants were paid a base rate of £10 and informed that they would receive an extra £0.05 for each correct prediction they made. Outcomes (correct/incorrect) were not explicitly signalled. Participants were allowed to take a self-timed break every 10 min.

For 41 of the 45 subjects, we report questionnaire measures of life stress (Perceived Stress Scale, PSS). Data from the remaining participants were lost due to a technical error. Some subjects (n=23) completed questionnaires on a separate day, while the remainder (n=22) did so after the main task. We also collected questionnaire measures of intolerance of uncertainty (n=43; Supplementary Fig. 6), depression (Beck Depression Inventory) and anxiety (Mood and Anxiety Symptom Questionnaire). We do not report data from the latter two questionnaires here.

Participant instruction

Participants were given detailed computerized instruction on the structure of the task. This emphasized that the accuracy of their predictions did not affect the number of shocks they received but did influence their earnings on the task. Understanding was confirmed with the question: ‘How much do you earn per correct prediction?’

We also attempted to ensure good understanding of the probabilistic relationships governing stimulus:outcome relationships. We emphasized that the probabilities were reciprocal (p(snake|rock A)=1−p(snake|rock B)), and checked for comprehension with the question: ‘If the probability of a snake being under rock A is 40%, what is the probability of it being under rock B?’

Participants were further informed that the probabilities changed throughout the task, and that at times they might appear to be random, that is, the probability of an outcome following each stimulus might be equal (50/50).

Shock thresholding procedure

Electric shocks were controlled using a Digitimer DS5 system in conjunction with a National Instruments Data Acquisition Board, which allowed control of shock amplitude via Matlab (Mathworks). Electrodes were placed 0.5 cm apart on the first dorsal interosseous of the left hand. Electrode sites were cleaned with alcohol and a mild abrasive (NuPrep Skin Prep Gel). Shocks were delivered using BioPac Ag/AgCl electrodes filled with Sigma Spectra 360 Electrode Gel, attached using double-sided adhesive pads.

Participants first underwent a thresholding procedure that allowed us to map their subjective sensitivity to shock. Thresholding consisted of a sequence of 80 shocks, with currents between 0.1 and 10 mA chosen according to a staircasing procedure. After each shock, participants were asked to report how painful it was, from a rating of 1 (not painful) to 5 (very painful). We used an automated thresholding procedure inspired by Gracely et al.62, in which separate staircases are used to estimate the transition points between each rating (1/2, 2/3, 3/4, 4/5). For robustness, we used two independent staircases, running the QUEST thresholding algorithm63 for each transition. This gave a total of eight independent staircases, with trials from each staircase randomly interleaved. At the end of thresholding, we averaged the two estimates for the 4/5 boundary to set the shock intensity for the rest of the experiment. Participants were given a sample shock at this intensity and in three cases we reduced the amplitude of the experimental shock by 20% at the participant's request. Shock sensitivity was also measured at the end of the task, and on average showed a slight decrease (one-sample t-test, t(44)=2.70, P=0.0097, d=0.403), equivalent to an 11% reduction in subjective pain.
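The staircasing logic can be illustrated with a minimal grid-based Bayesian threshold estimator in the spirit of QUEST. The psychometric function, its slope and the grid below are assumptions for illustration; the real procedure ran eight interleaved staircases with the QUEST algorithm's own settings.

```python
import numpy as np

# Grid posterior over one pain-rating boundary (e.g. the 4/5 transition),
# updated after each simulated "above/below boundary" response.
grid = np.linspace(0.1, 10.0, 200)            # candidate boundaries (mA)
log_post = np.zeros_like(grid)                # flat prior (log space)
slope = 2.0                                   # assumed psychometric slope

def p_above(intensity, boundary):
    """P(rating falls above the boundary | stimulus intensity)."""
    return 1.0 / (1.0 + np.exp(-slope * (intensity - boundary)))

true_boundary = 4.2                           # simulated participant
rng = np.random.default_rng(1)
for _ in range(40):
    # Test at the current posterior mean (QUEST's recommended placement).
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    intensity = float(np.sum(grid * post))
    above = rng.random() < p_above(intensity, true_boundary)
    lik = p_above(intensity, grid) if above else 1.0 - p_above(intensity, grid)
    log_post += np.log(lik + 1e-12)

post = np.exp(log_post - log_post.max())
post /= post.sum()
estimate = float(np.sum(grid * post))         # estimated boundary (mA)
```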

Stress measures

Each participant was asked to make 65 subjective stress ratings at semiregular points throughout the probabilistic learning task (every 4–6 trials). These required participants to move a marker along a line to answer the question ‘How stressed do you feel at this moment?’ (Fig. 1d). All analyses were conducted on z-scored ratings to remove between-subject differences in use of the scale.

Pupil diameter was recorded in a subset of participants (n=22) using an EyeLink 1000 System (SR Research), sampled at 100 Hz. Participants were seated in a darkened room, and asked to maintain fixation wherever possible. Stimuli were luminance matched, with the no snake outcome signalled by a scrambled version of the snake picture.

Skin conductance was recorded from the index and middle fingertips of the left hand using 8 mm BioPac AgCl electrodes. Electrodes were filled with a 0.5%-NaCl paste (BioPac Gel 101) and attached using double-sided adhesive pads supplemented by tape. We utilized a custom recording system based on the provision of a constant current between the two electrodes and the measurement of the resultant voltage, allowing calculation of the conductance of the skin (AT64, Autogenic Systems). This signal was converted to an optical pulse and then digitally recorded at 100 Hz in Spike2 (v6.17).

In a subset of subjects (20), we collected saliva samples at 8 time points, from which we measured cortisol concentrations. To avoid baseline elevation due to anticipatory stress we collected two baseline readings (samples 1 and 2) on a separate day, on which participants were aware that no shocks would be received. On the day of the experiment, we collected the following samples: (3) on arrival; (4) immediately before task (∼15 min after shock thresholding); (5) 10 min into task; (6) 20 min into task; (7) 30 min into task; and (8) post task. Participants salivated through straws into 2-ml polypropylene tubes. Samples were frozen on the day of collection. Analysis was performed by Viapath at King’s College Hospital, using a competitive immunoassay. Briefly, cortisol in the sample competes with cortisol conjugated to horseradish peroxidase for binding sites on a microtitre plate. Unbound reagents are then washed away. Bound cortisol enzyme conjugate is measured by the reaction of the horseradish peroxidase enzyme to the substrate tetramethylbenzidine, producing a blue colour. A yellow colour is formed after stopping the reaction with an acidic solution. The concentration of cortisol in the sample is calculated as a function of the optical absorption at 450 nm; more absorption implies greater concentration of cortisol enzyme conjugate, and therefore lower concentration of cortisol in the sample. For further details, see Arakawa et al.64. Some samples were not suitable for analysis due to damage in storage (28/160). To accommodate these missing values, we used a Skillings–Mack test to assess changes in cortisol over time. One subject was excluded due to baseline concentrations >3 s.d. away from the mean (60.59 nmol l−1; population mean=6.92 nmol l−1; population s.d.=13.57 nmol l−1). To verify that this did not affect our conclusions, we repeated our analyses without excluding this subject, and found comparable results. All data were log transformed before analysis to render the data close to normal33.

Analysis of pupil diameter

Data were exported using the EDF2ASC plugin, and imported as ASCII files into Matlab. Pupil diameter measurements were downsampled to 100 Hz, and low-pass filtered (4 Hz, third-order Butterworth)41. Blinks were automatically detected by the EyeLink software, and removed by linear interpolation of samples 140 ms either side of the blink. Data were then z-scored and detrended. No further artefact removal was necessary.
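A minimal sketch of this preprocessing pipeline, assuming numpy/scipy rather than the Matlab tools used in the study; the signal, blink indices and recording length below are simulated.

```python
import numpy as np
from scipy.signal import butter, filtfilt, detrend

# Simulated 100 Hz pupil trace with slow dynamics plus noise.
fs = 100.0
t = np.arange(0, 10, 1 / fs)
pupil = np.sin(0.5 * t) + 0.05 * np.random.default_rng(2).standard_normal(t.size)

# Interpolate across a simulated blink plus 140 ms either side.
blink_start, blink_end = 400, 430
pad = int(0.14 * fs)
lo, hi = blink_start - pad, blink_end + pad
pupil[lo:hi] = np.interp(np.arange(lo, hi), [lo - 1, hi],
                         [pupil[lo - 1], pupil[hi]])

# Low-pass filter (4 Hz, third-order Butterworth), z-score, detrend.
b, a = butter(3, 4 / (fs / 2), btype="low")
clean = filtfilt(b, a, pupil)
clean = (clean - clean.mean()) / clean.std()
clean = detrend(clean)
```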

Linear modelling involved a mixture of delta and boxcar regressors convolved with a canonical pupillary response function (see below for details of response functions)41. For phasic responses, we also convolved regressors with the first and second derivatives of the canonical response function, a standard method in magnetic resonance imaging65 designed to accommodate inaccuracies in the modelling of the amplitude and timing of induced responses. These were subsequently orthogonalised to their respective regressors, to apportion shared variance to the primary regressor. We took an additional step to remove signal due to changes in luminance, which produces pupil constrictions distinguishable from emotional/cognitive pupillary changes by their short latency31. We estimated a luminance response function for each subject on the basis of passive viewing of the images in our task (each image presented 50 times, displayed for 1,000 ms, with a jittered intertrial interval of between 900 and 1,100 ms). This provided a response function that could then be convolved with each presentation of an image, allowing us to discriminate fast, luminance-dependent constrictions from slower dilatations relating to cognitive variables (Supplementary Fig. 3). Following convolution of predictors with their response functions, predictors were z-scored. Details of the full linear model can be found in Supplementary Table 1.
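The construction of one event regressor with its derivative ('dispersion') regressors can be sketched as follows. This is a hypothetical Python stand-in for the Matlab pipeline: the kernel shape, event times and sampling rate are illustrative assumptions, and orthogonalisation is a single Gram–Schmidt step.

```python
import numpy as np

fs = 100
n = 60 * fs                                   # 60 s of samples
events = np.zeros(n)
events[[500, 2000, 4000]] = 1.0               # example event onsets (deltas)

t = np.arange(0, 4, 1 / fs)                   # 4 s kernel support
rf = (t ** 10.1) * np.exp(-10.1 * t / 0.93)   # Hoeks & Levelt-style gamma
rf /= rf.max()
d1 = np.gradient(rf)                          # first derivative of the RF
d2 = np.gradient(d1)                          # second derivative of the RF

def conv(sig, kern):
    return np.convolve(sig, kern)[: sig.size]

def orthogonalise(x, ref):
    # Remove the component of x explained by ref (Gram-Schmidt step),
    # apportioning shared variance to the primary regressor.
    return x - ref * (ref @ x) / (ref @ ref)

main = conv(events, rf)
reg1 = orthogonalise(conv(events, d1), main)
reg2 = orthogonalise(conv(events, d2), main)
```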

Analysis of skin conductance

Data were first visually inspected and eight participants were rejected due to low recording quality.

Data from the remaining 37 participants were imported and preprocessed using tools from the SCRalyze suite42. Data were downsampled to 10 Hz and low-pass filtered (5 Hz, first-order Butterworth). We used a custom artefact rejection regime based on the second differential of the signal; non-physiological signals were identified by their very rapid rate of change. Subjects took self-paced breaks every 10 min, which caused substantial changes in skin conductance amplitude due to movement of the arm, and so we discarded the first five trials following a break. The time series was then concatenated, detrended and z-scored.
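The second-differential artefact check can be illustrated as below; the threshold value, simulated signal and interpolation scheme are assumptions, not the paper's settings.

```python
import numpy as np

# Slow, physiological drift at 10 Hz, with an injected spike artefact.
rng = np.random.default_rng(3)
scr = np.cumsum(0.01 * rng.standard_normal(600))
scr[300] += 5.0                               # non-physiological spike

# Flag samples where the discrete second difference is implausibly large.
d2 = np.diff(scr, n=2)
threshold = 0.5                               # assumed (units per sample^2)
bad = np.where(np.abs(d2) > threshold)[0] + 1 # map back to sample indices

# Replace flagged samples by interpolation from surrounding good samples.
good = np.ones(scr.size, dtype=bool)
good[bad] = False
clean = scr.copy()
clean[~good] = np.interp(np.flatnonzero(~good), np.flatnonzero(good),
                         scr[good])
```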

For linear models, we used a mixture of delta and boxcar regressors to represent phasic and sustained influences on skin conductance. These regressors were then convolved with the canonical skin conductance response function outlined in ref. 35 and provided in SCRalyze (http://scralyze.sourceforge.net/) (see below for details of response functions). We utilized a variety of nuisance regressors to isolate changes in skin conductance relating to our cognitive variables of interest. All predictors were z-scored before modelling. Details of the full linear model can be found in Supplementary Table 2.

Modelling of learning

We modelled learning in our task using several models. Three of these (Rescorla–Wagner, Sutton K1, Hierarchical Gaussian Filter) were implemented using the HGF toolbox (http://www.translationalneuromodeling.org/hgf-toolbox-v3-0/). For a full list of priors, see Supplementary Table 3. For details of each model, see Supplementary Table 4.

The model that best explained our data was the HGF (Fig. 2a). Introduced by Mathys et al.21, the HGF is a Bayesian learning model not constrained by the optimality typically assumed by such models; instead, subject-specific fitting allows for interindividual variability in learning. A recent functional magnetic resonance imaging study using the HGF highlighted its utility in assessing learning under uncertainty and its neural correlates15; our task and analysis were inspired by the ones used there. For a full description of the structure of the HGF, the reader is referred to the start of the Results section.

The HGF was fit to each individual’s choices using variational Bayes, with two free parameters: ϑ, a metavolatility parameter that determines the step size at the third level of the HGF; and ω, a constant component of the learning rate at the second level.

The four quantities utilized in our analyses of stress measures are all trajectories over time, with a value that evolves according to the predictions made and outcomes experienced by that subject.

The first of these is surprise (|δ1| in the HGF). This is the difference between the observed outcome (1=snake/0=no snake) on trial k and the subject’s belief about the probability of that outcome:

δ1(k) = μ1(k) − s(μ2(k−1))

where μ1(k) is the actual outcome (1 or 0) and s( ) is the sigmoid transformation of the belief about probabilities before seeing the outcome, that is, the subject’s expectation. By taking the absolute value of δ1(k), we therefore obtain a quantity that represents surprise about outcomes.
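As a minimal illustration, surprise is the absolute difference between the binary outcome and the sigmoid-transformed second-level belief; the belief value used here is arbitrary.

```python
import math

def surprise(outcome, mu2):
    """|delta1|: absolute prediction error about a binary outcome."""
    expectation = 1.0 / (1.0 + math.exp(-mu2))   # s(mu2): expected p(snake)
    return abs(outcome - expectation)

# A confident belief that a snake is likely (p ~ 0.88) is barely surprised
# by a snake, and strongly surprised by its absence.
low = surprise(1, 2.0)
high = surprise(0, 2.0)
```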

The three forms of uncertainty we consider are as follows:

σ1(k): uncertainty of predictions at the first level on trial k. Because beliefs at the first level take the form of a Bernoulli distribution, the variance is a function of the mean μ̂1(k), namely μ̂1(k) × (1 − μ̂1(k)). This means that uncertainty has an inverted-U relationship with belief, as depicted in Figs 3c and 4b. Intuitively, this form of uncertainty represents an individual’s estimate of the entropy of the environment at that moment in time; that is, how surprising they expect things to be. We refer to it as irreducible uncertainty.

σ2(k): this is a form of informational uncertainty on trial k, representing lack of knowledge about the current stimulus:outcome relationship. Over time and in a stable environment, this uncertainty would fall to zero as the probabilities underlying the task are learned. In volatile environments, however, this is not the case. In the HGF, this form of uncertainty is approximately equivalent to a time-varying learning rate, used to update beliefs quickly when they are uncertain and slowly when they are supported by plentiful evidence. We refer to it as estimation uncertainty.

σ3(k): this can also be considered a form of estimation uncertainty, this time over the volatility of the environment at trial k. Again, it controls the speed of learning about volatility, weighting prediction errors from the probability space at the second level. We refer to it as volatility uncertainty.
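The way these trajectories arise can be sketched with a heavily simplified two-level version of the binary HGF update (after Mathys et al.), with the third level frozen so that volatility is a constant exp(ω). It tracks irreducible uncertainty (the Bernoulli variance) and estimation uncertainty (the second-level posterior variance); volatility uncertainty requires the full three-level model and is omitted. Parameter values are illustrative, and this is a sketch, not the toolbox implementation used in the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_hgf2(outcomes, omega=-2.0, mu2=0.0, sigma2=1.0):
    """Two-level binary HGF sketch with fixed volatility exp(omega)."""
    traj = {"surprise": [], "sigma1": [], "sigma2": []}
    for u in outcomes:
        mu1_hat = sigmoid(mu2)                 # prediction, p(snake)
        sigma1 = mu1_hat * (1 - mu1_hat)       # irreducible uncertainty
        delta1 = u - mu1_hat                   # prediction error
        sigma2_hat = sigma2 + np.exp(omega)    # predicted variance, level 2
        pi2 = 1.0 / sigma2_hat + sigma1        # posterior precision, level 2
        sigma2 = 1.0 / pi2                     # estimation uncertainty
        mu2 = mu2 + sigma2 * delta1            # belief update (learning rate
        traj["surprise"].append(abs(delta1))   #  = sigma2, as in the text)
        traj["sigma1"].append(sigma1)
        traj["sigma2"].append(sigma2)
    return traj

rng = np.random.default_rng(4)
traj = run_hgf2((rng.random(200) < 0.8).astype(float))
```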

We ran an additional pair of Rescorla–Wagner learning models to test the validity of two assumptions made by the models in the original comparison. The first assumption is that participants update probabilities symmetrically on each trial: if p(outcome|stimulus 1) increases then p(outcome|stimulus 2) decreases by the same amount, as constrained by our task and explained to participants. To accommodate departures from this scheme, we used a Rescorla–Wagner model in which the probabilities for each stimulus were updated independently. Second, we examined the possibility that beliefs were updated differently following trials on which shocks were delivered by fitting two learning rates (α shock and α noshock ) for each subject. We then compared these two models to the original Rescorla–Wagner model (in which probabilities were updated symmetrically with a single learning rate). Bayesian Model Comparison showed that the simple model comprehensively outperformed the two variants (exceedance probability=1). We concluded, therefore, that the assumptions of symmetric probability updating and balanced learning across shock/no shock trials were justified.
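The baseline model in that comparison (symmetric updating, single learning rate) can be sketched as below, with a grid fit and a BIC computed from the fitted likelihood. The simulated data, learning-rate grid and belief clipping are assumptions for illustration.

```python
import numpy as np

def rw_nll(alpha, stimuli, outcomes):
    """Negative log-likelihood of outcomes under symmetric RW updating."""
    p = np.full(2, 0.5)                       # p(snake | stimulus 0/1)
    nll = 0.0
    for s, o in zip(stimuli, outcomes):
        pred = p[s]
        nll -= np.log(pred if o else 1 - pred)
        delta = o - pred
        p[s] += alpha * delta
        p[1 - s] -= alpha * delta             # symmetric (reciprocal) update
        p = np.clip(p, 1e-4, 1 - 1e-4)        # keep likelihood finite
    return nll

# Simulated reciprocal task: p(snake|stim 0)=0.8, p(snake|stim 1)=0.2.
rng = np.random.default_rng(5)
stimuli = rng.integers(0, 2, 320)
outcomes = (rng.random(320) < np.where(stimuli == 0, 0.8, 0.2)).astype(int)

alphas = np.linspace(0.01, 0.5, 50)
nlls = [rw_nll(a, stimuli, outcomes) for a in alphas]
best_alpha = alphas[int(np.argmin(nlls))]
bic = 2 * min(nlls) + 1 * np.log(320)         # k=1 free parameter
```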

Modelling of stress

We used multiple regression models to examine the relationship between task variables, including the trajectories from the HGF outlined above, and stress responses. We used least-squares fitting for data from subjective ratings, implemented with the Matlab function glmfit. For physiological data, we used robust fitting to avoid spurious fits due to unidentified artefacts, implemented with the Matlab function robustfit. In both cases, we used two-tailed t-tests to assess whether parameters were different from zero, that is, whether, at the population level, a given parameter meaningfully and consistently contributed to the dependent variable in question.

We compared several multiple regression models to examine the effect of uncertainty on subjective stress. All models included parameters for the previous rating and the number of shocks received since the last rating. The third term varied between models, and captured the effects of absolute prediction error (model 1), and uncertainty at each level (models 2–4).

Model 1: surprise. Model 1 omitted an explicit representation of uncertainty, but summed the absolute prediction errors (|δ1|, bounded 0–1 on each trial) since the last rating, reflecting surprise in response to outcomes given an individual’s beliefs.

Model 2: irreducible uncertainty. Irreducible uncertainty is the variance of the Bernoulli distribution representing the subject’s beliefs, captured in the HGF by σ1. It is highest when p(outcome|stimulus 1) and p(outcome|stimulus 2) are both equal to 0.5, that is, when the sequence of outcomes is totally unpredictable. Irreducible uncertainty is also correlated with the magnitude of surprise (surprise is, on average, higher in uncertain situations).

Model 3: estimation uncertainty. Uncertainty about the probabilities currently governing the observed outcomes is known as estimation uncertainty. This is represented in the HGF by the variance of the Gaussian distribution representing beliefs at the second level, σ2.

Model 4: volatility uncertainty. Finally, we tested the hypothesis that subjective stress related to uncertainty at the third level, corresponding to uncertainty about the volatility of the generative process. Again, this is explicitly represented in the HGF as the variance of the Gaussian representing beliefs at the third level, σ3.
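The logic of this comparison can be sketched as follows: each model regresses the current rating on the previous rating, shocks since the last rating, and one candidate third term, and models are scored by BIC. The data are simulated (with ratings driven by the irreducible-uncertainty term), and the regressor names are labels for the candidate terms above, not outputs of a fitted HGF.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 65                                        # number of stress ratings
prev = rng.standard_normal(n)                 # previous rating
shocks = rng.poisson(1.5, n).astype(float)    # shocks since last rating
terms = {"surprise": rng.standard_normal(n),  # candidate third terms
         "sigma1": rng.standard_normal(n),
         "sigma2": rng.standard_normal(n),
         "sigma3": rng.standard_normal(n)}
# Simulate ratings driven by the irreducible-uncertainty term.
y = (0.5 * prev + 0.3 * shocks + 0.8 * terms["sigma1"]
     + 0.3 * rng.standard_normal(n))

def bic(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + X.shape[1] * np.log(len(y))

scores = {name: bic(y, np.column_stack([np.ones(n), prev, shocks, t]))
          for name, t in terms.items()}
best = min(scores, key=scores.get)            # lowest BIC wins
```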

For physiological stress measures, we convolved predictors with canonical response functions (see below) to account for the time course of physiological responses41,42.

Model comparisons

We performed two sets of model comparisons. In the first, we determined the best learning model to explain predictions made by the subjects, and in the second the best model for subjective stress responses. In each case, for each model and each subject, we took the model evidence (F-values for learning models or Bayesian Information Criterion66 for multiple regression models), and used these to assess model fit by Random-Effects Bayesian Model Selection36, as implemented in the VBA toolbox37. Random-Effects Bayesian Model Selection allows for heterogeneity in the population; the best model for each individual is allowed to vary, producing an estimate of model frequency in the population (that is, for how many participants that model is the best model) and an exceedance probability (the probability that the model in question is the most frequently utilized in the population).
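The random-effects procedure can be sketched with a simplified reimplementation of the variational scheme of Stephan et al.: a Dirichlet over model frequencies is updated from per-subject log model evidences, and exceedance probabilities are estimated by sampling. This mirrors what the VBA toolbox computes but is not its code; the evidences below are simulated.

```python
import numpy as np
from scipy.special import digamma

def rfx_bms(log_ev, n_samples=20000, seed=0):
    """Variational random-effects BMS over an (n_subjects, n_models) array."""
    n_sub, n_mod = log_ev.shape
    alpha = np.ones(n_mod)                    # Dirichlet prior, alpha0 = 1
    for _ in range(100):
        w = log_ev + digamma(alpha) - digamma(alpha.sum())
        u = np.exp(w - w.max(axis=1, keepdims=True))
        u /= u.sum(axis=1, keepdims=True)     # posterior model assignments
        alpha = 1.0 + u.sum(axis=0)           # updated Dirichlet counts
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(alpha, n_samples)       # sampled model frequencies
    xp = np.bincount(r.argmax(axis=1), minlength=n_mod) / n_samples
    return alpha, xp                          # counts, exceedance probs

# 20 simulated subjects, 3 models; model 0 favoured in 15/20 subjects.
rng = np.random.default_rng(7)
log_ev = rng.standard_normal((20, 3))
log_ev[:15, 0] += 3.0
alpha, xp = rfx_bms(log_ev)
```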

Response functions used in pupillary and skin conductance measures

We used linear modelling to elucidate the impact of different events on pupil diameter and skin conductance. This approach is well-established in the functional magnetic resonance imaging literature, where the use of the general linear model (GLM) to interpret haemodynamic responses is common.

In this approach, a response function is used to describe how the output of a system (the measured variable) relates to its inputs. This response function is then convolved with a representation of the putative inputs to the system, and the resultant time course is used as a predictor in the GLM. A common further step is to create secondary and tertiary regressors that are convolved with the first and second derivative of the response function65. These ‘dispersion regressors’ allow for inaccuracy in the timing or form of modelled responses; we utilize this approach in our modelling of events (this is unnecessary when the modelled process is extended over time, and represented by a boxcar).

Our response functions were drawn directly from previous studies. For skin conductance, we used the skin conductance response functions provided in SCRalyze (http://scralyze.sourceforge.net/), which are based on a gamma function convolved with a Gaussian kernel67.

Response functions for pupillary dilatation were first discussed by Hoeks et al.68 and we use the response function described there, which takes the form of a gamma function:

h(t) = t^n × e^(−n·t/t_max)

The constants n and t_max dictate the shape of the response function. As advocated in the original paper, we use values of n=10.1 and t_max=930 ms; we note that a recent study using this response function41 demonstrated that this approach is robust to changes in the values of the constants used (see the Supplementary Information in ref. 41).
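The Hoeks & Levelt function with the parameters quoted above can be constructed directly; this form peaks at t = t_max. The sampling rate and kernel length are assumptions.

```python
import numpy as np

fs = 100.0
t = np.arange(0, 4.0, 1 / fs)                 # 4 s kernel support (seconds)
n, tmax = 10.1, 0.930                         # values quoted in the text

# h(t) = t^n * exp(-n*t/tmax), normalised to unit peak.
h = t ** n * np.exp(-n * t / tmax)
h /= h.max()
peak_time = t[np.argmax(h)]                   # analytically equal to tmax
```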

We took an additional step in our modelling of pupillary responses. As the appearance of stimuli and outcomes involved increases in luminance, they evoked light-related pupil constrictions. Importantly, such luminance responses are faster than the dilatations evoked by cognitive or emotional factors31. We accounted for light-related constrictions by convolving stimulus and outcome onsets with a luminance response function, which was fitted to each subject on the basis of a separate data set in which subjects were passively exposed to each of the images used in our experiment (see above and Supplementary Fig. 3). The best-fitting parameters were found using least-squares fitting, implemented with the Matlab function fmincon. Calculated in this way, the average parameters were n=3.6 and t_max=839 ms.

Statistical analysis

All data analysis was completed in Matlab (Mathworks). All statistical tests were two-sided. We used one-sample t-tests to test for the significance of parameters in multiple regression models, and two-way repeated-measures analysis of variance to analyse the time-course data for skin conductance and pupil diameter. We used Pearson correlation coefficients except in the one case in which the data were a priori not normal, having been fit in logit space (ϑ; Fig. 2c). In this instance, we confirmed non-normality using a Kolmogorov–Smirnov test, and used Spearman’s rank correlation to assess the relationship non-parametrically.

Code availability

Custom Matlab code for analysis of skin conductance and pupil diameter is available on request to the corresponding author. We used the HGF toolbox (http://www.translationalneuromodeling.org/hgf-toolbox-v3-0/) for modelling of learning, the VBA toolbox for model comparison (mbb-team.github.io/VBA-toolbox/), and the SCRalyze suite for preprocessing of skin conductance data (http://scralyze.sourceforge.net/).