Subjects

Prior to the commencement of data collection, the full protocol was approved by the Human Research Ethics Committees of the Royal Brisbane & Women’s Hospital, the University of Queensland, the QIMR Berghofer Medical Research Institute and UnitingCare Health. All research was performed in accordance with relevant guidelines and regulations. All participants gave written, informed consent to participate in the study.

Thirty-eight participants with PD undergoing STN-DBS were consecutively recruited at the Asia-Pacific Centre for Neuromodulation in Brisbane, Australia. All participants met the UK Brain Bank criteria for PD45. No participants met the Movement Disorder Society criteria for dementia46. The PD subtype and the Hoehn and Yahr stage at device implantation were recorded47. Patients underwent bilateral implantation of Medtronic 3389 or Boston Vercise electrodes in a single-stage procedure. Stimulation was commenced immediately using microelectrode recording data to identify the optimal contact. Further contact testing took place over the following week as an inpatient, with participants returning to the DBS centre following discharge for further stimulation titration, guided by residual motor symptoms. Further details have previously been reported48,49.

Neuropsychiatric assessment

Impulsivity amongst participants was assessed with patient and clinician-rated instruments prior to STN-DBS and subsequently at 2 weeks, 6 weeks, 13 weeks and 26 weeks postoperatively (see Supplementary Fig. 1 for a study flowchart). A range of measures was obtained to account for the fact that impulsivity is not a unitary construct. These included the Barratt Impulsiveness Scale 11 (BIS) and its second-order factors (attentional, motor and non-planning)18; the Questionnaire for Impulsive-Compulsive Disorders in PD Rating Scale (QUIP-RS)50; the delay discounting task51; the Excluded Letter Fluency task (ELF)52; and the Hayling test53. Further information on these instruments is detailed in the Supplementary Information. Additional neuropsychiatric symptoms were captured with the Beck Depression Inventory II (BDI)54; the Empathy Quotient (EQ)55; the Geriatric Anxiety Inventory (GAI)56; and the Apathy Scale57. For each self-report scale, participants were instructed to refer to ‘the last two weeks’, in order to obtain a measurement of current ‘state’. At each visit, PD motor symptoms were assessed using the UPDRS Part III motor examination58. Dopaminergic medication was recorded and converted to a levodopa-equivalent daily dose (LEDD) value59.

Design and setting

Participants completed the experimental task prior to DBS and at 13 weeks post-DBS. Participants were ‘on’ medication and stimulation for all assessments. We opted against a counterbalanced ‘off’ and ‘on’ DBS assessment at the same visit for several reasons. First, our aim was to provide a naturalistic insight into the subtle behavioural changes that emerge as patients transition from dopaminergic therapies to subthalamic stimulation; changes in levodopa-equivalent daily dose were included as covariates in our analyses. Second, our experience is that many patients would not tolerate the DBS ‘off’ state without severe discomfort. Third, despite allowing DBS washout, plastic network effects of chronic DBS may persist and contaminate findings in an on-off design.

Task

We employed a modified version of an established slot machine gambling paradigm validated in healthy controls19. Subjects read an instruction screen and played through 5 training trials, after which they entered a ‘virtual’ casino, starting with 2000 AUD available to gamble and playing 100 trials (Fig. 1). The win-loss likelihood of the slot machines was predetermined and changed at regular intervals. On completion of the task, participants received a small monetary reward proportional to their total winnings. The naturalistic gambling task allows risk-taking and impulsive behaviour to be expressed and offers four actions on each trial, each of which reflects exploration and, thereby, risk-taking.

(i) Bet Increase: increasing the amount wagered on consecutive trials (minimum of 5 AUD per bet, no maximum)
(ii) Machine-Switch: switching between slot machines (four machines in total)
(iii) Casino Switch: cashing out and switching ‘virtual’ casino days
(iv) Double-Up: engaging in a secondary double-or-nothing gamble on all win trials

Figure 1 Slot machine gambling paradigm: The task consists of 100 trials. On every trial, players are able to place a bet of unlimited magnitude, switch slot machines or ‘cash out’, exiting the casino and returning again on another virtual ‘day’. The overall win probability is 25%, with wins split into big wins and small wins. The two possible types of losses are near-misses, in which the first two wheels are the same and the third is different (i.e. AAB) or a true loss, in which all the wheels are different (i.e. ABC). Game play proceeds as follows. Each trial begins with the slot machine main screen loading, displaying the player’s account value. The player then places a continuous-valued bet amount, incremented in units of 5 or 10 AUD. After the player has placed a bet, he or she presses the ‘Pull’ button and watches as the wheels begin to spin. At any point, the player has the ability to press the ‘Stop’ button, ending the trial and subsequently revealing the outcome of the three wheels. Unbeknownst to the participant, pressing the stop button has no effect on the trial outcome. If the stop button is not pressed, the trial times out after 5 seconds, and the player sees the outcome of the first, second and third wheel sequentially. On trials in which the outcome is a win, there are ten possible reward grades (or multiples of the bet amount). After every win trial, players are offered a possible ‘double-up’ option, during which players are given 3 seconds to decide whether or not to engage in a ‘double-or-nothing’ option, thereby risking his or her entire win amount. If the player elects to engage in this gamble, a card flips over revealing the result, and subjects are taken to the next trial. If the player does nothing, or decides not to gamble, he or she is taken to the next trial. For each loss trial, players are taken directly to the beginning of the next trial. 
Again, the trajectory of win-loss outcomes is fixed, ensuring comparable inference upon perceptual and response parameters across participants.

As in our previous work19, these responses, together with trial-wise outcome information (wins/losses), served as the input for our computational models (for a brief summary, see below). Details on the paradigm and computational modelling can be found in prior work19 and the Supplementary Material.

Computational modelling

The Hierarchical Gaussian Filter (HGF)

The HGF is a hierarchical Bayesian model34,35 (Fig. 2) where each level of the hierarchy encodes distributions of environmental variables (in ascending complexity) that evolve as Gaussian random walks. The HGF is an extension of the model presented in Behrens et al.33, and describes an agent whose learning rate is a function of his or her uncertainty. In the HGF, an agent is assumed not only to represent current environmental contingencies, but also to track how these contingencies change over time (volatility), and to what degree volatility itself is constant (tonic volatility) or may change in time (phasic volatility). Importantly, the agent modelled in the HGF employs these representations to make predictions about emerging environmental fluctuations and future sensory feedback. Furthermore, the agent is able to encode the precision of each prediction and use these precision estimates to scale trial-wise updates of beliefs about the environment and its statistical structure. Each level of the HGF is coupled such that higher states determine how quickly the next lower state evolves, with the lowest hierarchical level representing sensory events.

Figure 2 The Hierarchical Gaussian Filter (HGF): \({u}^{(k)}\) represents binary observations (true wins = 1, and losses = 0, in the case of the slot machine). Binary inputs are represented on the first level, \({x}_{1}^{(k)}\), via a Bernoulli distribution, around the probability of win or loss, \({x}_{2}^{(k)}\). In turn, \({x}_{2}^{(k)}\) is modelled as a Gaussian random walk, whose step-size is governed by a combination of \({x}_{3}^{(k)}\), via coupling parameter κ, and a tonic volatility parameter ω. \({x}_{3}^{(k)}\) also evolves as a Gaussian random walk over trials, with step size ϑ (meta-volatility). In this investigation, after observing trial-wise outcomes (win or lose), the gambler updates her belief about the probability of win on a given trial k \(({x}_{2}^{(k)})\), as well as how swiftly that slot machine is moving between being ‘hot’ (high probability of win) or ‘cold’ (low probability of win) \(({x}_{3}^{(k)})\). On any trial, the ensuing beliefs then provide a basis for the gambler’s response, which may be to increase the bet size, ‘double up’ after a win, switch to a new slot machine or leave the casino.

Inversion of this ‘perceptual model’ produces subject-specific parameter estimates that determine the nature of the coupling between levels of the HGF. Inverting this model under generic (mean-field) approximations results in analytical belief-update equations, in which trial-wise belief updates are proportional to prediction errors (PEs) weighted by uncertainty (or its inverse, precision). The subject-specific parameters shape an individual’s approximation to ideal Bayesian inference, specifically how phasic and tonic volatility impact trial-wise estimates of uncertainty at all levels of the hierarchy. Posterior estimates of HGF parameters can thus be regarded as a compact summary of an individual’s uncertainty processing during an experiment.
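As a rough illustration of such a precision-weighted update, the sketch below performs a single-trial update of the second-level belief (the win tendency in logit space) after a binary outcome. It follows the general form of the binary HGF update equations published by Mathys et al.34; the function name, argument names and any numerical values are illustrative assumptions and are not taken from the study's code. The Supplementary Material remains the reference for the formal equations.

```python
import math


def sigmoid(x):
    """Logistic function mapping logit space to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def update_second_level(mu2, sigma2, mu3, u, kappa, omega):
    """One-trial update of the belief about the win tendency (level 2).

    mu2, sigma2: previous posterior mean and variance at level 2 (logit space)
    mu3: previous posterior mean of the log-volatility (level 3)
    u: observed outcome (1 = win, 0 = loss)
    kappa, omega: coupling and tonic volatility parameters
    """
    # Predicted win probability before the outcome is seen
    mu1_hat = sigmoid(mu2)
    # Predicted variance: previous variance plus the random-walk step variance
    sigma2_hat = sigma2 + math.exp(kappa * mu3 + omega)
    # Posterior precision: predicted precision plus information from the outcome
    pi2 = 1.0 / sigma2_hat + mu1_hat * (1.0 - mu1_hat)
    sigma2_new = 1.0 / pi2
    # Prediction error, weighted by the posterior variance (a dynamic learning rate)
    delta1 = u - mu1_hat
    mu2_new = mu2 + sigma2_new * delta1
    return mu2_new, sigma2_new, delta1
```

In this sketch, a larger ω inflates the predicted variance and therefore the weight given to the prediction error, so the same outcome moves the belief further.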

Furthermore, in a ‘response model’, trial-wise beliefs are probabilistically linked to observed trial-wise decisions. Inverting both perceptual and response models allows for estimating the parameters; this corresponds to Bayesian inference (of an observer) on Bayesian inference (of an agent)24. An informal description is given below and a formal summary is provided in the Supplementary Material.

The perceptual model

The HGF is used to infer how an individual subject learns about hierarchically-coupled environmental quantities under different forms of uncertainty (including volatility). In our case, the lowest level of the HGF, \({x}_{1}\), represents the trial-wise binary outcome (win or loss) in the slot machine. This derives from a sigmoid transformation of \({x}_{2}\), which represents winning probability in logit space (i.e., whether the machine is currently ‘hot’ or ‘cold’ and likely (or not) to pay out). \({x}_{2}\) evolves as a Gaussian random walk whose step size is a function \({f}_{2}({x}_{3})\) of a third-level variable, \({x}_{3}\), which performs a Gaussian random walk of its own. \({x}_{3}\) represents the slot machine’s ‘volatility’, the speed at which it fluctuates between ‘hot’ and ‘cold’ states. The coupling function \({f}_{2}\) between levels contains subject-specific parameters κ and ω that determine an individual’s approximation to ideal Bayesian inference. Finally, the parameter ϑ at the highest level denotes how quickly volatility itself is changing (meta-volatility). A detailed derivation of the exact equations can be found in Mathys et al.34.
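The three-level generative structure described above can be sketched as a forward simulation. This is a minimal illustration only: it assumes the exponential coupling exp(κ·x3 + ω) for the step variance of x2, as in the formulation of Mathys et al.34, and the function name, initial values and parameter settings are hypothetical rather than those used in the study.

```python
import math
import random


def stable_sigmoid(x):
    """Logistic function written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_hgf(n_trials, kappa, omega, theta, x2_init=0.0, x3_init=0.0, seed=0):
    """Sample binary outcomes from a three-level HGF-style generative model."""
    rng = random.Random(seed)
    x2, x3 = x2_init, x3_init
    outcomes, win_probs = [], []
    for _ in range(n_trials):
        # Level 3: volatility performs a Gaussian random walk with variance theta
        x3 = rng.gauss(x3, math.sqrt(theta))
        # Level 2: win tendency (logit space); step variance set by x3, kappa, omega
        x2 = rng.gauss(x2, math.sqrt(math.exp(kappa * x3 + omega)))
        # Level 1: binary outcome drawn with probability sigmoid(x2)
        p_win = stable_sigmoid(x2)
        outcomes.append(1 if rng.random() < p_win else 0)
        win_probs.append(p_win)
    return outcomes, win_probs
```

Calling `simulate_hgf(100, kappa=1.0, omega=-2.0, theta=0.01)` yields one 100-trial sequence; raising ω or ϑ in this sketch makes the simulated machine drift between ‘hot’ and ‘cold’ more erratically. Note that in the study itself the outcome sequence was fixed, so the model is inverted rather than simulated forward.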

More concretely, in the context of our study, parameters ω and ϑ at the second and third level of the hierarchy, respectively, encode different aspects of subjective estimates of uncertainty. Specifically, these estimates concern environmental uncertainty, i.e., hidden fluctuations (volatility) of environmental states (for details, see Mathys et al.34). These volatility estimates are potentially important for explaining the observed behaviour because they shape participants’ belief updates about the slot machine and their ensuing choices about gambling. Parameter ω represents a subject’s estimate of tonic volatility, i.e., how quickly a slot machine could be moving from a state where it is likely to pay out (running ‘hot’) to a state where it is not (running ‘cold’) and vice versa. Parameter ϑ encodes a subject’s estimate of meta-volatility, i.e., the tendency of volatility itself to change over time. Larger values of each parameter correspond to greater uncertainty in the subject’s perceptual inference process.

The response model

The response model maps a subject’s beliefs (obtained by inverting the perceptual model under given parameter values) to observed gambling behaviour. Here, we use a sigmoidal response model34; if this function is steep, there is a close relationship between current perceptual beliefs and betting behaviour. Conversely, a gentler sigmoidal slope results in a more stochastic mapping of beliefs to behaviour. This response function has a parameter, β, the inverse decision ‘temperature’, that determines the steepness of the sigmoid and thus the degree of stochasticity in the belief-to-choice mapping. The larger the value of β, the steeper the function, and the more deterministic the relationship between a subject’s belief and their actions. In this paper, we test the following two variants of this response model:

(i) ‘Standard’ HGF: β = constant, i.e., the mapping from beliefs to behaviour is fixed across the experiment. This parameter is estimated for each subject.
(ii) ‘Uncertainty-driven’ HGF: \(\beta =1/{\sigma }_{2}^{(k)}\), where \({\sigma }_{2}^{(k)}\) is the variance of the inferred probability of win on trial k. That is, the response behaviour dynamically adapts to the precision of the subjects’ belief about the current probability of winning.
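The two variants can be illustrated with a short sketch. It assumes that the sigmoidal response model takes the ‘unit-square sigmoid’ form commonly paired with the HGF, and that the belief m passed to it is a probability between 0 and 1; both points, like the function names, are assumptions made for illustration and are not confirmed by the text above.

```python
def unit_square_sigmoid(m, beta):
    """Probability of the risk-taking response given belief m in (0, 1).

    beta = 1 reproduces m; larger beta gives a steeper, more
    deterministic mapping from belief to choice.
    """
    return m ** beta / (m ** beta + (1.0 - m) ** beta)


def standard_beta(beta_subject):
    """'Standard' variant: one fixed, subject-specific beta for all trials."""
    return beta_subject


def uncertainty_driven_beta(sigma2_k):
    """'Uncertainty-driven' variant: beta is the precision of the trial-k belief."""
    return 1.0 / sigma2_k
```

Under the uncertainty-driven variant, a trial on which the belief variance is small yields a large β and hence a near-deterministic choice, whereas an uncertain belief produces a flatter, more stochastic mapping.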

Perceptual variable

Based on previous work that examined different computational models of our slot machine paradigm19, the perceptual variable used here was simple: a binary variable in which wins were represented by 1 and losses by 0. This binary representation of win or loss in the task allows for increased interpretability of model parameters in measuring uncertainty-updating and impulsive-responding in reaction to a binary win/loss outcome.

Response variable

The response variable is a binary representation of actions associated with risk taking. It is constructed using a logical OR operator on four choices during the slot machine paradigm: bet increases, machine switches, double-ups and casino switches. For each trial, the response variable takes the value 1 when any of these four events occurs, and 0 otherwise. For details, please see Supplementary Table 1.

While these actions might at first glance appear to relate to different behaviours, they all share a common theme in that they enhance outcome variance and thus the amount of risk the player takes in the game. For example, increasing bet size from one trial to the next results in higher reward variability in the trial outcome, thereby making the player more susceptible to larger wins and losses. In aggregate, these four actions relate to a player’s risk-taking tendencies (described further in Supplementary Section 1.4).
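The construction of the response variable can be sketched as follows. The function name and the five-trial example values are hypothetical and serve only to show the logical OR across the four per-trial indicators.

```python
def response_variable(bet_increase, machine_switch, double_up, casino_switch):
    """Combine four per-trial binary indicators with a logical OR."""
    return [
        int(any(flags))
        for flags in zip(bet_increase, machine_switch, double_up, casino_switch)
    ]


# Hypothetical indicators for five trials (1 = action taken on that trial)
example = response_variable(
    bet_increase=[1, 0, 0, 0, 1],
    machine_switch=[0, 0, 1, 0, 0],
    double_up=[0, 0, 0, 0, 1],
    casino_switch=[0, 0, 0, 0, 0],
)
print(example)  # [1, 0, 1, 0, 1]
```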

Reinforcement learning

As an alternative model, we used a classical associative learning model, Rescorla-Wagner (RW), often used in reinforcement learning (RL)60. The RW model updates the probability of a win on trial k by combining the probability on trial k−1 with a PE weighted by a constant learning rate. Hence, in contrast to the HGF, the RW model does not have a dynamic learning rate over trials, nor can it account for different forms of perceptual uncertainty; essentially, the RW model corresponds to an HGF with a fixed learning rate. Here, we combine the RW learning rule with the same sigmoidal response model described above, with free parameter β, which we estimate on a subject-specific basis. This results in a model that is (i) structurally similar to, but less complex than, the HGF and (ii) almost identical to the RL model used in a prior investigation of learning after STN-DBS17.
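The RW learning rule can be sketched in a few lines. The function name, the symbol `alpha` for the constant learning rate and the initial value of 0.5 are illustrative assumptions rather than details taken from the study.

```python
def rescorla_wagner(outcomes, alpha, v_init=0.5):
    """Track the estimated win probability with a constant learning rate.

    Returns the prediction held before each trial and the final estimate.
    """
    v = v_init
    predictions = []
    for u in outcomes:
        predictions.append(v)
        # Prediction error weighted by the constant learning rate
        v = v + alpha * (u - v)
    return predictions, v


# Two wins followed by a loss, with alpha = 0.5
predictions, final_value = rescorla_wagner([1, 1, 0], alpha=0.5)
print(predictions, final_value)  # [0.5, 0.75, 0.875] 0.4375
```

Unlike the HGF update, the weight on the prediction error here does not change with the estimated uncertainty.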

Model inversion

The HGF and RW models were inverted using population Markov-Chain Monte Carlo (MCMC) sampling61. Parameter estimation in the HGF is classically ‘fully Bayesian’ and requires a selection of priors, which influence parameter estimation to a lesser or greater degree. In order to minimise this influence, we used a novel empirical Bayesian inference scheme for the HGF where a Gaussian group-level distribution of parameters is constructed from samples across the group. This group-level empirical prior is then used to obtain posterior parameter estimates in each subject (Supplementary Fig. 2). Subject-specific point estimates for model parameters are calculated as the median value of the subject’s posterior distribution.

Given the clinical constraints of our investigation (to reduce any burden on the participants, we used only 100 trials per episode of gambling, i.e., only half as many as in our previous work19), it was important to ensure that our parameter estimates were robust. Therefore, in order to verify that HGF parameter estimates reliably reflected subject-specific characteristics of uncertainty encoding and decision noise, we tested our ability to recover ground-truth parameter values from simulated response data. In order to assess parameter recoverability, we used three parameter values per parameter (shown in Supplementary Fig. 4) and generated a batch of 38 synthetic response variables based on these assigned values, using the underlying trace of the slot machine as the perceptual variable. We then inverted the HGF and explored the relationship of the recovered parameter estimates, using the median of the posterior, with the ground truth values. When estimating ω and ϑ, β was held fixed; conversely, when estimating β, ω and ϑ were fixed. This process was repeated for 10 batches across each parameter.

Model comparison

As described above, we considered two competing hypotheses of how subjects might incorporate uncertainty into their choice of actions, i.e., two different belief-to-choice mappings in the response model for the HGF (the ‘Standard’ and ‘Uncertainty-driven’ models). These two versions of the HGF were compared with the RW model. As we were primarily interested in the pre-DBS to post-DBS change, we selected the winning model for the pre-DBS measurements. We then evaluated if the parameter estimates of that winning model changed postoperatively. Estimates of the negative free energy (log model evidence) were computed using thermodynamic integration61. The negative free energy balances goodness of fit with a complexity penalty. Group-level free energy estimates were compared to select a winning model.
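Thermodynamic integration estimates the log model evidence as the integral, over an inverse temperature t running from 0 (prior) to 1 (posterior), of the expected log-likelihood under the correspondingly tempered posterior. The sketch below shows only the final numerical integration step, using a trapezoidal rule, and then picks the model with the largest estimate. It assumes that the expected log-likelihood at each temperature has already been estimated from the MCMC chains; the function names and the temperature schedule are hypothetical and do not describe the study's implementation.

```python
def log_evidence_ti(temperatures, expected_loglik):
    """Trapezoidal estimate of the integral of E_t[log p(y | theta)] over t.

    temperatures: increasing values from 0.0 to 1.0
    expected_loglik: expected log-likelihood at each temperature
    """
    if len(temperatures) != len(expected_loglik) or len(temperatures) < 2:
        raise ValueError("need matching sequences with at least two temperatures")
    total = 0.0
    for i in range(1, len(temperatures)):
        width = temperatures[i] - temperatures[i - 1]
        total += 0.5 * width * (expected_loglik[i] + expected_loglik[i - 1])
    return total


def winning_model(log_evidences):
    """Name of the model with the largest (least negative) log evidence."""
    return max(log_evidences, key=log_evidences.get)
```

In practice, temperature schedules are often spaced more densely near t = 0, where the expected log-likelihood tends to change most rapidly.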

Data analysis

General considerations

All computational modelling and model inversion were performed using MATLAB (MathWorks), employing custom scripts developed from the HGF toolbox version 3 in the open-source software TAPAS (http://www.translationalneuromodeling.org/tapas/). Multiple regression analyses were performed using the regstats function in the MATLAB Statistics Toolbox. For all analyses involving multiple comparisons, uncorrected p-values are presented, accompanied by Holm-Bonferroni correction at α = 0.05. To test the significance of individual regressors in multiple regression models, post hoc t-tests were performed.
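For reference, the Holm-Bonferroni step-down procedure can be sketched as below. This is a generic Python illustration of the procedure, not the MATLAB code used in the study, and the example p-values are invented for demonstration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The rank-th smallest p-value is compared with alpha / (m - rank)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


# Invented p-values for four comparisons
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(decisions)  # [True, False, False, True]
```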

Neuropsychiatric assessment data from baseline, prior to DBS, were compared with data gathered at 13 weeks post-DBS, when the gambling task was repeated. To test for differences in pre-DBS and post-DBS questionnaire scores and model parameter estimates, a paired t-test was employed when the data were normally distributed and the Wilcoxon signed-rank test otherwise, where distribution was assessed using the Lilliefors test. Gambling behaviours (such as bet increases and machine switches) were also compared at both time points. Gambling behaviours were regressed against clinical measures of impulsivity to determine significant relations. After determining the winning computational model, model parameter estimates were extracted for each participant and regressed against clinical measures of impulsivity to determine significant associations and predictors of postoperative impulsivity. Based on previous work showing a significant association between BIS scores and both slot machine behaviour and HGF-based estimates of uncertainty encoding19, we focused our analyses on the BIS and its subscales. Perceptual model parameters were extracted in log space: ω and β are naturally estimated in log space, since they are part of exponential terms in their respective equations (see equation 5 in the Supplementary Material).

From a clinical perspective, we were interested in examining whether changes in the computational characterisation of individual uncertainty estimates pre- to post-DBS were associated with clinically-relevant changes in impulsivity at any time point after DBS. Our strategy to attempt prediction of clinical outcomes follows the ‘generative embedding’ approach, in which individual predictions are not derived from measured data but from parameter estimates obtained by a generative model62,63. Importantly, stimulation-dependent changes in impulsivity may evolve in an unpredictable manner subsequent to DBS, related to variations in DBS programming over time (with considerable adjustments to stimulation in the first six postoperative months). Furthermore, the optimal BIS cut-off score for clinically-significant impulsivity varies by age and disease64, with only one existing investigation specific to a PD cohort65. Therefore, we examined whether individual changes in parameter estimates were associated with the maximum postoperative increase in impulsivity, as measured by the BIS relative to baseline, across six months of longitudinal follow-up.
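The outcome measure described above can be made concrete with a short sketch. The function name and the example scores are hypothetical; the example assumes one BIS total score at baseline and one at each of the four postoperative visits (2, 6, 13 and 26 weeks).

```python
def max_postoperative_increase(baseline, follow_up_scores):
    """Largest follow-up score minus baseline (negative if all scores fell)."""
    return max(score - baseline for score in follow_up_scores)


# Hypothetical BIS totals: baseline 60, then 58, 65, 63, 61 at the four visits
increase = max_postoperative_increase(60, [58, 65, 63, 61])
print(increase)  # 5
```

Taking the maximum across visits, rather than the change at a single fixed visit, captures an increase in impulsivity whenever it emerges during follow-up.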