Abstract There is broad consensus that the prefrontal cortex supports goal-directed, model-based decision-making. Consistent with this, we have recently shown that model-based control can be impaired through transcranial magnetic stimulation of right dorsolateral prefrontal cortex in humans. We hypothesized that model-based control might be enhanced by anodal transcranial direct current stimulation of the same region. We tested 22 healthy adult human participants in a within-subject, double-blind design in which participants received Active or Sham stimulation in two separate sessions. We show that Active stimulation had no effect on model-based control or on model-free (‘habitual’) control compared to Sham stimulation. These null effects are substantiated by a power analysis, which suggests that our study had at least 60% power to detect a true effect, and by a Bayesian model comparison, which favors a model of the data that assumes stimulation had no effect over models that assume stimulation had an effect on behavioral control. Although we cannot entirely exclude more trivial explanations for our null effect, such as faults in our experimental setup, these data suggest that anodal transcranial direct current stimulation over right dorsolateral prefrontal cortex does not improve model-based control, despite existing evidence that transcranial magnetic stimulation can disrupt such control in the same brain region.

Citation: Smittenaar P, Prichard G, FitzGerald THB, Diedrichsen J, Dolan RJ (2014) Transcranial Direct Current Stimulation of Right Dorsolateral Prefrontal Cortex Does Not Affect Model-Based or Model-Free Reinforcement Learning in Humans. PLoS ONE 9(1): e86850. https://doi.org/10.1371/journal.pone.0086850 Editor: Floris P. de Lange, Radboud University Nijmegen, Netherlands Received: September 11, 2013; Accepted: December 18, 2013; Published: January 24, 2014 Copyright: © 2014 Smittenaar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: R.J. Dolan is supported by a Wellcome Trust Senior Investigator Award 098362/Z/12/Z. P. Smittenaar is supported by a 4-year Wellcome Trust PhD studentship 092859/Z/10/Z. The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust 091593/Z/10/Z. More information about the Wellcome Trust at http://www.wellcome.ac.uk/. J. Diedrichsen is supported by a Scholar Award from the James S. McDonnell Foundation. More information about the James S. McDonnell Foundation at http://www.jsmf.org/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction Electrical stimulation of the human brain has received widespread attention in recent years. It has been used to study the function of healthy cortex [1] and the connectivity between regions [2], to explore treatments for disorders such as depression, Parkinson’s disease and stroke [3]–[6], and to improve normal function, for example in skill learning [7], [8]. Here we used transcranial direct current stimulation (tDCS), a technique in which two electrodes are placed on the scalp and a fixed current is applied between them [9]. This technique is reported to increase and decrease the excitability of the neural tissue underlying the anodal and cathodal electrodes, respectively [8], [9]. A number of studies have suggested that high-level cognition can be improved by anodal stimulation of the prefrontal cortex. Specifically, stimulation of the dorsolateral prefrontal cortex (dlPFC) has been shown to decrease risk-taking [10], improve working memory [11], [12] and improve classification learning [13]. We attempted to influence decision-making through anodal stimulation of the right dlPFC. Decision-making is often dissected into a slow, deliberative, goal-directed component and a fast, automatic, habitual component [14]–[16]. In value-based choice, this distinction is framed as model-based versus model-free control [15], [17]. A model-free system learns a cached value for each action from reward prediction errors and guides behavior by these values alone, trading minimal computational effort against a relative lack of flexibility in adjusting to current goals. Model-based control, by contrast, dynamically computes optimal actions by forward planning, a process that is computationally demanding but allows flexible, outcome-specific behavioral repertoires [15].
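The difference between the two controllers can be made concrete with a toy sketch of the two-step task used in this study (70% common transitions; see Methods). The learning rate `alpha = 0.5`, the starting values of 0.5 and all function names are illustrative assumptions, and the updates are deliberately simplified (a plain delta rule rather than, for example, the hybrid model of Daw et al. [24]); the paper itself quantifies the two forms of control with logistic regression rather than by fitting such a model.

```python
# Toy contrast of model-free vs model-based first-stage values (illustrative only).

# Transition matrix of the two-step task: rows = first-stage option,
# columns = second-stage pair (70% common / 30% uncommon).
TRANSITIONS = [[0.7, 0.3],
               [0.3, 0.7]]


def td_update(value, reward, alpha=0.5):
    """Move a cached value toward the obtained reward (learning rate alpha)."""
    return value + alpha * (reward - value)


def model_free_first_stage(q_first, choice, reward, alpha=0.5):
    """Model-free: credit the outcome directly to the first-stage option that
    was chosen, ignoring which transition occurred."""
    q = list(q_first)
    q[choice] = td_update(q[choice], reward, alpha)
    return q


def model_based_first_stage(q_second):
    """Model-based: plan forward, weighting the best option in each
    second-stage pair by the probability of reaching that pair."""
    best = [max(pair) for pair in q_second]
    return [sum(p * v for p, v in zip(row, best)) for row in TRANSITIONS]


if __name__ == "__main__":
    # Option 0 chosen, rare transition to pair 1, first stimulus there rewarded.
    q_first = model_free_first_stage([0.5, 0.5], choice=0, reward=1.0)
    q_second = [[0.5, 0.5], [td_update(0.5, 1.0), 0.5]]
    print("model-free :", q_first)                           # favours repeating option 0
    print("model-based:", model_based_first_stage(q_second))  # favours switching to option 1
```

Running it shows the signature the task exploits: after a rewarded rare transition, the cached model-free value favours repeating the previous first-stage choice, whereas the planned model-based values favour switching.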
We focused on the right dlPFC based on evidence for its role in model-based processes such as the construction and use of associative models [18]–[20] and the coding of hypothetical outcomes [21]. Work on non-human primates also implicates the dlPFC as a site of convergence for reward and contextual information [22]. Furthermore, we recently showed that right, but not left, dlPFC is necessary for model-based control: disruptive theta-burst transcranial magnetic stimulation of the right dlPFC reduced model-based control [23]. Here we sought to enhance, rather than disrupt, model-based control through anodal stimulation. We used a task that has been shown to quantify model-based and model-free control [24]–[26] and tested participants receiving Active (anodal) or Sham tDCS over the right dlPFC in a double-blind, counterbalanced design. We hypothesized that anodal stimulation would improve model-based control without affecting model-free control, reasoning that such an effect would be driven by enhancement of a component process of model-based control subserved by the right dlPFC.

Materials and Methods We recruited 23 healthy participants to take part in an experiment over 2 sessions. All participants had normal or corrected-to-normal vision and no history of psychiatric or neurological disorders. One participant was excluded from analysis because stimulation failed after resistance increased as the electrodes dried, leaving 22 participants (11 female; mean age ± SD, 22.5 ± 5.3 years; all at least 18 years of age at the time of consent) for analysis. Ethics Statement Written informed consent was obtained from all participants prior to the experiment, and the UCL Research Ethics Committee approved the study (project number 3450/003). Setup of Experiment and Double-blinding Procedure Participants were tested on 2 occasions 3 to 8 days apart and went through the same procedure on each day: after obtaining informed consent we determined the electrode locations, explained the task, guided participants through a short practice session, placed the electrodes on the scalp, turned on stimulation, and started the task. The experiment was double-blind, with both experimenter and participant unaware of the stimulation condition (Active or Sham). This was achieved through a system of blinding codes embedded in the stimulation machine (NeuroConn, Germany). First, co-author GP selected 24 pairs of 5-digit codes, each pair containing one code associated with Active and one code associated with Sham stimulation, as programmed into the stimulation machine. The pairs were then permuted such that half had Active stimulation in session 1 and Sham stimulation in session 2, whereas the other half had the reverse order. GP kept the unblinded version of the codes and handed the permuted set to PS, who acquired the data. Each participant was assigned a pair in order of testing date.
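The counterbalancing step can be sketched as follows. The code values are hypothetical placeholders (the real codes are fixed by the stimulator), and shuffling with Python's `random` module is an assumption; the source does not say how the permutation was generated.

```python
import random


def make_blinding_schedule(code_pairs, seed=None):
    """Counterbalance session order across (active_code, sham_code) pairs.

    Returns (key, blinded): `key` is the unblinded record kept by the
    co-author who prepared the codes; `blinded` lists only the
    (session_1_code, session_2_code) tuples handed to the experimenter.
    """
    n = len(code_pairs)
    if n % 2:
        raise ValueError("an even number of pairs is needed to balance order")
    orders = ["active_first"] * (n // 2) + ["sham_first"] * (n // 2)
    random.Random(seed).shuffle(orders)  # decide which pairs get Active first
    key, blinded = [], []
    for (active, sham), order in zip(code_pairs, orders):
        sessions = (active, sham) if order == "active_first" else (sham, active)
        key.append((sessions, order))
        blinded.append(sessions)
    return key, blinded


# Hypothetical placeholder codes; real codes are defined by the stimulator.
pairs = [(f"{10000 + 2 * i}", f"{10001 + 2 * i}") for i in range(24)]
key, blinded = make_blinding_schedule(pairs, seed=1)
# Participants would then be assigned blinded[0], blinded[1], ... in testing order.
```

Keeping `key` and `blinded` as separate outputs mirrors the division of roles described above: only the person holding `key` can tell which code in a session pair is Active.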
When the participant was prepared for stimulation, their session-specific code was entered into the stimulation machine, which then administered the corresponding Active or Sham protocol without any indication of the stimulation condition. We tested each participant’s awareness of the stimulation condition at the end of the experiment (see below). PS was unblinded after acquisition of all 23 datasets. Task The task design was based on Daw et al. [24] and was identical to that of Wunderlich et al. [25] except for faster trial timings and a larger number of trials. The task was programmed in Cogent 2000 & Graphics (John Romaya, Wellcome Trust Centre for Neuroimaging and Institute of Cognitive Neuroscience development team, UCL) in Matlab (The Mathworks Inc). Each trial consisted of two choice stages. Each choice stage contained a 2-alternative forced choice, with choice options represented by a fractal in a colored box on a black background (Figure 1A). On each choice, participants had to respond within 2 seconds using the left/right cursor keys; otherwise the trial was aborted and not rewarded. Missed trials were omitted from analysis.
Figure 1. Two-step task design. (A) On each trial a choice between two stimuli led probabilistically to one of two further pairs of stimuli, which then demanded another choice followed probabilistically by reward or no-reward. Participants could learn that each first-stage stimulus led more often to one of the pairs; this task structure could be exploited by a model-based, but not by a model-free controller. (B) Model-based and model-free strategies for reinforcement learning predict differences in feedback processing after uncommon transitions. If choices were exclusively model-free, then a reward would increase the likelihood of staying with the same stimulus on the next trial, regardless of the type of transition (left). Alternatively, if choices were driven by a model-based system, the impact of reward would interact with the transition type (middle). https://doi.org/10.1371/journal.pone.0086850.g001
Choice at the first stage always involved the same two stimuli. After participants made their response the rejected stimulus disappeared from the screen and the chosen stimulus moved to the top of the screen. After 0.5 s one of two second stage stimulus pairs appeared, with the transition from first to second stage following fixed transition probabilities. Each first stage option was more strongly (with a 70% transition probability) associated with one of the two second stage pairs, a crucial factor in allowing us to distinguish model-free from model-based behavior (see below). In both stages the two choice options were randomly assigned to the left and right side of the screen, forcing the participants to use a stimulus- rather than action-based learning strategy. After the second choice the chosen option remained on the screen together with a reward symbol (a pound coin) or a ‘no reward’ symbol (a red cross). Each of the four stimuli in stage two had a reward probability between 0.2 and 0.8.
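A minimal generative sketch of the trial structure: a 70% common transition, random left/right placement of the two options, and second-stage reward probabilities that drift with Gaussian steps (SD 0.025) while staying between 0.2 and 0.8, as stated in the Methods. Reflecting at the bounds, uniform starting values and the mapping of choice 0 to pair 0 are assumptions made for this sketch, not details taken from the study.

```python
import random

COMMON_PROB = 0.7        # P(common transition), as stated in the Methods
P_MIN, P_MAX = 0.2, 0.8  # range of second-stage reward probabilities
DRIFT_SD = 0.025         # SD of the Gaussian drift step per trial


def sample_transition(choice, rng):
    """Return (pair, is_common) for first-stage choice 0 or 1.

    Assumed labelling: choice 0 commonly leads to pair 0, choice 1 to pair 1."""
    is_common = rng.random() < COMMON_PROB
    return (choice if is_common else 1 - choice), is_common


def assign_sides(stim_a, stim_b, rng):
    """Randomly place two stimuli on the left/right of the screen."""
    return (stim_a, stim_b) if rng.random() < 0.5 else (stim_b, stim_a)


def drift(probs, rng):
    """One diffusion step for every second-stage reward probability.

    Reflecting at P_MIN/P_MAX is an assumption; the source only states the range."""
    out = []
    for p in probs:
        p += rng.gauss(0.0, DRIFT_SD)
        if p > P_MAX:
            p = 2 * P_MAX - p
        elif p < P_MIN:
            p = 2 * P_MIN - p
        out.append(min(max(p, P_MIN), P_MAX))  # guard against extreme draws
    return out


def reward_walks(n_trials, seed=None):
    """Pre-generate reward probabilities for the four second-stage stimuli."""
    rng = random.Random(seed)
    probs = [rng.uniform(P_MIN, P_MAX) for _ in range(4)]  # assumed start values
    walks = []
    for _ in range(n_trials):
        walks.append(probs)
        probs = drift(probs, rng)
    return walks
```

A trial's reward could then be drawn as `rng.random() < walks[t][2 * pair + option]`, indexing the four second-stage stimuli by pair and option (again an assumed layout).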
The second-stage reward probabilities drifted slowly and independently for each of the four second-stage options through a diffusion process with Gaussian noise (mean 0, SD 0.025) on each trial. Three random walks were generated beforehand and randomly assigned to sessions. We chose to pre-select random walks because otherwise they might, by chance, turn out to have relatively static optimal strategies (e.g. when a single second-stage stimulus remains at or close to p(reward) = 0.8). Prior to the experiment, participants were explicitly instructed that for each first-stage stimulus one of the two transition probabilities was higher than the other, and that these transition probabilities remained constant throughout the experiment. Participants were also told that reward probabilities at the second stage would change slowly, randomly and independently over time. On both days, participants practiced 50 trials with different stimuli before starting the task. The main task consisted of 350 trials, with 20 s breaks every 70 trials. The participant’s bonus in pounds sterling was the total number of rewarded trials minus 170, divided by 5; this was paid in addition to a flat rate of £7/hour. Analysis We analyzed stay-switch behavior on the first-stage choice of each trial to dissociate the relative influence of model-based and model-free control. A model-free reinforcement learning strategy predicts that a rewarded choice will be repeated, irrespective of whether the reward followed a common or an uncommon transition (Figure 1B, left), because model-free choice does not consider the structure of the environment. A reward after an uncommon transition therefore increases the value of the chosen first-stage cue, even though the rewarded second-stage stimulus is more reliably reached via the unchosen cue, whose value is not updated.
In contrast, under a model-based strategy we expect an interaction between transition and reward, because a rare transition inverts the effect of a subsequent outcome (Figure 1B, middle). Under model-based control, receiving a reward after an uncommon transition increases the propensity to choose the previously unchosen first-stage stimulus, because the rewarded second-stage stimulus can be reached more reliably by choosing the rejected first-stage cue than by choosing the same cue again. To summarize, this analysis quantifies model-free behavior as the strength of the main effect of reward, and model-based behavior as the strength of the reward-by-transition interaction, even when actual behavior is a hybrid of model-free and model-based control (Figure 1B, right). Whereas most studies using this task have only examined the preceding trial to explain choices on the current trial [24]–[26], here we extended this approach to model-based and model-free influences up to 3 trials back, which provides a more fine-grained dissection of the influence of each system on behavior. We used hierarchical logistic regression implemented in the lme4 package [27] in R. The dependent variable for trial t was 1 when stimulus A was chosen and 0 when stimulus B was chosen in the first stage. Each regressor then described whether events on trial t-1, t-2 and t-3 would increase (coded as +1) or decrease (coded as −1) the likelihood of choosing A according to a model-based or model-free system. If a trial contained a common transition, the model-based and model-free systems would make identical predictions, whereas on trials with uncommon transitions these predictions would be inverted. We additionally modeled the main effect of transition type (common as +1, uncommon as −1) on trials t-1, t-2 and t-3, which we predicted would have no effect on the propensity to choose stimulus A.
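The ±1 coding described above can be sketched for a single session as follows. The inputs are assumed to contain completed trials only (how lags spanning missed trials were handled is not stated), and the transition regressor is coded as choice × transition, which is our reading of a "main effect of transition" on choosing A; Table 1 of the paper is the authoritative definition. Condition-specific regressors for the full model could be obtained by multiplying each column by an Active or Sham indicator before fitting the mixed-effects logistic regression (lme4 in the paper).

```python
def lagged_regressors(choices, rewards, commons, max_lag=3):
    """Build +1/-1 regressors for lags 1..max_lag for each trial t >= max_lag.

    choices[t]: 1 if stimulus A was chosen at the first stage, else 0.
    rewards[t]: True if trial t was rewarded.
    commons[t]: True if trial t had a common transition.
    """
    rows = []
    for t in range(max_lag, len(choices)):
        row = {"chose_A": choices[t]}
        for k in range(1, max_lag + 1):
            c = 1 if choices[t - k] == 1 else -1  # +1: A chosen on trial t-k
            r = 1 if rewards[t - k] else -1       # +1: trial t-k rewarded
            tr = 1 if commons[t - k] else -1      # +1: common transition
            row[f"mf_{k}"] = c * r       # model-free: repeat rewarded choice
            row[f"mb_{k}"] = c * r * tr  # model-based: inverted after uncommon
            row[f"tr_{k}"] = c * tr      # transition regressor (assumed coding)
        rows.append(row)
    return rows


# Hypothetical 4-trial sequence: A, B, A, A with a rare transition on trial 3.
rows = lagged_regressors([1, 0, 1, 1], [True, False, True, True],
                         [True, True, False, True])
```

In the single resulting row, the lag-1 model-free and model-based regressors have opposite signs (+1 and −1): a rewarded choice of A after a rare transition pushes a model-free learner toward A and a model-based learner toward B, exactly the dissociation the regression exploits.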
We also tested 3 alternative models that used 1) one set of model-based regressors for both conditions, 2) one set of model-free regressors for both conditions, and 3) one set of model-based and one set of model-free regressors for both conditions (‘null model’). These models allowed us to test whether the additional complexity of separate regressors for the two stimulation conditions was warranted. Models were compared using the BIC and AIC values provided by the lme4 package. We estimated coefficients for the regressors shown in Table 1, treating all coefficients as random effects over participants. That is, the regression model is fit to each participant’s data while simultaneously maximizing the likelihood of the parameters across the population. This method accounts for both within- and between-subject variance, providing unbiased estimates of the population coefficient for each regressor. This hierarchical approach differs from the more common approach in which a full model is fit to each participant separately and statistics are performed on the resulting parameter estimates; the latter ignores within-subject variance and considers only variance between subjects (i.e. random effects).
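For reference, the two criteria used for model comparison are AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where L is the maximized likelihood, k the number of estimated parameters and n the number of observations; lower values are preferred. The helper below is a sketch with hypothetical function names, and the log-likelihoods in the usage lines are invented for illustration. For a mixed model, k should count the random-effect variance and covariance parameters as well as the fixed effects.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def rank_models(models, n_obs):
    """Rank models given as {name: (max_log_likelihood, n_params)} by BIC.

    Returns [(name, (bic, aic)), ...] from best (lowest BIC) to worst.
    """
    scores = {name: (bic(ll, k, n_obs), aic(ll, k))
              for name, (ll, k) in models.items()}
    return sorted(scores.items(), key=lambda item: item[1][0])


# Invented numbers, for illustration only.
ranking = rank_models({"null": (-100.0, 5), "full": (-98.0, 10)}, n_obs=1000)
for name, (b, a) in ranking:
    print(f"{name}: BIC={b:.1f}, AIC={a:.1f}")
```

In this made-up case the simpler model wins on both criteria, because a 2-point gain in log-likelihood does not offset the penalty for 5 extra parameters.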
Table 1. Regressors in the full model for first-stage choices. https://doi.org/10.1371/journal.pone.0086850.t001
We then performed contrasts over the population coefficients to test for differences between conditions in model-free and model-based control. All p-values reported in the manuscript that pertain to the logistic regression were estimated using the “esticon” procedure in the “doBy” package, which relies on the chi-square distribution [28]. Power analyses were performed using the Matlab 7.12.0 ‘sampsizepwr’ function and G*Power 3.1.7 [29], [30]. Other tests were performed in SPSS 17.0. Stimulation In both sessions the anodal electrode was placed over the right dlPFC and the cathodal electrode over the inion. The inion was chosen for cathodal electrode placement in order to maximize current flow through the dlPFC. The right dlPFC was located using the 10/20 system, which is appropriate given the limited spatial resolution of tDCS [31]. In brief, we first located Fpz, Fz and Oz at 10%, 30% and 90% of the nasion-inion distance, measured from the nasion. We then located F8 at 30% of the distance between Fpz and Oz, measured from Fpz along the arc passing over the ear. Electrode position F4, commonly used for the right dlPFC [31], was then determined as the midpoint between F8 and Fz. We used conductive rubber electrodes inserted in sponge covers measuring 7.5 by 6 cm, secured to the head with a bandage. We oriented the electrode over the right dlPFC along the gyrus, i.e. in a superior-medial to inferior-lateral direction. We used a DC-stimulator system (NeuroConn, Germany). In the Active condition a 2 mA current was delivered for 25 minutes with a 15 s ramp-up and ramp-down. In the Sham condition the current was ramped up and then down over 15 s, after which the stimulator performed continuous impedance testing. This procedure made it very hard for participants to tell which type of stimulation they received in a given session.
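The 10/20 localization described above reduces to simple proportions of tape-measured distances. A sketch with hypothetical function names and made-up measurements; each function takes a distance measured on the participant's head, and the F8-to-Fz distance can only be measured once F8 and Fz have been marked.

```python
def midline_marks(nasion_inion_cm):
    """Fpz, Fz and Oz as distances (cm) from the nasion along the midline."""
    return {"Fpz": 0.10 * nasion_inion_cm,
            "Fz": 0.30 * nasion_inion_cm,
            "Oz": 0.90 * nasion_inion_cm}


def f8_from_fpz(fpz_oz_arc_cm):
    """F8 as a distance (cm) from Fpz along the right-hand Fpz-Oz arc over the ear."""
    return 0.30 * fpz_oz_arc_cm


def f4_from_f8(f8_fz_cm):
    """F4 (the right dlPFC target) as a distance (cm) from F8: midway to Fz."""
    return 0.50 * f8_fz_cm


# Hypothetical tape measurements for one participant (cm).
marks = midline_marks(36.0)
f8 = f8_from_fpz(27.0)
f4 = f4_from_f8(12.0)  # measured only after F8 and Fz have been marked
print(marks, f8, f4)
```

For the hypothetical 36 cm nasion-inion distance, Fpz, Fz and Oz fall at about 3.6, 10.8 and 32.4 cm from the nasion.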
We confirmed that participants could not distinguish the conditions by asking a 2-alternative forced-choice question at the very end of the experiment about which session had contained Active stimulation. As a group, participants did not differ significantly from chance in identifying the Active session (10 of 22 participants guessed correctly; binomial test, p = .83). We employed a number of post-hoc checks to safeguard against experimental error. First, we monitored the resistance reported by the DC-stimulator throughout the experiment, rejecting one participant for whom stimulation was stopped after a strong increase in resistance (>55 kΩ). Second, after the experiment we confirmed for a random set of 4 Sham and 4 Active codes that they were correctly linked to the Sham or Active stimulation procedure by measuring the current with an ammeter; this was the case for all 8 codes. Third, of the 100,000 possible codes that can be entered into the DC-stimulator only 200 are valid, so a mistyped code is very unlikely to be accepted. After stimulation was turned on, the participant waited 10 minutes before starting the task to ensure the effects of stimulation were fully established [9]. Altogether, participants received 25 minutes of stimulation at 2 mA. Changes in cortical excitability have been reported to outlast stimulation of this duration by over an hour ([9], though see [32]). The window of stimulation therefore need not fully overlap with the task, and in our design stimulation ended approximately halfway through the task. Note that these stimulation parameters are based on studies of motor cortex stimulation, and it is possible that they have different effects when applied to frontal areas. To our knowledge there are no published data on this, though our protocol is similar to that of other studies using tDCS on the dlPFC [10], [13].
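The blinding check is an exact binomial test against chance (p = 0.5). A self-contained sketch of the two-sided version, which sums the probabilities of all outcomes no more likely than the observed one (the paper ran its tests in SPSS; the function name here is hypothetical). Other two-sided conventions, such as doubling the smaller tail, give the same result when p = 0.5 because the distribution is then symmetric.

```python
from math import comb


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes that
    are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Relative tolerance so that equally likely outcomes (e.g. the mirror-image
    # count when p = 0.5) are not excluded by floating-point rounding.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# 10 of 22 participants correctly identified the Active session.
print(round(binom_test_two_sided(10, 22), 2))  # prints 0.83
```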

Discussion Here we provide evidence that tDCS to right dlPFC does not affect model-based or model-free control in an established behavioral paradigm. In a double-blind design we confirmed that participants used both model-free and model-based strategies to solve the task, and we could quantify the extent to which either strategy was used. Putatively enhancing right dlPFC activity with Active anodal tDCS, compared to Sham, did not significantly change the level of model-based or model-free control. Formally testing this null effect, we provide evidence that a null model predicting no effect of stimulation was favored over more complex models predicting an effect of stimulation on model-based control, model-free control, or both. We hypothesized that enhancing right dlPFC would improve model-based control, similar to the beneficial tDCS effects observed on risk taking [10], probabilistic learning [13] and working memory [11]. Based on published tDCS studies and studies of model-based control, we estimated that our study had more than 60% statistical power to detect such an effect were it to exist. Although this power was potentially lower than the often-cited 80% standard (e.g. [37]), it was considerably higher than that of more than 75% of neuroscience studies, as estimated in a recent meta-analysis [35]. Despite this, we observed a null effect of tDCS on model-based control. However, a non-significant frequentist test does not allow us to conclude that the null hypothesis is a better explanation of the data than alternatives in which stimulation does have an effect. We therefore performed a complementary model comparison using information-theoretic measures to test this formally [38]. Together, these analyses support our conclusion that tDCS to right dlPFC has no effect on model-based or model-free control.
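The power figure depends on an assumed effect size, which this section does not report. As a rough illustration only, a normal approximation for a two-sided within-subject comparison gives power ≈ Φ(d√n − z), where d is the standardized effect size, n the number of participants and z the two-sided critical value. The effect size d = 0.5 in the usage line is hypothetical, and the approximation is somewhat optimistic at small n compared with exact noncentral-t calculations such as those in G*Power.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def approx_power_paired(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided within-subject test.

    d: hypothetical standardized effect (mean difference / SD of differences).
    The negligible rejection probability in the opposite tail is ignored.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * sqrt(n) - z_crit)


def approx_n_for_power(d, power=0.8, alpha=0.05):
    """Participants needed for the target power under the same approximation."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_crit + z_power) / d) ** 2)


print(approx_power_paired(0.5, 22), approx_n_for_power(0.5))
```

Under these assumptions, 22 participants give roughly 65% power for d = 0.5, and about 32 participants would be needed for 80%.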
There is a modest literature on improvement in cognition through tDCS of the right dlPFC, which raises the question of why no effect was found in our experiment. The null result is all the more surprising because the dlPFC is implicated in model-based processes [18]–[22] and, when the region is transiently disrupted using transcranial magnetic stimulation, model-based control is selectively impaired [23]. We speculate that our null result is most likely due to an inability of tDCS to improve the specific component processes of model-based control subserved by the dlPFC. First, little is known about the physiological effects of tDCS in prefrontal cortex [39], though this is a rapidly developing field [32]. While there is evidence that anodal stimulation over M1 increases the size of motor evoked potentials (MEPs) elicited by TMS [40], it is not clear how the cellular physiology of the dlPFC changes following anodal stimulation, nor what the physiological underpinnings of model-based control in the dlPFC are. Despite these unknowns, we suggest that the neural mechanisms for model-based control in right dlPFC are not amenable to improvement through anodal tDCS. Second, we used a task to assess model-based control that has previously been shown to be susceptible to manipulation [23], [25], [26], we used a set of stimulation parameters that are widely used in the tDCS community [41], and we replicated previous observations of dual control by model-based and model-free systems. Together, these points suggest our null result is not due to the introduction of uncertain elements (e.g. a novel task or novel stimulation parameters) into the study design. Despite the use of established methods, we cannot altogether exclude methodological issues as the cause of the null effect.
Although we are confident the null effect is not due to faulty equipment or errors in the double-blinding procedure (see Methods), other potential issues include inaccurate electrode placement, a problem that can be alleviated by stereotactic navigation using anatomical scans, as commonly used in transcranial magnetic stimulation [42], and unpredictable current flow arising from electrode placement, which might be alleviated by computational models of current flow [43]. We were particularly careful to employ a double-blind design to eliminate any stimulation-dependent influence of the experimenter on task performance, because the task used here requires relatively extensive involvement of the experimenter in the task instructions. In a double-blind design, any observed effects can be most reliably attributed to the experimental manipulation of interest rather than to unintended information biases [44]. We note that no published work has manipulated the instructions of the 2-step task to examine their influence on model-based and model-free performance. In conclusion, we provide evidence that anodal stimulation of the right dlPFC by tDCS does not alter model-based or model-free control in our paradigm. This observation was made in the context of extensive and causal evidence for a role of right dlPFC in model-based control in humans. As such, our results should not be interpreted as evidence that the right dlPFC is not involved in model-based control; rather, our main finding is that anodal stimulation does not necessarily enhance this function. An open question is whether tDCS might improve performance on tasks that place greater demands on the model-based system (e.g. [45]).

Author Contributions Conceived and designed the experiments: PS GP JD RJD. Performed the experiments: PS GP. Analyzed the data: PS THBF. Contributed reagents/materials/analysis tools: GP JD. Wrote the paper: PS GP THBF JD RJD.