Experimental design

The overall objective of the study was to assess whether VCVS DBS modulated human cognitive control capabilities and cortical neural oscillations relevant to those capabilities. This was a within-subjects design, where individual subjects completed identical measurement protocols with stimulation on and off. This design increased statistical power (compared with alternate designs where DBS subjects would be compared against non-implanted controls) by admitting hierarchical/mixed modeling and controlling for the substantial heterogeneity of treatment-resistant psychiatric patients. Our pre-specified hypotheses were that:

1. DBS would improve human cognitive control, reflected in increased performance in the DBS ON condition. 2. DBS would augment the power of theta oscillations, primarily in lateral prefrontal cortex and dorsal anterior cingulate cortex, given the specific role of these oscillations in decision-making and response inhibition. 3. The degree of DBS-induced change in the above propositions would explain part of its mechanism of action, as determined by prediction of the clinical outcome.

These analyses were not pre-registered. At the time of data collection and study conception, pre-registration was not a widely available service.

Subjects

Fourteen subjects with VCVS DBS consented to participate in the experiments. All had received VCVS DBS implants for a prior clinical study (NCT00640133, NCT00837486, or NCT00555698), with entry criteria given in48,49. All were right handed. The sample included six males and eight females, aged in their 30s–70s at time of data collection, with a minimum of 6 months’ exposure to chronic stimulation and a maximum of 7 years. Subjects had predominantly been implanted for MDD, but 2/14 had a primary indication of OCD with comorbid MDD. Most had at least partial clinical response to DBS. Informed consent for study participation was obtained by a physician who was not the subject’s primary DBS clinician, after the full nature and possible consequences of the study were explained. All study procedures comport with applicable governmental and institutional ethical guidelines. Study procedures were reviewed and approved by the Massachusetts General Hospital Institutional Review Board.

Experimental protocol

To probe cognitive flexibility, we employed a modified version of the MSIT (main text Fig. 1a). The MSIT requires subjects to identify which of a set of three numbers is different than its neighbors. Subjects must keep three fingers of their right hand positioned over response keys corresponding to the digits 1–3. In control (non-interference) trials, the target is in the same spatial position as its corresponding response key, and the flanking digits are not valid responses (i.e., they are 0s). In interference trials, the target is out-of-position relative to its corresponding key-press and is flanked by other viable targets. MSIT has been shown to produce robust functional magnetic resonance imaging (fMRI)25 and electrophysiologic26 changes, with a significant (interference–control) difference often detectable at the individual subject level. We note that this specific operationalization of cognitive control, performance on a conflict task, is only one of many possible experimental approaches. Cognitive control is evoked in many situations, including approach-avoidance conflict50, switch-stay decisions16,51, and possibly also in emotionally valenced self-regulation52. The specific advantage of MSIT is that it is verified to induce statistically robust subject-level effects, at both the behavioral and neural level, amplifying our power to detect DBS-induced differences. We further added an emotional interference dimension, based on a hypothesis that subjects with severe treatment-resistant illness would be attentionally biased towards negative pictures. Before each MSIT trial, an image selected from the International Affective Picture System, or IAPS53, was presented. The image remained on-screen, partially obscured by the MSIT stimulus, for the trial duration. A fixed subset of 144 images were selected from the overall IAPS dataset to cover the range of available valence (positive, neutral, and negative) and emotional arousal ratings.

Each block of trials contained 72 control and 72 interference trials. We assigned positive, neutral, and negative IAPS images assigned to each trial type in a counterbalanced fashion, such that each image was presented once in a control and once in an interference context. The 144 images were split between these two 144-trial blocks in a manner that minimized the mean squared pairwise differences between image ratings when rank ordered by their valence. To prevent response sets or habituation, trial sequence in each block was pseudo-randomized so that subjects never had more than two trials in a row that shared the same valence, interference level, or desired response finger. This highly interleaved trial design was expected to place greater demands on cognitive control systems by reducing predictability of the stimuli. As shown in Fig. 1a, subjects viewed the IAPS picture alone for 400 ms, were presented with the MSIT stimulus and given up to 1500 ms to respond, and then viewed a fixation cross for 3–5 s (randomized with a uniform distribution). They were instructed to minimize eye blinking during the trial and to blink freely during the fixation period. Before data collection, subjects performed a block of 20 trials where they received correct/incorrect feedback, followed by another block of 40 trials without feedback. They repeated this practice, if necessary, until they achieved over 90% correct responses (counting missed trials as incorrect).

Many of our subjects had prior negative life experiences with specific associations to themes presented in IAPS. To control for these strong subjective/idiosyncratic interpretations in this small sample, we collected individual image ratings. After each block was complete, subjects were re-presented with each IAPS image and given 25 s to rate the image emotionally. We used the same self-assessment manikin system originally used to develop the IAPS54, which assigns each picture a valence rating from 1 to 9 (representing most negative to most positive) and an arousal rating from 1 to 9 (representing not-at-all arousing to highly arousing). Both the MSIT and the post-task IAPS rating images were presented using Psychophysics Toolbox (http://psychtoolbox.org) running under MATLAB 2013a.

Electroencephalographic data were acquired at 1450 Hz (Nexstim eXimia EEG) from 60 channels placed according to the international 10–20 system and the manufacturer’s standard cap. The ground electrode was placed on the bridge of the nose. One diagonal bipolar electro-oculogram channel was placed around the right eye. Channels were prepared to <5 kΩ impedance. The scalp location of each channel was digitized after cap preparation and prior to recordings. We also digitized the nasion and both pre-auricular points, plus 100 additional scalp points not corresponding to any EEG sensor, to improve the quality of MRI-to-digitization co-registration. In four subjects, in addition to the task data, we collected 1 min each of eyes-open and eyes-closed resting data just after each task block and before the IAPS self-assessment ratings.

All subjects first completed an MSIT block, resting-state collection, and image assessment with their DBS on at its usual clinical settings (DBS ON). Directly after MSIT, but before resting-state and image-rating blocks, subjects also completed 15 min of the Effort Expenditure for Rewards Task (EEfRT)27. A trained clinician then de-activated the bilateral implanted neurostimulators, and the subject rested for at least 1 h without removing the EEG cap. In animal studies, an hour’s withdrawal of chronic stimulation was sufficient to produce robust changes in neural activity that appeared to be a rebound/counter-regulatory response55. This rebound effect does not terminate within an hour, but persists for an extended period, as documented by clinical studies where patients slowly relapse over a week after DBS discontinuation56. The presence of this rebound effect should emphasize or amplify the neurological changes caused by chronic stimulation. After re-preparing any high-impedance channels, subjects again performed MSIT, EEfRT, resting-state, and image ratings (DBS OFF condition) before neurostimulator re-activation. Subjects were aware of their device status, as were the experimenters, although no subject experienced adverse psychological consequences from the study manipulation.

EEG preprocessing

EEG analyses used the minimum norm estimate (MNE)-Python suite57. Offline, EEG data were bandpass filtered between 0.5 and 50 Hz, then epoched. This effectively removes the DBS artifact as shown in our and others’ past work37,58, as all subjects’ stimulators were set above the cutoff frequency. Harmonic frequencies of DBS stimulation would similarly be entirely outside the passband of this filter and outside of all frequency bands analyzed in this work. See Table 1 for individual subjects’ stimulation frequencies. We removed eyeblinks and muscle artifacts with signal space projection59. We then cut trials/epochs from the continuous data. Stimulus-locked analyses used data from 1.5 s before the IAPS onset to 3.4 s after IAPS onset (1500 ms after end of trial). Response-locked analyses used −1.5 s before to 1.5 s after the response. Amplitude rejection (threshold = ± 150 μV) removed trials with residual artifacts. Finally, we converted all trials to change relative to baseline, defined as 0.5 s to 0.1 s before the IAPS onset. For time-domain analyses, we subtracted the mean of this window from all trials for that specific subject; for frequency-domain, we converted data to decibels (dB) relative to baseline.

Of the 14 subjects, six were excluded from further EEG analysis during preprocessing. Four subjects were excluded because their EEG data were recorded without the use of a digitization system. Their data could thus not be accurately source localized. Two more subjects were excluded from further EEG analysis due to substantial electromyographic artifact, which resulted in the rejection of the vast majority of trials following the quality assurance procedures described above. The EEG data of the remaining eight subjects was then subjected to source localization and all further analysis described below.

EEG source localization

We reconstructed subjects’ cortical surfaces from presurgical T1 MRI images using Freesurfer v5.360. The EEG cap digitization was manually co-registered to the Freesurfer anatomical reconstruction using the MNE command line tools package. Then, in MNE-Python, the cortical meshes were downsampled from ~160,000 vertices per hemisphere to 4098 dipole locations (vertices) per hemisphere. We calculated a forward solution using the three-compartment boundary-element model61 with the inner and outer skull surfaces reconstructed from Freesurfer’s watershed algorithm62. The dipole amplitude (current source density) at each cortical location was estimated using the anatomically constrained MNEs method63, using a pipeline similar to other reports of region of interest (ROI)-based oscillatory analyses64. Briefly, the MNE method finds the maximum a posteriori estimates of the latent cortical sources, given the observed sensor sources, assuming (1) the current source amplitudes are sparse and normally distributed with a known source covariance matrix and (2) the observed sensor data contain additive noise with a normal distribution and a known spatial covariance matrix. Importantly, as opposed to other beamforming methods, the MNE method preserves oscillations such that oscillatory power can be estimated following source localization. Each vertex’s current source estimate includes a dipole orientation, such that the source time course may be either positive or negative at any given time. Here, the orientations of the dipoles were constrained to the cortex using recommended default parameters (loose = 0.2, depth = 0.8). The noise covariance matrices necessary for source localization were estimated per subject from a baseline of period of 500 ms prior to the start of each trial. The empirical covariance estimates were regularized via the “shrunk” method, as recommended by Engemann and Gramfort65. Individual source estimate data were then mapped to Freesurfer’s “fsaverage” cortical surface. Finally, source estimate time courses for individual vertices were combined within a set of cortical labels corresponding to our ROIs: cingulate cortex (rACC, dACC, mCC), dorso-mPFC (dmPFC/superior frontal gyrus), dorso-lateral prefrontal cortex (DLPFC/middle frontal gyrus), and ventrolateral prefrontal cortex (VLPFC/inferior frontal gyrus). The average time course per ROI was computed using the “PCA flip” technique in MNE-Python. Briefly, singular value decomposition (SVD) is applied to the vertex-wise time courses per ROI and the first right singular vector is extracted. Each vertex’s time course is then scaled and sign flipped. The scaling is performed in order to match the average power of vertex-wise time courses. The sign of the time course is adjusted by multiplying it with the sign of the left singular vector from the SVD, which ensures that the phase does not change by 180 degrees from one source time course to the next. Supplementary Table 1 lists these labels and the anatomical shorthand used for each in the main text/figures. The anatomical labels/ROIs were manually assembled by merging of multiple smaller, contiguous labels from the Lausanne 243-region atlas66. The labels used here were designed to ensure that each cortical region corresponded to a nearly equal number of vertices in the standard template brain. We selected the label set to cover regions previously implicated in functional neuro-imaging of the MSIT13,25.

Statistical analysis—behavior

The primary behavioral outcome in MSIT is subjects’ RT, as they are pre-trained to very low error rates. Along with others, we have shown that RTs during conflict and decision-making tasks are better approximated by gamma than by Gaussian distributions13,67. We thus analyzed behavior in a mixed effects GLM with the gamma distribution and identity link function. That GLM was applied at the per-trial level, allowing us to model the effects of DBS and trial-specific effects such as emotion and cognitive interference. The mixed effects design, which includes a random intercept for the subject, specifically controls for intra-subject correlation (trials and sessions as repeated measures). We excluded trials with missing responses, error trials, and post-error trials. We further excluded trials with outlier RTs, which we defined by fitting a gamma distribution to each subject’s RT data, pooling the DBS ON and OFF runs for this preprocessing step. We then excluded trials with RT likelihood <0.005 based on the fitted distribution. These approaches excluded 247 trials (6.12% of total, n = 3785 trials retained in analysis).

To control for overall RT variability across subjects, we specified GLMs with a subject-specific random intercept plus fixed effects for experiment variables (mixed models). Similar to prior reports, e.g.28, we identified the appropriate model by minimizing Akaike’s information criterion (AIC) during stepwise addition of variables. Importantly, AIC minimization is mathematically equivalent to constructing the model by out-of-sample cross-validation36, an approach we have identified as essential in biomarker research38. We considered interference, DBS, valence, and arousal as possible RT predictors based on our pre-specified hypotheses and the task design. We also tested interaction terms between these main effects. We considered trial number within a run as a nuisance regressor, controlling for fatigue and/or learning effects. The data were best explained by a model with the aforegoing main effects, but no interaction terms (see main text and Supplementary Fig. 1). Models with other predictors, e.g., RT on the preceding trial (an autoregressive effect), were not identifiable. Conflict and DBS were dummy coded, whereas valence, arousal, and trial number were treated as continuous variables. All independent variables were standardized to the 0–1 interval for regression, but are reported in the article after conversion back to their natural units for ease of interpretation.

Statistical analysis—EEG modulation by task variables and DBS

For the time-domain (evoked potential) analysis, sensor and source space time courses were reduced to the (−0.5, 2.0) s time window for stimulus-locked epochs and (−1.0, 1.0) windows for response-locked epochs. Furthermore, all epochs were low-pass filtered to 15 Hz and downsampled by a factor of 3. Confidence intervals on plotted event-related potentials (ERPs) were calculated by 1000 bootstrap resamplings with replacement (preserving the number of trials within each subject). All ERPs shown are the grand mean across all subjects.

For the spectral-domain analysis, we calculated non-phase-locked power in three bands of interest: theta (4–8 Hz), alpha (8–15 Hz), and beta (15–30 Hz). We emphasized non-phase-locked, or induced, oscillations because they appear to be more directly related to proactive cognitive control17. In trial-based analyses of Simon-effect tasks, over 80% of the conflict/control-related theta power change was non-phase-locked23. The non-phase-locked theta power was correlated with trial-to-trial RTs, more so than the phase-locked theta reflected in the time-domain ERP. Further, in a non-trial-structured cognitive control task, theta oscillations appeared to be continuously present over mid-frontal cortex, increasing in power when more control was needed68. In contrast, phase-locked theta oscillations may be more related to error-related performance monitoring69, a phenomenon not studied here due to the very small number of error trials.

To calculate non-phase-locked power changes, we first subtracted the mean ERP from each trial23. The subtracted ERP (and the trials from which it was subtracted) were calculated for each combination of subject, condition (DBS ON/OFF × Interference/Conflict trials), and ROI/sensor. All plots of EEG power show data after this ERP removal.

Sensor and source-localized data were then decomposed into their time-frequency representation via Morlet wavelet convolution. Wavelets had base frequencies sampled from 2 to 50 Hz in 25 logarithmically spaced steps, where each wavelet was characterized by three cycles. Decomposition was performed on single-trial data, not on the average or ERP. All frequency power estimates were normalized to the average power of a pre-stimulus baseline (−0.5 s to −0.1 s) for each frequency band. We used a dB transform for normalization. The baseline power was computed separately for each subject and DBS condition (OFF, ON). The same pre-stimulus baseline period used for stimulus-locked analyses was also used for response-locked analyses. We then averaged the values within each pre-specified frequency band to obtain a per-trial power time course for each band. All resulting power values shown in the article were normalized to dB as noted above. All power topographic and time course plots represent the grand mean across subjects.

In both sensor and source space, both time-domain and frequency-domain EEG data were analyzed using ordinary least squares regression70,71. The single-trial voltage or power at each time point was entered into a linear model using the same independent variables as the behavioral GLM: interference, DBS, valence, arousal, and trial number. We standardized all independent variables to the [0, 1] interval for this model also. We also considered the possibility that interference and DBS might interact at the neural level even though we saw no behavioral interaction, and thus included a DBS × interference interaction term in this regression. To replicate the effect of the subject-specific intercepts in the behavior model, we subtracted each individual subject’s all-trials mean voltage or power time course from that subject’s trials. Contrast statistics (t-tests) were computed for each resulting beta weight (regression coefficient) at each sample. To control for multiple statistical comparisons (timepoints) within each ROI/electrode, we performed permutation inference and temporal cluster correction72. We used 1000 permutations for each analysis, discarded clusters <50 ms in temporal extent, and retained only clusters that were significant at α = 0.05. For the time-domain analysis in source space, we further corrected these cluster p-values using the Benjamini–Hochberg false discovery rate (FDR) step-down procedure across all tested ROIs. For frequency-domain analysis, we did the same, but using a single step-down across ROIs and frequency bands simultaneously. All significant clusters shown in the article survived these corrections. The exception is that for sensor-space analysis, we did not correct for multiple sensors, because we tested only one sensor for time-domain and one sensor for frequency-domain analysis. The sensor-space frequency-domain p-values were again corrected for multiple bands.

Statistical analysis—EEG/behavioral changes as biomarkers

We hypothesized that both theta band EEG and MSIT behavior changes induced by DBS might correlate with subjects’ clinical response to VCVS DBS treatment. We further hypothesized that this correlation might be with positive clinical response (improvement in depression) or with clinical complications (hypomania, as in28). We quantified these at the individual subject level: MSIT RT as the mean (DBS ON–DBS OFF) difference, and theta EEG as the integrated height of the (DBS ON–DBS OFF) difference wave in the VLPFC (anterior inferior frontal gyrus). The VLPFC label was selected as the predictor variable after viewing the results of the preceding analyses. The difference wave was specifically calculated over the time period where we found a significant cluster during the source space analysis. Depression was measured with the Montgomery–Åsberg Depression Rating Scale (MADRS) as collected during the subjects’ original clinical trials; we did not attempt correlation with OCD symptoms because only two subjects in the sample had OCD. We used the MADRS change from the pre-implant baseline to the day of data collection, or to the nearest clinical visit to the data collection (always within 1 month) if a given subject was unable to complete the MADRS that day. Hypomania used the same dataset as28, in which the presence/absence of hypomanic episodes had been coded for each subject by trained clinical raters. The dependent variable was whether that subject had ever had hypomania during their DBS treatment course. One subject was not included in hypomania analyses due to unavailability of clinical data.

Out-of-sample prediction capability is important to assess for putative psychiatric biomarkers37,38, but difficult to measure in rare populations such as DBS patients. As a surrogate, we generated confidence intervals for the clinical/biomarker correlations by drawing 1000 bootstrap resamples (with replacement) from the original subject population. We used those same bootstrap draws to construct the confidence interval of the area under the curve (AUC) for receiver-operator characteristic (ROC) curves for classification of hypomania present/absent and depression responder/nonresponder. The latter used the same threshold of 50% MADRS improvement as in the clinical trials, e.g. in49.

Statistical analysis—resting-state data

Theta changes observed during MSIT performance might not be specific to the task, but might arise from a general shift in the EEG frequency spectrum during DBS. Five subjects contributed at least 2 min of eyes-open resting-state data with DBS ON and OFF. From these data, we cut 60 1-s artifact-free epochs from the ON and OFF recordings in each subject, then computed a power spectral density (PSD) from 0 to 30 Hz via the multitaper method. We computed mean power within the theta (4–8 Hz) region of each epoch’s PSD, then tested the difference between these distributions with the Mann–Whitney U-test. We carried out these analyses on theta power from sensor Fz, which was the scalp point of highest theta power during MSIT performance.

Validation of MSIT behavioral results in epilepsy controls

A potential concern is that any RT results we observe might be explainable by practice effects. Although the ON and OFF blocks were separated by an hour or more, subjects might still retain some procedural memory of the task. To address this confound, we analyzed data from a group of subjects who performed multiple temporally spaced MSIT runs without the emotional distractors. These subjects were part of a larger study focused on the network-level physiology of mental illness13. They were admitted for inpatient electrophysiologic monitoring of medication-refractory epilepsy. While inpatient, they were approached daily to perform multiple cognitive tasks, including MSIT. In this case, we used the original version of the task, which does not include the background IAPS distractors. Due to the nature of clinical work on an inpatient unit, including breaks for meals and clinical rounds, these subjects often performed one or more 64-trial MSIT blocks with a substantial break in between. This effectively replicates the design of our primary study, except for the DBS manipulation. We analyzed task blocks performed before and after these breaks, in eight subjects. For these subjects, we fit their MSIT trial RTs with a gamma distribution GLM that mimicked the main cohort analysis, i.e., independent/predictor terms for block (which mimics the DBS term), conflict, trial number, and a subject-specific intercept. As with the main cohort, all of these subjects provided full informed consent before any study procedures. All experimental procedures with these subjects complied with governmental and institutional ethics requirements and were approved by the Massachusetts General Hospital Institutional Review Board.