For experiment 1, 86 subjects (mean age 19.11 years; SD = 1.17 years; 32 males) were recruited using the University of Colorado undergraduate research pool and successfully completed the Double Go and Stop Tasks. Two subjects failed eyetracker calibration and were excluded from pupillometric analyses. For experiment 2, 45 subjects (mean age 19.86 years; SD = 2.21; 23 males) were recruited using the University of Colorado undergraduate research pool and successfully completed the Double Go and Stop Tasks. Seven of these subjects were excluded from ERP analysis for artifacts caused by excessive blinking (>60% of trials). For experiment 3, 19 subjects (mean age 23.3 years; SD = 4.4; 10 males) were recruited from the local community and successfully completed the Double Go and Stop Tasks. One subject was excluded from fMRI analyses due to motion artifact.

In all respects the Double Go and Stop Tasks were identical within any given experiment (e.g., the precise interstimulus and intersignal intervals, the presence of “null” trials, etc.), with the following exception: subjects are naturally aware of when they fail to successfully stop a response, but seem unaware of their relative speed on trials with the infrequent stimulus. To avoid any possible mismatch across the two tasks owing to this difference, we provided explicit feedback on all signal trials. Specifically, in the Double Go Task, the signal turned red if subjects were slower than their running average RT (experiments 2 & 3); in experiment 1 this was presented as sham feedback. (Double Go Task trials with categorically incorrect responses – such as a failure to respond twice on Signal trials, or anything but a single correct response on No Signal trials – were extremely rare and excluded from all analyses.) Similarly, in the Stop Task, the signal turned red if subjects failed to successfully stop their response on that trial (in all experiments). Additional cross-experiment differences in our tasks suggest the generality of our results across minor variations in experimental procedure (see Figure S1 & Table S1).

All subjects in all experiments completed the Double Go Task prior to completing the Stop Task. This fixed task order was adopted for reasons described in Text S1 – in particular, a fixed task order is ideal for the investigation of individual differences [3], which was a central goal of the study reported here. Nonetheless, appropriate precautions were taken to prevent the contamination of experimental effects by cognitive phenomena that might arise from the fixed task order (e.g., within-task baselines were used for all pupillometry, ERP, and fMRI analyses, so as to control for relatively general phenomena like fatigue).

Statistical Analysis of fMRI

Data were acquired with a 3T GE Signa whole-body MRI scanner at the University of Colorado Health Sciences Center, using T2-weighted echo-planar imaging (EPI) (TR = 2000 ms, TE = 32 ms, flip angle = 70°). Additional acquisition details are available in Text S1.

Image pre-processing and analyses were conducted with FSL (FMRIB's Software Library). The first six volumes of each run were discarded to allow the MR signal to reach steady state, the remaining images in each participant's time series were motion corrected using MCFLIRT, and non-brain voxels were removed using a brain extraction algorithm (BET). The data series was spatially smoothed with a 3D Gaussian kernel (FWHM = 5 mm), intensity normalized for all volumes, and high-pass filtered (σ = 50 s).
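As a point of reference for the smoothing step above, the FWHM of a Gaussian kernel relates to its standard deviation by FWHM = 2·sqrt(2·ln 2)·σ. A minimal sketch of this conversion (not part of the FSL pipeline itself):

```python
import numpy as np

def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's FWHM to its standard deviation,
    via FWHM = 2 * sqrt(2 * ln 2) * sigma (~2.3548 * sigma)."""
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

# The 5 mm FWHM kernel used here corresponds to sigma ~ 2.12 mm.
sigma_mm = fwhm_to_sigma(5.0)
```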

After statistical analysis of each time series (details of the regression model are available in Text S1), statistical maps were normalized into the MNI-152 stereotaxic space using FLIRT (FMRIB's Linear Image Registration Tool). Parameter estimates (PEs) were transformed into a common stereotaxic space using the above-mentioned three-step registration prior to the group analyses with FLAME (FMRIB's Local Analysis of Mixed Effects). Z-statistic images were thresholded using clusters determined by z > 2.58 and a whole-brain corrected cluster significance threshold of p < .05 using the theory of Gaussian random fields.

ROIs for Brodmann areas were anatomically defined using the Talairach labeled atlas (see Figure S2), and mean percent signal change was extracted using FSL's featquery tool. The subthalamic nucleus was anatomically defined using a 10 mm³ region centered on the MNI coordinates previously used in the Stop Task to interrogate BOLD in the STN (10, −15, −5) [41]. The TPJ was anatomically defined using a 30 mm³ region centered on the MNI coordinates (−54, −52, 30) previously observed in a target detection task [42].
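The coordinate-centered ROIs above can be sketched as spherical masks in voxel space. This is an illustrative sketch only: the grid dimensions below assume a 2 mm MNI-152 grid, and the center voxel and radius are hypothetical examples, not the exact STN or TPJ definitions used in the study.

```python
import numpy as np

def sphere_roi(shape, center_vox, radius_vox):
    """Boolean mask for a spherical ROI centered on a voxel coordinate."""
    grids = np.ogrid[tuple(slice(0, s) for s in shape)]
    dist2 = sum((g - c) ** 2 for g, c in zip(grids, center_vox))
    return dist2 <= radius_vox ** 2

# Example: a 5-voxel-radius sphere in a 91 x 109 x 91 grid (2 mm MNI-152).
mask = sphere_roi((91, 109, 91), center_vox=(50, 47, 33), radius_vox=5)
```

Mean percent signal change (extracted here with featquery) would then be averaged over the voxels where the mask is True.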

Pattern classification analyses were conducted on the beta-weights resulting from the above fMRI analysis pipeline, with four minor exceptions. First, the BOLD data were not spatially smoothed; second, the PEs were not statistically thresholded; third, the PEs were z-transformed across all voxels within a given ROI for each subject, to ensure that the classifiers were forced to operate on the basis of distributed patterns of activation rather than overall magnitudes. Finally, voxels with z-values falling outside of +/− 4.5 were winsorized. Classifiers were implemented as neural networks in Emergent [43]; separate networks were then trained, using Hebbian and Contrastive Hebbian learning, for each ROI (and therefore differed in the number of input units), and for identifying which individual generated the data vs. which trial type the data were estimated from (and therefore differed in the number of output units), but all other aspects of the network architecture were the same. See Text S1 for full details of the classifier implementation.
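The z-transform and winsorization steps can be sketched as follows. This is a minimal sketch under stated assumptions: the z-scoring axis (across voxels, per pattern) and the treatment of winsorization as clipping to +/− 4.5 are our reading of the text, and the function name is hypothetical.

```python
import numpy as np

def normalize_roi_patterns(pe, clip=4.5):
    """z-score parameter estimates across voxels within an ROI, then
    winsorize (clip) values beyond +/- `clip` standard deviations.

    pe: (n_patterns, n_voxels) array for one subject and one ROI.
    z-scoring each pattern removes its overall magnitude, so a
    classifier must rely on the distributed pattern of activation.
    """
    z = (pe - pe.mean(axis=1, keepdims=True)) / pe.std(axis=1, keepdims=True)
    return np.clip(z, -clip, clip)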

Statistical Analysis of ERPs

During the Double Go and Stop Tasks, scalp voltages were recorded with a 128-channel geodesic sensor net [44]. Amplified analog voltages (0.1- to 100.0-Hz bandpass) were digitized at 250 Hz. Individual sensors were adjusted until impedances were less than 50 kΩ. The EEG was digitally low-pass filtered at 40 Hz. Trials were discarded from analyses if they contained incorrect responses or eye movements (eye channel amplitudes over 70 μV), or if more than 20% of channels were bad (average amplitude over 100 μV or transit amplitude over 50 μV). Individual bad channels were replaced on a trial-by-trial basis with a spherical spline algorithm. EEG was measured with respect to a vertex reference (Cz), but an average-reference transformation was used to minimize the effects of reference-site activity and to accurately estimate the scalp topography of the measured electrical fields. The average reference was corrected for the polar average reference effect [45]. ERPs were obtained by stimulus-locked averaging of the EEG recorded in each condition, and were baseline-corrected with respect to a 200-ms prestimulus recording interval. These baselines were calculated separately for each task, thereby controlling for nonspecific effects like fatigue.
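The stimulus-locked averaging and 200-ms baseline correction can be sketched as below. This is an illustrative sketch, not the recording software's implementation; the epoch layout (baseline samples first) is an assumption.

```python
import numpy as np

def baseline_corrected_erp(epochs, sfreq=250, baseline_ms=200):
    """Average stimulus-locked epochs and subtract the mean of the
    prestimulus interval, channel by channel.

    epochs: (n_trials, n_channels, n_samples) array; the first
    `baseline_ms` of each epoch are assumed to precede stimulus onset.
    """
    n_base = int(round(baseline_ms / 1000 * sfreq))  # 50 samples at 250 Hz
    erp = epochs.mean(axis=0)                        # average across trials
    baseline = erp[:, :n_base].mean(axis=1, keepdims=True)
    return erp - baseline
```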

Where montages are used, the occipital montage was centered on Oz (including Oz, O1, O2, and the contiguous set of electrodes 76, 70, 74 and 82) and the frontal montage was centered on Fz (including Fz and the contiguous set of electrodes 4, 5, 10, 12, 16, 18 and 19). For scalp-wide voltage correlations (Fig. 4B) we calculated Pearson's r at every time point, correlating the subjects × electrodes matrix of voltages in one task with the corresponding matrix in the other. Thus, this correlation reflects changes in voltage that covary across tasks in the same subjects at the same electrode sites. For montage-based voltage correlations (main text Fig. 4C) we calculated Pearson correlations separately for the frontal and occipital montages, both before and after signal onset.
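The scalp-wide cross-task correlation can be sketched as follows; this is a minimal sketch of our reading of the procedure, with hypothetical function and argument names.

```python
import numpy as np

def crosstask_correlation(task_a, task_b):
    """Pearson's r between two tasks at each time point.

    task_a, task_b: (n_subjects, n_electrodes, n_samples) voltage
    arrays. At each sample, the subjects x electrodes matrices are
    flattened and correlated, so r reflects voltage covarying across
    tasks in the same subjects at the same electrode sites.
    """
    n_samples = task_a.shape[-1]
    r = np.empty(n_samples)
    for t in range(n_samples):
        a = task_a[..., t].ravel()
        b = task_b[..., t].ravel()
        r[t] = np.corrcoef(a, b)[0, 1]
    return r
```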

Statistical Analysis of Pupillometry

Pupil diameter was recorded continuously during the Double Go and Stop Tasks via a Tobii X50 infrared eyetracker calibrated to each subject. Sampling at 50 Hz was synchronized to fixation onset, and pupil diameter was calculated as the average diameter of successfully tracked eyes for each sample. Baseline measurements of pupil diameter were calculated as the average diameter during the 200 ms preceding the onset of each signal (or the corresponding time period for no signal trials); this value was subtracted from the averaged samples recorded following the onset of the signal (or the average signal onset for no signal trials). Baseline periods were calculated independently for the Stop and Double Go Tasks, providing a within-task baseline to control for nonspecific cognitive effects like fatigue. These normalized, averaged pupil diameter samples were then smoothed using a box-car filter with a width of 60 ms.
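The baseline subtraction and box-car smoothing steps can be sketched as below, assuming the 50 Hz sampling rate described above (so the 200-ms baseline spans 10 samples and the 60-ms box-car spans 3); the function names are hypothetical.

```python
import numpy as np

def baseline_subtract(trace, onset_idx, sfreq=50, baseline_ms=200):
    """Subtract the mean diameter over the 200 ms preceding signal onset."""
    n = int(round(baseline_ms / 1000 * sfreq))       # 10 samples at 50 Hz
    return trace - trace[onset_idx - n:onset_idx].mean()

def boxcar_smooth(x, width_ms=60, sfreq=50):
    """Smooth a pupil trace with a box-car (moving-average) filter."""
    w = max(1, int(round(width_ms / 1000 * sfreq)))  # 60 ms at 50 Hz -> 3 samples
    kernel = np.ones(w) / w
    return np.convolve(x, kernel, mode="same")
```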

Statistical Analysis of Behavior – Double Go Task

In the Double Go Task, all RTs falling below 150 ms or above 750 ms were excluded from analysis, as were those on No Signal trials falling outside of 3.5 standard deviations of the iteratively calculated mean for each subject. RTs were only analyzed on correct trials (i.e., trials on which two responses of the correct type were provided on Signal trials, and on which one and only one response of the correct type was provided on No Signal trials).
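The RT exclusion rule can be sketched as an absolute cutoff followed by iterative trimming; this is a sketch of our reading of "iteratively calculated mean" (recompute mean and SD after each round of exclusions until no RT falls outside the criterion), with hypothetical names.

```python
import numpy as np

def trim_rts(rts, lo=150, hi=750, nsd=3.5):
    """Drop RTs outside [lo, hi] ms, then iteratively trim outliers
    beyond nsd standard deviations of the remaining sample's mean."""
    rts = np.asarray(rts, float)
    rts = rts[(rts >= lo) & (rts <= hi)]
    while True:
        m, s = rts.mean(), rts.std()
        keep = np.abs(rts - m) <= nsd * s
        if keep.all():
            return rts
        rts = rts[keep]
```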

Individual differences were extracted from the Double Go Task using a mixture model-based adaptation of the classic race model of the Stop Task (see also Text S1). Specifically, to classify individual trials as slowed or unslowed, we first decomposed the distribution of equipercentile residuals into two underlying distributions: a Gaussian distribution with a mean of zero (corresponding to unslowed first RTs) and a Gamma distribution (corresponding to the slowed first RTs). The two free parameters of the Gamma and the one free parameter of the Gaussian were fit in a fixed-effects analysis using maximum likelihood estimation via the Nelder-Mead simplex algorithm [46]–[47]. The maximum likelihood fit is illustrated as overlaid lines on the residual histogram (Fig. 6C); it was relatively stable across multiple optimizations with different starting parameters and yielded a better overall fit (see Table S1) than a single Gaussian in terms of the Bayesian Information Criterion (BIC), calculated as BIC = D_p ln(N) − 2 Σ_n ln(Σ_d π_d L_d(RT_n)), where N is the total number of observations, D is the total number of distributions fit, D_p is the total number of free parameters used in fitting those distributions, π_d is the weight of the dth distribution, and L_d(RT_n) is the likelihood of the nth RT given the best-fit parameters for the dth distribution (μ and σ for the Gaussian and k and Θ for the Gamma).
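The fit can be sketched as below. This is a sketch under stated assumptions, not the implementation of [46]–[47]: scipy stands in for the original optimizer, the mixing weight is treated as an additional free parameter alongside the three distribution parameters, and the starting values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma, norm

def neg_log_lik(params, resid):
    """Negative log-likelihood of a zero-mean Gaussian + Gamma mixture."""
    w, sigma, k, theta = params
    if not (0 < w < 1 and sigma > 0 and k > 0 and theta > 0):
        return np.inf   # reject out-of-bounds simplex proposals
    lik = (1 - w) * norm.pdf(resid, 0, sigma) + w * gamma.pdf(resid, k, scale=theta)
    return -np.sum(np.log(lik + 1e-300))

def fit_mixture(resid):
    """Maximum likelihood fit via the Nelder-Mead simplex, plus BIC."""
    res = minimize(neg_log_lik, x0=[0.2, 50.0, 2.0, 50.0],
                   args=(resid,), method="Nelder-Mead")
    n_params = 4  # sigma; k, theta; mixing weight (an assumption here)
    bic = n_params * np.log(len(resid)) + 2 * res.fun
    return res.x, bic
```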

We next categorized individual trials as slowed or unslowed using the likelihood of observing each RT under each of the two fitted distributions. RTs were categorized as slowed if there was even weak evidence in favor of the RT belonging to the slowed distribution (as quantified by a difference in BIC of ≥2.35); otherwise RTs were categorized as unslowed. Other standards of evidence lead to results similar to those presented here, but do not as cleanly separate the slowed and unslowed trials (cf. Fig. 6D).
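This trial-level decision can be sketched as a likelihood comparison; the sketch below renders the per-trial BIC difference as twice the log-likelihood ratio favoring the Gamma (an assumption about the exact computation, since the per-trial parameter-count terms cancel when both models are fit once to all trials), with hypothetical names.

```python
import numpy as np
from scipy.stats import gamma, norm

def classify_slowed(resid, sigma, k, theta, crit=2.35):
    """Label residual RTs as slowed when the evidence favoring the
    fitted Gamma (slowed) over the zero-mean Gaussian (unslowed),
    quantified as 2 x the log-likelihood ratio, meets `crit`."""
    ll_gamma = gamma.logpdf(resid, k, scale=theta)
    ll_gauss = norm.logpdf(resid, 0, sigma)
    return 2 * (ll_gamma - ll_gauss) >= crit
```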

To calculate TOSD, we subtracted the signal delay from the nth percentile of no signal trial RTs, where n corresponds to the proportion of RTs classified as unslowed at that signal delay. This approach is conceptually identical to that used to calculate SSRT in the race model, in which the signal delay is subtracted from the nth percentile of No Signal RTs, where n corresponds to the proportion of unsuccessful stop trials at that signal delay. TOSD was calculated for each subject as the median of these estimates across all signal delays. This estimate was unreasonably high for subjects for whom no RTs had been classified as slowed (n = 34 out of 150), so in those cases we used the minimum estimate of TOSD across all signal delays.
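The per-delay TOSD estimate and its aggregation across delays can be sketched as follows (hypothetical names; the fallback to the minimum estimate for subjects with no slowed RTs is omitted for brevity):

```python
import numpy as np

def tosd_at_delay(no_signal_rts, signal_delay, prop_unslowed):
    """TOSD at one signal delay: the nth percentile of No Signal RTs
    (n = proportion of trials classified as unslowed at that delay)
    minus the signal delay, mirroring the race-model SSRT logic."""
    return np.percentile(no_signal_rts, prop_unslowed * 100) - signal_delay

def tosd(no_signal_rts, delays, props_unslowed):
    """Per-subject TOSD: the median estimate across all signal delays."""
    ests = [tosd_at_delay(no_signal_rts, d, p)
            for d, p in zip(delays, props_unslowed)]
    return float(np.median(ests))
```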

We then calculated the duration of slowing as the average difference between RTs classified as slowed and RTs of corresponding percent rank in the no signal RT distribution; subjects for whom no RTs had been classified as slowed were excluded from all analyses involving duration of slowing. The resulting estimates of TOSD and duration of slowing can be found in Table S2.
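The duration-of-slowing computation can be sketched as a percentile-matched subtraction; the percent-rank convention below ((i + 0.5)/n) is an assumption, and the names are hypothetical.

```python
import numpy as np

def duration_of_slowing(slowed_rts, no_signal_rts):
    """Mean difference between each slowed RT and the No Signal RT
    of matching percent rank in the No Signal RT distribution."""
    slowed = np.sort(np.asarray(slowed_rts, float))
    ranks = (np.arange(len(slowed)) + 0.5) / len(slowed)   # percent ranks
    matched = np.percentile(no_signal_rts, ranks * 100)    # rank-matched RTs
    return float(np.mean(slowed - matched))
```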