Participants

Nineteen healthy control (CTL) individuals and 18 individuals diagnosed with MDD participated in the study. The data from three CTL and two MDD individuals were excluded due to problems with task presentation in the scanner (MDD N = 1, CTL N = 1) or scanner function (CTL N = 1), discomfort in the scanner (CTL N = 1), or an inability to learn the word pairs (MDD N = 1). After excluding these five participants, we were left with 16 MDD participants (nine females, seven males) and 16 CTL participants (eight females, eight males). The CTL participants had no history of psychiatric disorders and had never taken psychotropic medication. All participants were recruited through online postings, were between 18 and 56 years of age, had no history of brain injury and no substance/alcohol abuse in the last six months, and met the requirements for MRI scanning (e.g., had no metal implants). The MDD participants had no comorbid bipolar I or II disorder (mania), psychosis, or learning disabilities. The depressed participants also met the DSM-IV criteria for current MDD using the Structured Clinical Interview for DSM (SCID; First, Gibbon, Spitzer, & Williams, 2004). All participants also completed the Beck Depression Inventory (Beck, Rush, Shaw, & Emery, 1979). Participants were compensated for their time, and all gave informed consent. The study was in compliance with the ethical standards set forth by the American Psychiatric Association and was conducted with approval from the Stanford University institutional review board.

Think/no-think task

Materials

The critical stimuli for this study were 24 sets of words; each set included four words. Each set was designed to have two possible cues (e.g., Trunk or Street), both of which were neutral, and two possible response words, one of which was negatively valenced (e.g., Corpse) and the other of which had a neutral valence (e.g., Violin). These words were selected from the Affective Norms for English Words (Bradley & Lang, 1999), allowing us to assess the valence (negative response words, M = 2.1, SD = 0.5; neutral response words, M = 5.5, SD = 0.6; neutral cue words, M = 5.1, SD = 0.6) and arousal (negative response words, M = 5.0, SD = 0.9; neutral response words, M = 4.0, SD = 1.0; neutral cue words, M = 3.4, SD = 0.7) ratings for each set of items.

Each set was designed so the cue words would act as effective retrieval cues for either response, so that the assignments of cues to responses could be counterbalanced across participants (i.e., one participant learned Trunk–Corpse and Street–Violin, and another participant learned Trunk–Violin and Street–Corpse). Similarly, the assignments of word pairs to conditions (baseline, think, and no-think) were counterbalanced across participants. This meant that there were a total of six counterbalancing conditions for the items (three conditions and two cue-to-response mappings). Importantly, the cues were always neutral; therefore, the cue itself (Trunk or Street) did not provide any information about the valence of the response word. The independent probes were also designed to uniquely cue each response word separately (e.g., Anatomy–Co____ for Corpse; Lessons–Vi___ for Violin). Independent probes are used in this type of paradigm to rule out several noninhibitory explanations of forgetting (for more information, see Anderson & Spellman, 1995). These sets were divided into three groups of eight items that rotated through the experimental conditions (think, no-think, and baseline). An additional six word pairs (all neutral items) were used as fillers throughout the experiment; thus, each participant learned a total of 54 word pairs (six filler, eight think–negative, eight think–neutral, eight baseline–negative, eight baseline–neutral, eight no-think–negative, and eight no-think–neutral).
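The counterbalancing scheme described above can be illustrated with a short sketch. The function names, the "A"/"B" labels for the two cue-to-response mappings, and the rotation index are hypothetical and are introduced here only for illustration; the example word set is the one given in the text.

```python
from itertools import product

CONDITIONS = ("think", "no-think", "baseline")
MAPPINGS = ("A", "B")  # hypothetical labels for the two cue-to-response mappings


def counterbalancing_cells():
    """Enumerate every (rotation, mapping) combination: 3 x 2 = 6 cells."""
    return list(product(range(len(CONDITIONS)), MAPPINGS))


def conditions_for_groups(rotation):
    """Condition assigned to item groups 0, 1, and 2 under a given rotation."""
    return [CONDITIONS[(group + rotation) % 3] for group in range(3)]


def pairs_for_set(word_set, mapping):
    """Build the two cue-response pairs for one four-word set.

    word_set is (cue_1, cue_2, negative_response, neutral_response).
    """
    cue_1, cue_2, negative, neutral = word_set
    if mapping == "A":
        return [(cue_1, negative), (cue_2, neutral)]
    return [(cue_1, neutral), (cue_2, negative)]


# 24 critical sets yield two pairs each; six filler pairs are added on top.
TOTAL_PAIRS = 24 * 2 + 6
```

For example, `pairs_for_set(("Trunk", "Street", "Corpse", "Violin"), "B")` gives Trunk–Violin and Street–Corpse, the second assignment mentioned in the text.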

Procedure

The TNT procedure consisted of three separate phases: learning, TNT, and test. The learning phase was completed outside the scanner; the TNT phase and test phase were conducted while participants were inside the bore of the MRI scanner, though fMRI data were collected only during the TNT phase. This procedure was used to minimize forgetting on the final test that might be due to changes in physical and mental context associated with getting out of the scanner.

Learning phase

Participants learned the cue–associate word pairs through a drop-off study–test training procedure. On study trials, participants were presented with an intact word pair for 5 s and were encouraged to form an association between the items. On test trials, the cue word appeared and participants had up to 5 s to verbally report the associate. For feedback, the correct associate was presented for 2 s after every test trial. An experimenter recorded whether or not the response was correct. If no response was given or the response was incorrect, that cue word was presented again at the end of the list (i.e., items recalled correctly dropped out of the set). This was repeated for each list until every correct associate was provided once. To make this learning phase easier for participants, we divided the large number of word pairs into smaller lists and tested each word pair three times across the whole learning phase. More specifically, participants initially learned lists of six word pairs at a time (i.e., they studied six pairs, and then were immediately given drop-off testing on those six items). After the participants had learned three lists of six pairs, they were given an 18-item drop-off test covering all three lists they had just learned. Once this was complete, they moved on to three more lists of six items, followed by an 18-item drop-off test reviewing all of those pairs. After a third list of 18 items was learned, participants were given a drop-off cycle on all 54 word pairs they had learned. Once completed, participants were given one final test for all the cue–associate pairs to confirm which word pairs had actually been learned. During this last test, participants were not given feedback after making their response, and items they missed did not appear again at the end of the cycle. 
To minimize any potential differences in learning between the conditions (e.g., between negative and neutral items or between the MDD and CTL groups), all subsequent analyses were restricted to cue–associate pairs that were correctly reported on this final learning test. This allowed us to be sure that any differences we observed were not due to differences in initial learning.
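The logic of a single drop-off test cycle can be sketched as a simple queue. This is an illustration under stated assumptions, not the software used in the study: `is_correct` is a hypothetical stand-in for the experimenter's scoring of each verbal response, and timing and feedback displays are omitted.

```python
from collections import deque


def drop_off_test(cues, is_correct):
    """Run one drop-off cycle over a list of cue words.

    cues: cue words to be tested.
    is_correct: callable (cue, attempt_number) -> bool; hypothetical scoring hook.
    Returns (total number of test trials, attempts needed per cue).
    """
    queue = deque(cues)
    attempts = {cue: 0 for cue in cues}
    trials = 0
    while queue:
        cue = queue.popleft()
        attempts[cue] += 1
        trials += 1
        if not is_correct(cue, attempts[cue]):
            # Missed items return to the end of the list; correct items drop out.
            queue.append(cue)
    return trials, attempts
```

The final, no-feedback test differs from this loop in that missed items are not re-queued; each cue is simply tested once and scored.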

TNT phase

For each trial, participants saw a cue from one of the word pairs (e.g., Street) and were asked to exert control over the retrieval process. For think trials, they were asked to recall the associated word (e.g., Corpse). For no-think trials, their task was to prevent the associated word from entering consciousness. Participants were not given any specific suggestions about strategies they could use to accomplish this task. Each retrieval cue was presented for 3 s, and they were asked to follow the task instructions for the entire time the cue was presented. Participants were cued to perform either of these tasks by the color of the cue word: Think cues appeared in green, and no-think cues were red. Participants completed a practice block that was 20 trials long and that included only filler pairs, to get the participants used to the procedure before scanning began. After this practice phase, they were asked about their approach to the task and given directed feedback if they were not performing the task as instructed (e.g., if they were averting their gaze from the retrieval cue or covertly rehearsing the responses for no-think trials). The actual TNT phase consisted of six runs of 64 trials each (384 trials total); each run lasted 5 min 40 s. Each cue was repeated twice during every block (eight think and eight no-think cues of both valences, each presented twice). The trial order was determined by Optseq (http://surfer.nmr.mgh.harvard.edu/optseq; Dale, 1999), which pseudorandomly mixed the four conditions (think–negative, think–neutral, no-think–negative, and no-think–neutral) and used variable intertrial intervals (0.5–12 s). During the intertrial intervals a fixation cross appeared in the center of the screen, and participants were instructed to look at the cross and wait for the next trial to begin.
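The trial counts and run duration reported above follow directly from the design. The short check below reproduces the arithmetic; the variable names are illustrative, and the repetition time and number of frames per run are taken from the MRI acquisition parameters reported in this Method section.

```python
CUES_PER_CELL = 8        # cues per condition-by-valence cell
CONDITIONS = 2           # think, no-think
VALENCES = 2             # negative, neutral
REPETITIONS_PER_RUN = 2  # each cue shown twice per run
RUNS = 6
TR_SECONDS = 2           # TR = 2,000 ms
FRAMES_PER_RUN = 170

trials_per_run = CUES_PER_CELL * CONDITIONS * VALENCES * REPETITIONS_PER_RUN
total_trials = trials_per_run * RUNS

# Run length implied by the acquisition parameters, split into minutes and seconds.
run_minutes, run_seconds = divmod(FRAMES_PER_RUN * TR_SECONDS, 60)

print(trials_per_run, total_trials, run_minutes, run_seconds)
```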

Test phase

After the end of MRI scanning, memory was tested for all word pairs. Participants were administered a brief practice test that tested only filler word pairs, to make sure that they understood the task. Then they were given two final memory tests, the same-probe (SP) and independent-probe (IP) tests, with the order of these two tests counterbalanced across participants. For each test trial, a retrieval cue was presented for 4 s, and participants were asked to verbally provide the associated word. The retrieval cue for the SP test was the cue from the originally studied word pair, and for the IP test it was a semantically related but unstudied cue along with a two-letter stem (Fig. 1A).

Fig. 1 Behavioral procedure (A) and results (B). (A) During the study phase, participants learned word pairs until they could provide the associated member of each pair when shown the cue word as a retrieval cue. Then, during the think/no-think (TNT) phase, participants were scanned while they tried to exert control over memory retrieval. For think trials (in green during the trial), participants were asked to think of the associated word. For no-think trials (in red), they were asked to prevent the related word from entering awareness. Baseline items were not presented during this phase. After scanning, participants were asked to recall all of the studied response words, from both the originally studied retrieval cue (the same-probe test) and from a novel, extralist associate (the independent-probe test). (B) The critical outcome measure in the TNT task was the suppression score, which reflected whether or not avoided memories were recalled more poorly than baseline items (Baseline recall – No-think recall). Shown here are the suppression scores for both groups of participants (MDD and CTL), as a function of the valence of the to-be-suppressed memory (neutral or negative) and the type of final memory test (SP or IP). Overall, participants tended to forget the no-think items, and these suppression scores did not vary significantly by group, valence, or test type. The full set of means for recall in the final test phase is reported in Table 4. Error bars indicate standard errors of the means (SEMs).

Behavioral measures

The data from the test phase were analyzed to assess the behavioral consequences of attempting to control conscious awareness of a memory. A 2 × 2 × 2 × 3 mixed analysis of variance (ANOVA) was utilized, with the between-subjects factor Group (MDD and CTL) and the within-subjects factors Test Type (SP, IP), Valence (negative, neutral), and Condition (think, baseline, no-think). To focus more directly on the key behavioral measure (i.e., the magnitude of SIF), suppression scores were calculated by subtracting the recall of no-think items from the recall of baseline items, within a given valence and within each test type. This measure provides an index of how successful participants were at forgetting the avoided associates, controlling for general forgetting that would be expected on a delayed memory test. This measure treated SIF as a positive value, so participants who forgot more of the no-think items would show larger suppression scores.
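The suppression score defined above can be computed per participant as follows. This is a minimal sketch: the dictionary layout and function name are assumptions, and the recall proportions in the example are made-up values used only to show the calculation.

```python
TEST_TYPES = ("SP", "IP")
VALENCES = ("negative", "neutral")


def suppression_scores(recall):
    """Compute baseline minus no-think recall within each test type and valence.

    recall: dict mapping (test_type, valence, condition) -> proportion recalled,
            where condition is "think", "baseline", or "no-think".
    Returns a dict mapping (test_type, valence) -> suppression score.
    Positive values mean no-think items were recalled more poorly than baseline.
    """
    return {
        (test, valence): recall[(test, valence, "baseline")]
        - recall[(test, valence, "no-think")]
        for test in TEST_TYPES
        for valence in VALENCES
    }


# Hypothetical recall proportions for one participant (illustration only).
example_recall = {
    ("SP", "negative", "baseline"): 0.875, ("SP", "negative", "no-think"): 0.75,
    ("SP", "neutral", "baseline"): 0.75, ("SP", "neutral", "no-think"): 0.75,
    ("IP", "negative", "baseline"): 0.5, ("IP", "negative", "no-think"): 0.25,
    ("IP", "neutral", "baseline"): 0.625, ("IP", "neutral", "no-think"): 0.75,
}
example_scores = suppression_scores(example_recall)
```

In this made-up example the IP–neutral score is negative, which would indicate better recall of no-think than baseline items, that is, no suppression-induced forgetting in that cell.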

MRI data acquisition

Whole-brain imaging data were acquired via a 3.0-T General Electric Signa MR scanner (Milwaukee, Wisconsin) at the Richard M. Lucas Center for Imaging at Stanford University School of Medicine. After a scout scan used for slice prescription, high-order shimming was performed for whole-brain distortion estimation until diminishing returns were produced. Blood-oxygenation-level-dependent (BOLD) functional data were acquired using an eight-channel, whole-head coil from 31 axial slices with a spiral in–out pulse sequence (Glover & Law, 2001; TR = 2,000 ms, TE = 30 ms, flip angle = 80°, FOV = 22 cm, number of frames = 170, in-plane resolution = 3.44 mm², through-plane resolution = 4 mm). To anatomically localize the functional activations, a high-resolution structural scan (spoiled gradient echo: 156 slices, in-plane resolution = 0.86 × 0.86 mm, through-plane resolution = 1 mm, TE = 3.4 ms, flip angle = 15°, FOV = 22 cm) was collected after the BOLD scanning runs.

fMRI data processing and analysis

Data processing and analysis were conducted using the Analysis of Functional Neuroimages (AFNI) software suite (National Institutes of Health; http://afni.nimh.nih.gov/; Cox, 1996) and MATLAB (The MathWorks Inc., Natick, MA). The BOLD images were slice-time-corrected, followed by motion correction with a Fourier interpolation algorithm. Data were not corrected further if sudden movements were less than 1 mm. A despiking algorithm was used to correct for movements between 1 and 3 mm by replacing motion-influenced acquisitions with outlier-insensitive estimates. Specifically, a given TR was defined as an outlier if its BOLD value was greater than a standard deviation threshold. Outlier values were then replaced with values from a polynomial fit across all TRs, excluding outliers. Spatial smoothing was conducted with a Gaussian kernel (full width at half maximum = 4 mm). The data were high-pass filtered at 1 cycle/min and converted to percent signal change. Finally, individual participant maps were converted to the Talairach common template space (Talairach & Tournoux, 1998), which allowed for between-group comparisons.
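As an illustration only, the outlier-flagging rule and the percent-signal-change conversion can be sketched in plain Python for a single voxel's time series. This is not AFNI's implementation: the threshold value, the use of the series mean as the reference for both steps, and the function names are assumptions, and the polynomial replacement of flagged values is omitted.

```python
from statistics import mean, pstdev


def flag_outliers(series, threshold_sd=2.0):
    """Flag time points lying more than threshold_sd SDs from the series mean.

    threshold_sd is a hypothetical value; the text does not report the threshold.
    """
    center = mean(series)
    spread = pstdev(series)
    return [abs(value - center) > threshold_sd * spread for value in series]


def percent_signal_change(series):
    """Express each time point as a percent change from the series mean.

    Using the mean of the whole series as the baseline is an assumption.
    """
    baseline = mean(series)
    return [100.0 * (value - baseline) / baseline for value in series]


# Made-up voxel time series with one spike at the fifth time point.
example_series = [100, 101, 99, 100, 130, 100, 100, 99, 101, 100]
example_flags = flag_outliers(example_series)
```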

Processed time series data were then submitted to a general linear model (Friston, Holmes, Worsley, & Poline, 1995) that included regressors for condition (think and no-think) and valence (negative and neutral), residual motion, and first-, second-, and third-order polynomial trends. The regressors of interest were convolved with a gamma-variate function that modeled a canonical hemodynamic response before inclusion in the model (Cohen, 1997), and betas were estimated.
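The construction of a regressor of interest can be sketched as a stimulus vector convolved with a gamma-variate response function of the form t^b·exp(−t/c). The parameter values below (b = 8.6, c = 0.547) are the ones commonly associated with Cohen (1997) and are an assumption here, because the text does not report them; the function names and the impulse-style stimulus coding are likewise illustrative.

```python
import math


def gamma_variate(t, b=8.6, c=0.547):
    """Gamma-variate response t**b * exp(-t / c), scaled so its peak equals 1.

    The peak occurs at t = b * c seconds. b and c are assumed values.
    """
    if t <= 0:
        return 0.0
    peak_time = b * c
    return (t / peak_time) ** b * math.exp(b - t / c)


def convolve(stimulus, kernel):
    """Discrete convolution, truncated to the length of the stimulus vector."""
    output = [0.0] * len(stimulus)
    for n in range(len(stimulus)):
        for k in range(min(n + 1, len(kernel))):
            output[n] += stimulus[n - k] * kernel[k]
    return output


TR_SECONDS = 2.0
# Sample the response function at each TR over a 16-s window.
hrf_kernel = [gamma_variate(i * TR_SECONDS) for i in range(8)]

# Hypothetical stimulus vector: a single trial onset at the second TR.
stimulus = [0, 1, 0, 0, 0, 0, 0, 0]
regressor = convolve(stimulus, hrf_kernel)
```

With these assumed parameters the modeled response peaks roughly 4 to 5 s after trial onset, so on a 2-s sampling grid the regressor for this single trial is largest two TRs after the onset.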

To assess consistency with previous fMRI investigations of the TNT procedure (e.g., Anderson, 2004), whole-brain maps were first computed using paired t tests on a voxel-wise basis, to create contrasts between think and no-think trials. This was done separately for each group (CTL and MDD) and both valences (neutral and negative). This allowed us to assess the extent to which the basic pattern of suppression-related activations would be observed in each of our contrasts (e.g., the MDD group suppressing negative items).

Next, we assessed whether activations differed between the CTL and MDD groups and between valences. To do this, we computed voxel-wise mixed ANOVAs with the between-subjects factor Group (MDD, CTL) and the within-subjects factors Condition (think, no-think) and Valence (negative, neutral). We then tested the significance of the two-way interactions between group and condition (within neutral trials) to identify any regions that were differentially active in the MDD versus CTL groups during no-think trials. Finally, to also consider valence, we looked for any regions that showed a three-way interaction of group, condition, and valence. This allowed us to identify regions that depressed and nondepressed participants might recruit differentially during the suppression of negative material.

All of these analyses were conducted at the whole-brain level; we had strong a priori interest, however, in the hippocampus and the amygdala. Thus, for these regions, we also conducted analyses with a small-volume correction (SVC), defined by probabilistic cytoarchitecture maps derived from postmortem brains (Eickhoff et al., 2005). For a given region, a voxel was included if at least 50% of postmortem brains indicated that the voxel was identified as that region. Given that these were neighboring regions, a single search space was created for each hemisphere by combining hippocampal regions (cornu ammonis, entorhinal cortex, dentate gyrus, and subicular complex) and amygdalar regions (centromedial, laterobasal, and superficial groups). The resulting bilateral hippocampus/amygdala volume was used for SVC, for a total volume of 23,960 mm³ (2,995 voxels). In the Results section, we report which clusters were identified in the whole-brain analysis, and which only survived using the SVC.

To control for multiple hypothesis testing while identifying significant outcomes, cluster-wise correction was implemented with 10,000 Monte Carlo simulations (Xiong, Gao, Lancaster, & Fox, 1995) in AFNI’s AlphaSim program. For whole-brain analyses, the uncorrected voxel significance threshold was set to p = .005, requiring a cluster of 256 mm³ (k = 32 voxels) to reach a corrected significance level of p < .05. To reach a corrected p < .05 significance level in the SVC analysis of the combined hippocampus/amygdala, the voxel-wise threshold was set to p < .05, and the required cluster sizes were 352 mm³ (k = 44 voxels) for left and 368 mm³ (k = 46 voxels) for right hippocampus/amygdala. Thus, both whole-brain and SVC analyses maintained a family-wise Type I error rate at p < .05. A more conservative voxel-wise p-value threshold was adopted for the whole-brain analysis, to reduce false positives, improve localization, and facilitate interpretation (Chrastil, Sherrill, Hasselmo, & Stern, 2015; Woo, Krishnan, & Wager, 2014).
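The cluster extents and search volume are reported both as volumes and as voxel counts, and a quick check shows that every reported pair implies the same voxel volume. The reading that this corresponds to 2 × 2 × 2 mm voxels in template space is an inference from these numbers, not a parameter stated in the text; the dictionary labels are illustrative.

```python
# (volume in cubic millimetres, number of voxels) as reported in the text.
REPORTED = {
    "whole-brain cluster threshold": (256, 32),
    "left hippocampus/amygdala SVC threshold": (352, 44),
    "right hippocampus/amygdala SVC threshold": (368, 46),
    "bilateral hippocampus/amygdala search volume": (23960, 2995),
}


def implied_voxel_volume(volume_mm3, n_voxels):
    """Volume of a single voxel implied by a reported volume and voxel count."""
    return volume_mm3 / n_voxels


voxel_volumes = {
    label: implied_voxel_volume(volume, count)
    for label, (volume, count) in REPORTED.items()
}

for label, volume in voxel_volumes.items():
    print(f"{label}: {volume} cubic mm per voxel")
```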

To further explore the significant two- and three-way interactions involving group, time courses were extracted from significant clusters, and summary average signal estimates were computed across the second, third, and fourth time points (TRs 2–4, encompassing the expected activation peak), consistent with prior studies of the TNT paradigm (e.g., Levy & Anderson, 2012).
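The summary estimate described above amounts to averaging a window of the extracted time course. The sketch below assumes that time points are counted from 1, so that "TRs 2–4" are the second through fourth samples after trial onset; that indexing convention, the function name, and the example values are assumptions.

```python
def summary_signal(time_course, first_tr=2, last_tr=4):
    """Average the signal over an inclusive window of 1-based time points."""
    window = time_course[first_tr - 1:last_tr]
    return sum(window) / len(window)


# Made-up cluster time course in percent signal change, one value per TR.
example_time_course = [0.0, 1.0, 2.0, 3.0, 0.5]
example_summary = summary_signal(example_time_course)
```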

Correlational analyses

Prior studies had found that behavioral measures of forgetting correlated with brain activity in the prefrontal cortex and within the medial temporal lobes (e.g., Anderson, 2004; Depue et al., 2007; Levy & Anderson, 2012). Similarly, studies had also reported correlations between the activity in different brain regions (e.g., Anderson, 2004; Benoit & Anderson, 2012; Depue et al., 2007), providing insights into how regions might interact during this task. Therefore, we attempted to replicate these analyses, but the results were inconclusive, so we report them only in the supplement (see the Supplemental Results). We also performed a series of exploratory analyses to examine whether activity in our regions of interest correlated with various clinical and psychological characteristics (see the Supplemental Method). Given their exploratory nature, we also report the results of these analyses in the supplement (see the Supplemental Results).