Participants

Twenty-one young, healthy, right-handed male college students (mean age±s.d., 21.6±0.81 years; range, 20–24 years) with normal or corrected-to-normal vision participated in this study. Participants followed a regular daily work-rest schedule and reported no sleep-related problems during the previous two months49. All participants reported good-quality sleep during the night after training on day 1, with 7–9 h of sleep (mean±s.d.=7.35±0.37 h). Informed written consent was obtained from all participants before the experiment, and the study protocol was approved by the Institutional Review Board for Human Subjects at Beijing Normal University. Data from three participants were excluded from fMRI analyses due to excessive head movement during scanning. Note that another independent cohort of 25 participants (age range, 20–24 years) was recruited for a parallel study using the same paradigm to examine the reproducibility of our behavioural and SCR findings. An additional behavioural control experiment with neutral stimuli was conducted in another independent sample of 30 participants (age range, 20–24 years) to examine whether the observed effects of overnight consolidation on aversive memories generalized to neutral memories.

General procedure

The whole experiment consisted of three phases: memory acquisition, memory suppression and memory testing (Fig. 1). The memory acquisition phase comprised two training sessions, one on day 1 and one on day 2, which took place about 24 h and about 30 min before fMRI scanning, respectively. On day 1, participants were extensively trained to learn and remember a set of 26 face-aversive picture associations. On day 2, they returned and were trained to remember another set of 26 face-picture associations. About 30 min after the training on day 2, participants underwent scanning, during which they performed the memory suppression task using the ‘Think/NoThink’ paradigm4 with concurrent recording of SCRs. Following that, participants performed a post-scan memory test for the face-picture associations to assess their subsequent memory performance and the effectiveness of memory suppression.

Stimuli

Fifty-two face-aversive picture pairs were used in the present study. Fifty-two faces (26 male and 26 female) were carefully selected from 100 colour photographs of Chinese individuals unknown to the participants50. To standardize the stimuli and minimize potential confounding factors, faces were selected according to the following criteria suggested by previous studies51,52: direct gaze contact, no strong emotional facial expression (faces were rated as neutral in a pilot study on separate 9-point scales: 1=‘extremely sad’ or ‘not arousing at all’, 9=‘extremely happy’ or ‘extremely arousing’, with mean valence=5.16±0.53 and mean arousal=5.03±0.43), no headdress, no glasses, no beard, etc. There was no significant difference in arousal, valence, attractiveness or trustworthiness between male and female faces (all P>0.05). Subsequent analysis also revealed no difference in memory accuracy between male and female faces (t(17)=−0.88, P=0.39). Fifty-two aversive pictures from the International Affective Picture System53 were carefully selected to be as unrelated in content to each other as possible, with a highly negative level of emotional valence (mean valence=2.37±0.69) and a high level of negative arousal (mean arousal=7.89±0.55) as measured on 9-point scales. Faces and aversive pictures were randomly paired across participants to create 52 face-picture associations. Associations were randomly assigned to day 1 and day 2. For the behavioural control experiment (testing for neutral memories), 52 neutral pictures were chosen from the International Affective Picture System and online resources. They were carefully matched with the aversive pictures on complexity, luminance and contrast, and had a moderate level of emotional valence (M=5.32±0.47) and a low level of arousal (M=2.52±0.36) based on ratings from an independent sample.

Memory acquisition

During the acquisition phase, participants performed one training session on day 1 and another on day 2 outside the scanner, which took place 24 h and 30 min before the Think/NoThink task, respectively. We carefully controlled these two intervals (24 h for the day 1 training and 30 min for the day 2 training) to minimize variability across participants54. We therefore restricted the first training session to start at 16:00 hours on day 1 and the scanning during the ‘Think/NoThink’ phase to 16:00–17:00 hours on day 2. The day 2 training started half an hour before scanning. In each training session, participants were trained to memorize 26 face-aversive picture pairs using multiple study–recall cycles. In each study–recall cycle, each association was presented for 4 s, and participants were encouraged to remember the association in detail. After all associations had been presented, participants were shown a face and asked to recall details of the associated picture. This study–recall cycle was repeated 3–5 times until each participant could correctly recall all 26 face-picture associations. For each association, we required participants to describe the associated image in detail when the face was presented as a cue, giving enough detail for that image to be uniquely identified. This procedure was used to ensure that participants formed vivid episodic memories for all associations rather than vague impressions based on familiarity4,55,56. Participants who required more than five training cycles to meet the criterion of 100% accuracy were excluded from the experiment. Note that we restricted training to 3–5 cycles to avoid overtraining, which may result in memories that are too strong to suppress (3,5).

TNT phase

In the TNT phase, participants underwent fMRI with concurrent recording of SCRs while performing the TNT task for face-picture associations acquired on day 1 and day 2. Each trial started with the presentation of a face for 4 s, followed by an inter-trial interval with a fixation cross for 2–6 s (average duration=4 s). The ‘Think’ and ‘NoThink’ trials were pseudo-randomized across participants and interleaved with fixation periods. The instructional cues (that is, green and red rectangles) indicated ‘Think’ and ‘NoThink’ trials, respectively, and appeared simultaneously with the face. Presenting only faces in this phase ensured that participants manipulated associated memories of the target picture4,57. When seeing a ‘Think’ cue (green rectangle), participants were required to recall and think of the previously learned picture; when seeing a ‘NoThink’ cue (red rectangle), they were instructed not to let the associated picture enter consciousness. The fixation presented during the inter-trial interval served as a low-level baseline for the experimental trials4. The task lasted 19.2 min and comprised 144 trials, with 36 trials in each condition. Participants were shown 36 of the 52 faces, half of them acquired on day 1 about 24 h before the task (that is, overnight condition) and the other half acquired on day 2 about 30 min before the task (that is, newly acquired condition). Cues for baseline pairs were not presented in this phase. Within each acquisition time, half of the faces were randomly assigned to the Think (that is, ‘T’) condition and the other half to the NoThink (that is, ‘NT’) condition, resulting in four experimental conditions in a 2-by-2 full factorial design (that is, memory suppression: NoThink versus Think; acquisition time: 30 min versus 24 h) with 9 faces per condition. Each face was repeated four times, resulting in 144 trials in total. The remaining 16 faces (half of them acquired 24 h before the task) were not included in the TNT task and served as a behavioural baseline. Before fMRI scanning, participants were trained twice using 10 trials that were not used in the actual experiment. Participants were explicitly instructed to directly suppress unwanted aversive memories by attempting to exclude a memory from awareness, rather than occupying awareness with another competing thought (that is, thought substitution)13.
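The trial counts and task duration reported for the TNT phase can be checked with a short sketch. The code below is illustrative only: the function name, the face identifiers and the plain shuffle are assumptions, and the study's actual pseudo-randomization constraints and jittered intervals are not reproduced.

```python
import random

N_FACES_TNT = 36   # faces shown in the TNT task (of 52 learned)
N_REPEATS = 4      # presentations per face
FACE_S = 4.0       # face duration in seconds
MEAN_ITI_S = 4.0   # mean inter-trial interval (jittered 2-6 s in the study)

def build_trials(seed=0):
    """Return a shuffled list of (face_id, acquisition_time, instruction)."""
    rng = random.Random(seed)
    conditions = [(t, i) for t in ("24h", "30min")
                  for i in ("Think", "NoThink")]
    faces_per_condition = N_FACES_TNT // len(conditions)  # 9 faces each
    trials = []
    face_id = 0
    for acq_time, instruction in conditions:
        for _ in range(faces_per_condition):
            trials.extend([(face_id, acq_time, instruction)] * N_REPEATS)
            face_id += 1
    # ASSUMPTION: a plain shuffle stands in for the study's pseudo-randomization.
    rng.shuffle(trials)
    return trials

trials = build_trials()
total_min = len(trials) * (FACE_S + MEAN_ITI_S) / 60
trials_per_condition = len(trials) // 4
print(len(trials), trials_per_condition, total_min)
```

With 36 faces repeated four times and an average of 8 s per trial, the sketch gives 144 trials and 19.2 min, which matches the figures stated above.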

Post-scan memory test

In the testing phase (Fig. 1d), memory performance for face-picture associations was assessed with a cued-recall task that included all 52 faces (learned on day 1 and day 2). Each trial started with a face as a cue, and participants were given a maximum of 30 s to verbally describe the associated picture in as much detail as possible. A description was scored as correct (that is, remembered) only if it included enough detail for the specific scene to be uniquely identified57. Incorrect or vague descriptions were treated as forgotten. Three raters who were blinded to the experiment independently reviewed participants’ answers, and the final score for each item was assigned only when the three raters reached a consensus. Any disagreement was discussed among the three raters, who then made a collective decision. The baseline items were not presented in the TNT task and were therefore seen fewer times than the Think and NoThink items. Such items have been widely used to provide a baseline measure of memory performance that is not directly affected by the ‘Think’ or ‘NoThink’ manipulation3,4,11.

Behavioural data analysis

Memory accuracy was submitted to a 2-by-3 repeated-measures ANOVA with acquisition time (Time: 30 min versus 24 h) and memory suppression (Memory: Baseline versus Think versus NoThink) as within-subject factors. Similar to Levy and Anderson18, we also computed a suppression score for overnight (24 h) and newly acquired (30 min) memories by subtracting the memory accuracy for ‘NoThink’ items from that for the corresponding ‘Baseline’ items, which provides a measure of suppression efficiency. Each participant’s suppression score was then Z-normalized for further brain-behavioural prediction analyses.
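The suppression score described above (Baseline accuracy minus NoThink accuracy, then Z-normalized across participants) can be sketched as follows. The accuracy values are hypothetical, and the use of the sample standard deviation for normalization is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev

def suppression_scores(baseline_acc, nothink_acc):
    """Baseline minus NoThink accuracy, one value per participant."""
    return [b - n for b, n in zip(baseline_acc, nothink_acc)]

def z_normalize(values):
    """Z-score across participants (ASSUMPTION: sample s.d., n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical proportions correct for five participants (24-h condition).
baseline_24h = [0.88, 0.75, 1.00, 0.63, 0.88]
nothink_24h = [0.78, 0.67, 0.89, 0.67, 0.56]

scores = suppression_scores(baseline_24h, nothink_24h)
z_scores = z_normalize(scores)
print([round(s, 2) for s in scores])
```

A positive score means that fewer NoThink items than Baseline items were recalled, that is, suppression-induced forgetting.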

SCR recording

SCR was recorded simultaneously with fMRI scanning using a Biopac MP 150 System (Biopac, Inc., Goleta, CA). Two Ag/AgCl electrodes filled with isotonic electrolyte medium were attached to the center phalanges of the index and middle fingers of the left hand. The gain was set to 5, the low-pass filter to 1.0 Hz and the high-pass filter to DC58. Data were acquired at 200 samples per second. Before analysis, the data were converted to microsiemens (μS) and square-root transformed because their distribution was non-normal.

SCR analysis

SCR data were analysed offline using Matlab R2014a (MathWorks, Natick, USA). First, data were temporally smoothed with a median filter (that is, 40 samples within a 200 ms window) to reduce scanner-induced noise59,60. We used Autonomate61, which has been shown to be effective in the context of event-related cognitive tasks2,61, to analyse event-related SCRs. In brief, the electrodermal data were segmented into event-related time windows based on face onset. The face-related SCR was located by identifying rises in the electrodermal data, which constituted the onset of an SCR. Responses that did not fit these criteria were scored as zero. Noisy segments of data (in which an implausible number of candidate SCRs were present) were excluded from further analysis (see more details in the Supplementary Methods). The numbers of trials used for SCR analyses are reported in Supplementary Table 6. SCR values were then identified as the maximum value within each time window. If multiple SCRs fell within the same window, the largest response was scored61. A 2-by-2 repeated-measures ANOVA with memory suppression (Think versus NoThink) and acquisition time (30 min versus 24 h) as within-subject factors was used to examine physiological changes across the four conditions of cognitive manipulation of aversive memories.
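The smoothing and peak-scoring steps can be illustrated with a minimal sketch. This is not the Autonomate algorithm: onset detection, response criteria and noise rejection are omitted, and the function names and the window length passed to `window_peak` are assumptions (the scoring window is not stated in this section).

```python
from math import sqrt
from statistics import median

FS = 200      # samples per second, as stated in the recording section
KERNEL = 40   # 40 samples = 200 ms at 200 samples per second

def median_smooth(signal, kernel=KERNEL):
    """Running median; windows are truncated at the edges of the signal."""
    half = kernel // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half)
        out.append(median(signal[lo:hi]))
    return out

def window_peak(signal, onset_s, win_s, fs=FS):
    """Maximum value within a window starting at face onset.
    ASSUMPTION: win_s is a placeholder, not the study's window length."""
    start = int(onset_s * fs)
    stop = min(len(signal), start + int(win_s * fs))
    return max(signal[start:stop])

# Hypothetical 2-s trace in microsiemens with one single-sample artefact.
raw_us = [1.0] * 400
raw_us[100] = 50.0

transformed = [sqrt(v) for v in raw_us]   # square-root transform
smoothed = median_smooth(transformed)     # removes the single-sample spike
peak = window_peak(smoothed, onset_s=0.0, win_s=1.0)
print(peak)
```

A median filter is a natural choice here because brief scanner-induced spikes are replaced by the local median rather than being smeared into neighbouring samples, as a moving average would do.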

Imaging acquisition

Whole-brain imaging data was collected on a Siemens TRIO 3-Tesla MR scanner in the National Key Laboratory of Cognitive Neuroscience and Learning at Beijing Normal University. Functional images were collected using an echo-planar imaging sequence (axial slices, 33; slice thickness, 4 mm; gap, 0.6 mm; TR, 2,000 ms; TE, 30 ms; flip angle, 90°; voxel size, 3.1 × 3.1 × 4.0 mm; FOV, 200 × 200 mm; and 580 volumes), while structural images were acquired through three-dimensional sagittal T1-weighted magnetization-prepared rapid gradient echo (192 slices; TR, 2,530 ms; TE, 3.45 ms; slice thickness, 1 mm; voxel size, 1.0 × 1.0 × 1.0 mm; flip angle, 7°; inversion time, 1,100 ms; FOV, 256 × 256 mm).

Imaging preprocessing

Brain imaging data was preprocessed using Statistical Parametric Mapping (SPM8; http://www.fil.ion.ucl.ac.uk/spm). The first 4 volumes of functional images were discarded for signal equilibrium and participants’ adaptation to scanning noise. Remaining images were corrected for slice acquisition timing and realigned for head motion correction. Subsequently, functional images were co-registered to each participant’s gray matter image segmented from corresponding high-resolution T1-weighted image, then spatially normalized into a common stereotactic Montreal Neurological Institute (MNI) space and resampled into 3-mm isotropic voxels. Finally, images were smoothed by an isotropic three-dimensional Gaussian kernel with 4 mm full-width at half-maximum. The data were statistically analysed under the framework of general linear models (GLM)62.

Univariate GLM analysis

To assess transient neural activity associated with memory retrieval (that is, Think trials) and suppression (that is, NoThink trials) for newly acquired and overnight aversive memories, separate regressors of interest were modelled for the four experimental conditions (see above) and convolved with the canonical hemodynamic response function (HRF) at the first level. In addition, each participant’s motion parameters from the realignment procedure were included to regress out variability related to head movement. The analyses included high-pass filtering with a cutoff of 1/40 Hz to remove low-frequency noise4, global intensity normalization and correction for serial correlations using a first-order autoregressive model (AR(1)). Relevant contrast parameter estimate images were initially generated at the individual-subject level, and then submitted to a 2 (Suppression) by 2 (Time) repeated-measures ANOVA for a second-level group analysis treating participants as a random variable. Significant clusters from the group analysis were first masked with a gray matter mask and then identified using conservative and well-accepted statistical criteria, that is, a height threshold of P<0.01 and an extent threshold of P<0.05 with family-wise error correction for multiple comparisons based on nonstationary suprathreshold cluster-size distributions computed using Monte Carlo simulations63.

To further investigate specific neural activity associated with suppression-induced voluntary or intentional forgetting (that is, NTf) and incidental forgetting (that is, Tf), we conducted an additional GLM analysis that included memory Status (forgotten versus remembered, or f versus r) as another variable of interest, similar to Anderson et al.11, together with Time (30 min versus 24 h) and Suppression (NoThink versus Think, or NT versus T). This analysis included eight regressors of interest (that is, NTf_30 min, NTr_30 min, Tf_30 min, Tr_30 min, NTf_24 h, NTr_24 h, Tf_24 h and Tr_24 h). We randomly sub-sampled items from the 30 min conditions to match the number of items in the 30 min and 24 h conditions used for the first-level individual analysis. This allowed us to compare neural activity in successful suppression trials between newly acquired and overnight memories with similar statistical power (that is, [NTf_24 h − Tf_24 h] versus [NTf_30 min − Tf_30 min]). Four participants were excluded from this analysis because they lacked at least one forgotten item in each condition. Relevant parameter contrasts were then submitted to a 2 (Status: forgotten versus remembered) by 2 (Time) by 2 (Suppression) repeated-measures ANOVA for a second-level group analysis treating participants as a random variable. All other settings were the same as in the above GLM analysis.
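The random sub-sampling used to match item counts between the 30 min and 24 h conditions can be sketched as below. The function name, the item labels and the fixed seed are hypothetical, and how many sub-samples were drawn or how they were combined is not stated in the text.

```python
import random

def subsample_to_match(items_30min, n_items_24h, seed=0):
    """Randomly keep at most n_items_24h of the 30-min items."""
    rng = random.Random(seed)
    if len(items_30min) <= n_items_24h:
        return list(items_30min)          # nothing to trim
    return rng.sample(items_30min, n_items_24h)

# Hypothetical forgotten NoThink items for one participant.
ntf_30min = ["face_03", "face_11", "face_17", "face_20", "face_25"]
n_ntf_24h = 3                             # hypothetical count in the 24-h condition

matched = subsample_to_match(ntf_30min, n_ntf_24h)
print(len(matched))
```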

To better characterize hippocampal and prefrontal engagement in memory suppression, we performed complementary ROI analyses separately for the left and right entire hippocampus and the middle frontal gyrus (referred to as DLPFC) anatomically defined using the WFU PickAtlas toolbox64. Parameter estimates (or β-weights) associated with conditions of interest were extracted from the above anatomically defined ROIs as well as significant clusters in the MTL and PFC regions at the individual level using MarsBar (http://marsbar.sourceforge.net/) and averaged across voxels within each ROI, then plotted in bar graphs for visualization purposes only.

Task-dependent functional connectivity analysis

We examined hippocampus-based functional connectivity changes via PPI analysis65. Each hippocampal seed was defined as a 4-mm sphere centered at the local peak of the corresponding cluster showing a significant interaction between Suppression and Time in the univariate GLM analysis. To accommodate more than two experimental conditions within the same model, we employed a generalized form of task-dependent PPI (gPPI)66. The physiological activity of a given hippocampal seed region was computed as the mean time series of all voxels in the seed and was then deconvolved to estimate neural activity. Next, four psychophysiological interaction vectors, one for each task regressor at the individual level, were obtained by multiplying the estimated neuronal activity from the seed region with a vector coding for the effect of each condition. These vectors were then convolved with a canonical HRF to form four PPI regressors of interest. Task-related activations were also included in this GLM to remove the effects of common driving inputs on brain connectivity.

Contrast images corresponding to PPI effects at the individual-subject level were then submitted to a 2 (Suppression) by 2 (Time) repeated-measures ANOVA for a second-level group analysis. Similar to the univariate GLM analysis above, significant clusters were initially masked by a gray matter mask, and then determined using a height threshold of P<0.01 and an extent threshold of P<0.05 with family-wise error correction for multiple comparisons based on nonstationary suprathreshold cluster-size distributions63.

To investigate functional connectivity patterns associated with intentional and incidental forgetting in the 30-min and 24-h conditions, we conducted an additional hippocampal-seeded PPI analysis taking memory status into account, with a particular focus on NoThink trials that were later forgotten in the post-scan testing phase (that is, [NTf_24 h − Tf_24 h] versus [NTf_30 min − Tf_30 min]). The seed was defined as a 4-mm sphere around the peak coordinate of the hippocampal cluster identified in the additional GLM analysis that took memory status into account. Other settings were the same as in the above PPI analysis.

Prediction analysis

We employed a machine-learning approach with balanced fourfold cross-validation to mitigate shortcomings of conventional regression models and to test whether the established relationship generalizes to out-of-sample individuals67. For example, we entered each individual’s memory suppression score as the dependent variable and hippocampal activation as the independent variable. We then estimated r (predicted, observed) to measure how well hippocampal activation predicted the memory suppression scores using the balanced fourfold cross-validation procedure. Data were divided into four folds, and a linear regression model was built using three folds, leaving one fold out. The final r (predicted, observed) was computed as the average over the four repetitions of this procedure. Finally, we used a nonparametric approach to test the statistical significance of the model by generating 1,000 surrogate data sets under the null hypothesis of r (predicted, observed) (ref. 68). The statistical significance (P value) of the model was determined as the percentage of surrogate data sets yielding a value greater than the observed r (predicted, observed).
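The cross-validated prediction and surrogate-data test can be sketched in plain Python. Several details here are assumptions rather than the authors' implementation: folds are balanced by interleaving participants sorted on the dependent variable, surrogate data sets are generated by shuffling the dependent variable, and the example data are synthetic.

```python
import random
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return sxy / denom if denom > 0 else 0.0   # guard against constant input

def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def cv_r(x, y, k=4):
    """Mean r(predicted, observed) over k folds.
    ASSUMPTION: folds are balanced by interleaving subjects sorted on y."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [order[f::k] for f in range(k)]
    rs = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        slope, icpt = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [slope * x[i] + icpt for i in test]
        rs.append(pearson_r(pred, [y[i] for i in test]))
    return mean(rs)

def permutation_p(x, y, n_perm=1000, seed=0):
    """Share of label-shuffled data sets with cv_r >= the observed value."""
    rng = random.Random(seed)
    observed = cv_r(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if cv_r(x, y_perm) >= observed:
            hits += 1
    return observed, hits / n_perm

# Synthetic example: activation (x) and suppression score (y) for 20 subjects.
x = [float(v) for v in range(20)]
y = [2.0 * v + ((v * 7) % 5 - 2) for v in range(20)]
r_obs, p_val = permutation_p(x, y, n_perm=200)
print(round(r_obs, 2), p_val)
```

Because the model is refitted without the held-out participants in every fold, r (predicted, observed) reflects out-of-sample prediction rather than in-sample fit.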

Multivoxel pattern dissimilarity analysis

To assess multivoxel pattern dissimilarity associated with newly acquired and overnight aversive memories, we modelled each item (collapsing across four repetitions) as a separate regressor, convolved with a canonical HRF implemented in SPM8. This resulted in 36 regressors in total, with 9 regressors for each condition. Contrast images for each item versus fixation, generated at the individual level within each condition, were then submitted to subsequent inter-item multivariate pattern dissimilarity analysis for the hippocampal ROIs as well as for the whole brain.

ROI-based pattern dissimilarity analysis

For each of four experimental conditions, we extracted voxel-wise brain activation estimates for each item within the same condition from the defined ROIs, and reshaped them into a single dimensional vector for each ROI. Pairwise correlations were then computed among distributed voxels of each ROI, resulting in N × (N−1)/2 pairwise correlation coefficients, with N representing the number of items in each condition. The dissimilarity score was determined by Fisher’s Z transformation of 1 minus the correlation coefficient, separately for each participant24,25,26. The data were then submitted to a 2-by-2 repeated-measures ANOVA with Suppression and Time as within-subject factors for the second-level analysis to investigate differences in pattern dissimilarity between retrieval and suppression of newly acquired and overnight-consolidated aversive memories.
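The inter-item computation can be sketched as below for one ROI and one condition. The wording on how the Fisher transformation is combined with "1 minus the correlation coefficient" admits more than one reading; this sketch assumes that r is Fisher-transformed first and then subtracted from 1, which is an interpretation and not a confirmed detail of the original analysis. The voxel patterns are random placeholders.

```python
import random
from itertools import combinations
from math import atanh
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def pairwise_dissimilarity(patterns):
    """patterns: list of N voxel vectors, one per item in a condition.
    Returns N * (N - 1) / 2 dissimilarity values."""
    scores = []
    for a, b in combinations(patterns, 2):
        r = pearson_r(a, b)
        # ASSUMPTION: Fisher z of r, then 1 minus that value.
        scores.append(1.0 - atanh(r))
    return scores

# Placeholder data: 9 items x 50 voxels of random activation estimates.
rng = random.Random(0)
patterns = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(9)]

scores = pairwise_dissimilarity(patterns)
condition_dissimilarity = mean(scores)
print(len(scores))
```

With N=9 items per condition, the sketch yields 9 × 8/2 = 36 item pairs, whose mean gives one dissimilarity value per participant, ROI and condition.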

Whole-brain pattern dissimilarity analysis

We further implemented a searchlight method to measure inter-item multivoxel pattern dissimilarity at the whole-brain level23,24, using a 6-mm spherical region of interest69. As with the ROI-based analysis, we computed the inter-item multivoxel pattern dissimilarity for each condition within each searchlight. The analysis was then repeated for a searchlight centered on every voxel in the brain. Searchlight maps for all four conditions were then entered into a 2-by-2 ANOVA with Suppression and Time as within-subject factors at the second-level group analysis to determine changes in pattern dissimilarity between retrieval and suppression of newly acquired and overnight-consolidated aversive memories. Significant clusters were identified using a height threshold of P<0.01 and an extent threshold of P<0.05 with family-wise error correction63.
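The geometry of a searchlight can be sketched as follows. The code assumes that "6-mm" refers to the sphere radius and uses the 3-mm isotropic voxels produced by preprocessing; both the radius interpretation and the function names are assumptions, and the volume dimensions in the example are arbitrary.

```python
def sphere_offsets(radius_mm=6.0, voxel_mm=3.0):
    """Voxel offsets whose centres lie within radius_mm of the centre voxel.
    ASSUMPTION: 6 mm is taken as the radius, not the diameter."""
    reach = int(radius_mm // voxel_mm)
    span = range(-reach, reach + 1)
    return [(dx, dy, dz) for dx in span for dy in span for dz in span
            if (dx * dx + dy * dy + dz * dz) * voxel_mm ** 2 <= radius_mm ** 2]

def searchlight_voxels(center, shape, offsets):
    """Coordinates of the searchlight around center, clipped to the volume."""
    voxels = []
    for off in offsets:
        v = tuple(c + o for c, o in zip(center, off))
        if all(0 <= vi < si for vi, si in zip(v, shape)):
            voxels.append(v)
    return voxels

offsets = sphere_offsets()
inner = searchlight_voxels((5, 5, 5), (10, 10, 10), offsets)
corner = searchlight_voxels((0, 0, 0), (10, 10, 10), offsets)
print(len(offsets), len(inner), len(corner))
```

Under these assumptions each full searchlight contains 33 voxels, and the pattern vectors extracted from those voxels would be passed to the same pairwise computation as in the ROI-based analysis.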

To further examine the relationship between memory suppression and neural representation patterns, we performed separate regression analyses for the whole-brain pattern dissimilarity maps with suppression-induced forgetting scores in newly acquired or overnight consolidation condition as a covariate of interest. Parallel analyses were also conducted to examine the relationship between hippocampal pattern dissimilarity and DLPFC engagement in the suppression of either newly acquired or consolidated aversive memories, separately. The significant clusters were determined using the same criterion from the above GLM and gPPI analyses.

Estimates of effect size and post-hoc statistical power

Effect sizes for ANOVAs are partial eta squared, referred to as ηp². For paired t-tests, we calculated Cohen’s d using the mean difference score as the numerator and the pooled s.d. from both repeated measures as the denominator70. This effect size is referred to in the text as d_av, in which the ‘av’ refers to the use of the average s.d. in the calculation. The post-hoc statistical power was calculated based on the given type I error rate (α=0.05), the corresponding sample size and effect size.
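Both effect sizes follow from simple formulas, sketched here with hypothetical inputs. The d_av function uses the average of the two standard deviations as the denominator, in line with the explanation of ‘av’ above; the sums of squares passed to the partial eta squared function are made-up values.

```python
from statistics import mean, stdev

def cohens_d_av(x, y):
    """Mean paired difference divided by the average of the two s.d.s."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / ((stdev(x) + stdev(y)) / 2)

def partial_eta_squared(ss_effect, ss_error):
    """SS_effect / (SS_effect + SS_error) for one ANOVA effect."""
    return ss_effect / (ss_effect + ss_error)

# Hypothetical paired measurements from four participants.
think = [1.0, 2.0, 3.0, 4.0]
nothink = [0.0, 1.0, 2.0, 3.0]

d_av = cohens_d_av(think, nothink)
eta_p2 = partial_eta_squared(ss_effect=2.0, ss_error=6.0)
print(round(d_av, 3), eta_p2)
```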

Data availability

The data and code that support the findings of this study are available from the corresponding author on request.