Study 1

Participants and stimuli

Thirty-seven undergraduates completed an emotional picture-viewing task in exchange for partial course credit. Informed consent was obtained from all participants. Eight participants were excluded from analyses because of excessive artifacts due to eye movements resulting in rejection of 100% of trials (n = 3) and body movements resulting in rejection of > 60% of trials (n = 5). The final sample submitted to analysis included 29 (18 female) participants. The mean age was 18.86 (SD = 1.15). All participants were native English speakers. All procedures were performed in accordance with the relevant guidelines and regulations and approved by Michigan State University’s Institutional Review Board.

The stimulus set consisted of 60 neutral and 60 negative images selected from the International Affective Picture System (IAPS). The following images were included: 1050, 1200, 1300, 1525, 1930, 2036, 2102, 2110, 2190, 2200, 2206, 2210, 2214, 2215, 2230, 2320, 2357, 2383, 2393, 2495, 2570, 2661, 2683, 2688, 2692, 2694, 2703, 2710, 2716, 2751, 2753, 2799, 2800, 2810, 2811, 2840, 3001, 3010, 3120, 3181, 3213, 3216, 3220, 3230, 3301, 3350, 3500, 3530, 3550, 5500, 5531, 5971, 6021, 6150, 6211, 6212, 6242, 6300, 6312, 6313, 6315, 6550, 6563, 6821, 6825, 6838, 7000, 7002, 7003, 7004, 7006, 7009, 7010, 7012, 7016, 7018, 7020, 7021, 7025, 7026, 7030, 7031, 7035, 7041, 7050, 7056, 7080, 7100, 7110, 7140, 7150, 7160, 7170, 7175, 7190, 7211, 7217, 7224, 7233, 7235, 7254, 7550, 7620, 7700, 7950, 9250, 9253, 9260, 9410, 9421, 9425, 9428, 9440, 9620, 9622, 9800, 9810, 9903, 9908, 9921.

Normative ratings indicated that negative images were rated as both more negative (Negative: M = 2.50, SD = 0.73; Neutral: M = 4.96, SD = 0.41; t(118) = 22.64, p < .001) and more arousing (Negative: M = 6.06, SD = 0.74; Neutral: M = 3.04, SD = 0.68; t(118) = 23.22, p < .001) than neutral images. Images used for the First and Third person conditions did not differ on either dimension (ts(118) < 1, ps > .44).
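As a rough check on the normative comparison above, the t statistics can be approximated from the reported means and standard deviations. The sketch below assumes a pooled-variance independent-samples test with 60 images per category (consistent with the reported 118 degrees of freedom); the function name is illustrative. Because the inputs are rounded to two decimals, the results should be close to, but not identical with, the reported values.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic (pooled variance) and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se, df

# Rounded normative values from the text; 60 images per category (assumed equal n).
t_valence, df = pooled_t(4.96, 0.41, 60, 2.50, 0.73, 60)  # Neutral vs. Negative valence
t_arousal, _ = pooled_t(6.06, 0.74, 60, 3.04, 0.68, 60)   # Negative vs. Neutral arousal
print(df, round(t_valence, 2), round(t_arousal, 2))
```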

The task was administered on a Pentium D class computer, using E-Prime software (Psychology Software Tools; Pennsylvania, US) to control the presentation and timing of all stimuli. Each picture was displayed in color and occupied the entirety of a 19in (48.26 cm) monitor. Participants were seated approximately 60 cm from the monitor.

Procedure

Participants completed a cue-picture paradigm, similar in format to previous research on emotion regulation9, 16. The task comprised two blocks, one for each Instruction Type (i.e., First- vs. Third-Person). The order of the Instruction Type blocks was counterbalanced across participants. Each block contained 60 cue-picture trials composed of 30 neutral and 30 negative IAPS images equally crossed with the two Instruction Type cues. The order of cue-picture trials was random. Fig. 1 illustrates a schematic of the trial structure across both blocks. For each trial, participants first viewed an instruction phrase (“First-Person” or “Third-Person”) for 2 s that directed them how to think about the following picture. “First-Person” indicated that the participant should reflect on their feelings elicited by the pictures using the pronoun “I” as much as possible (i.e., “…ask yourself ‘what am I feeling right now?’”). “Third-Person” indicated that the participant should reflect on their feelings elicited by the pictures using their own name as much as possible (i.e., “…ask yourself ‘what is [participant’s name] feeling right now?’”). Participants were further told not to generate unrelated thoughts or images to alter their responses. For all instructions, participants were told to view the pictures for the entire display period and not to look away or close their eyes. After the instruction phrase, a blank screen was presented for 500 ms, followed by a centrally presented white fixation cross lasting 500 ms. Following the fixation cross, the IAPS images were displayed for 6 s. A period of 2.5 s was inserted between the offset of images and the presentation of the next instruction phrase, during which time participants were instructed to relax and clear their minds.
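The trial structure described above can be laid out as a simple timeline. The sketch below only restates the durations given in the text; the phase names are illustrative labels, not identifiers from the original task script.

```python
# Phase durations in seconds, taken from the procedure description.
PHASES = [
    ("instruction_cue", 2.0),
    ("blank_screen", 0.5),
    ("fixation_cross", 0.5),
    ("picture", 6.0),
    ("relax_interval", 2.5),
]

def trial_timeline(phases=PHASES):
    """Return (name, onset, offset) tuples in seconds relative to cue onset."""
    timeline, onset = [], 0.0
    for name, duration in phases:
        timeline.append((name, onset, onset + duration))
        onset += duration
    return timeline

timeline = trial_timeline()
trial_length = timeline[-1][2]  # total duration of one trial
picture_onset = {name: onset for name, onset, _ in timeline}["picture"]
print(trial_length, picture_onset)
```

Under these durations the picture appears 3 s after cue onset, which matches the 3 s length of the cue-locked epochs described in the data reduction section.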

Participants completed practice trials before each block to familiarize themselves with the timing of events and instructions. The experimental task then included 120 cue-picture trials: 30 Neutral/First-Person, 30 Neutral/Third-Person, 30 Negative/First-Person, and 30 Negative/Third-Person. At the end of each block, participants were asked to rate the extent to which they used the first- vs. third-person pronouns when focusing on their feelings using a Likert scale ranging from 1 (not at all) to 7 (all the time).
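The resulting design (two blocked instruction types, 30 neutral and 30 negative trials per block, random trial order within a block) could be generated along the following lines. This is a minimal sketch: the function names and the seed are hypothetical, and the assignment of specific IAPS images to conditions is not modeled.

```python
import random

def build_block(instruction, n_per_valence=30, rng=None):
    """One block: 30 neutral + 30 negative trials in random order."""
    if rng is None:
        rng = random.Random()
    trials = [
        {"instruction": instruction, "valence": valence}
        for valence in ("neutral", "negative")
        for _ in range(n_per_valence)
    ]
    rng.shuffle(trials)
    return trials

def build_session(first_block, seed=None):
    """Two blocks; first_block is counterbalanced across participants."""
    rng = random.Random(seed)
    order = ["first_person", "third_person"]
    if first_block == "third_person":
        order.reverse()
    return [trial for instr in order for trial in build_block(instr, rng=rng)]

session = build_session("third_person", seed=0)  # hypothetical participant
```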

Psychophysiological Recording and Data Reduction

Continuous electroencephalographic (EEG) activity was recorded using the ActiveTwo Biosemi system (Biosemi, Amsterdam, The Netherlands). Recordings were taken from 64 Ag-AgCl electrodes embedded in a stretch-lycra cap. Additionally, two electrodes were placed on the left and right mastoids. Electro-oculogram (EOG) activity generated by eye-movements and blinks was recorded at FP1 and three additional electrodes placed inferior to the left pupil and on the left and right outer canthi. During data acquisition, the Common Mode Sense active electrode and Driven Right Leg passive electrode formed the ground per Biosemi design specifications. The function of the CMS-DRL loop, in addition to forming a reference, is simply to constrain the common mode voltage (i.e. the average voltage of the participant), which limits the amount of current that can possibly return to the participant. Bioelectric signals were sampled at 512 Hz.

Initial electrical signal processing was performed offline using BrainVision Analyzer 2 (BrainProducts, Gilching, Germany). Scalp electrode recordings were re-referenced to the mean of the mastoids and band-pass filtered (cutoffs: 0.01–20 Hz; 12 dB/oct rolloff). Ocular artifacts were corrected using the method developed by Gratton and colleagues30. Cue- and picture-locked data were segmented into individual epochs beginning 500 ms before stimulus onset and continuing for 3 s and 6 s, respectively. Physiologic artifacts were detected using a computer-based algorithm such that trials in which the following criteria were met were rejected: a voltage step exceeding 50 μV between contiguous sampling points, a voltage difference of 300 μV within a trial, and a maximum voltage difference of less than 0.5 μV within 100 ms intervals. The average activity in the 500 ms window prior to cue and picture onset served as the baseline and was subtracted from each data point subsequent to cue and picture onset.
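The artifact criteria and baseline correction can be illustrated for a single channel of a single epoch. The sketch below is not the BrainVision Analyzer implementation. It assumes that a trial is rejected when any one criterion is met and that the low-activity check uses consecutive, non-overlapping 100 ms windows. For simplicity the baseline is subtracted from the whole epoch rather than only from post-onset samples.

```python
import math

SAMPLING_RATE_HZ = 512  # from the recording description

def is_artifact(samples, step_uv=50.0, range_uv=300.0, flat_uv=0.5,
                flat_window_ms=100, fs=SAMPLING_RATE_HZ):
    """Flag one epoch (list of microvolt values) using the three criteria."""
    # 1. Voltage step between contiguous sampling points.
    if any(abs(b - a) > step_uv for a, b in zip(samples, samples[1:])):
        return True
    # 2. Peak-to-peak voltage difference within the trial.
    if max(samples) - min(samples) > range_uv:
        return True
    # 3. Low activity: range below flat_uv within a 100 ms window.
    win = max(2, round(flat_window_ms * fs / 1000))
    for start in range(0, len(samples) - win + 1, win):
        chunk = samples[start:start + win]
        if max(chunk) - min(chunk) < flat_uv:
            return True
    return False

def baseline_correct(samples, baseline_ms=500, fs=SAMPLING_RATE_HZ):
    """Subtract the mean of the pre-stimulus window from every sample."""
    n_base = round(baseline_ms * fs / 1000)
    baseline = sum(samples[:n_base]) / n_base
    return [s - baseline for s in samples]

# Synthetic examples (not real data): a 10 Hz, 10 microvolt sine wave.
clean = [10.0 * math.sin(2 * math.pi * 10 * i / SAMPLING_RATE_HZ) for i in range(1024)]
stepped = list(clean)
stepped[300] += 80.0   # abrupt jump larger than 50 microvolts
flat = [0.0] * 1024    # flat-lined channel
```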

Data Analysis Strategy

For analysis of the LPP, we generated topographically organized clusters of electrodes in order to reduce the spatial dimensions of the dataset. Following the suggestion of Dien and Santuzzi31, we computed the following clusters using the average of the noted electrodes: Left-Anterior-Superior (AF3, F1, F3, FC1 and FC3), Right-Anterior-Superior (AF4, F2, F4, FC2 and FC4), Left-Anterior-Inferior (AF7, F5, F7, FC5, and FT7), Right-Anterior-Inferior (AF8, F6, F8, FC6, and FT8), Left-Posterior-Superior (CP1, CP3, P1, P3, and PO3), Right-Posterior-Superior (CP2, CP4, P2, P4, and PO4), Left-Posterior-Inferior (CP5, P5, P7, PO7, and TP7), and Right-Posterior-Inferior (CP6, P6, P8, PO8, and TP8). Because the LPP is a broad and sustained waveform we conducted two analyses of its amplitude across time, following convention9, 11. Specifically, research indicates that early time windows (300–1000 ms) index attention allocation whereas later time windows (>1000 ms) index memory and meaning-making stages11. As such, we submitted the early LPP amplitude elicited between 400–1000 ms to a 2 (Time: 400–700 and 700–1000 ms) X 2 (Valence: Neutral and Negative) X 2 (Self-talk Strategy: First-Person and Third-Person) X 2 (Hemisphere: Left and Right) X 2 (Anterior and Posterior) X 2 (Superior and Inferior) repeated measures analysis of variance (rANOVA). This allowed us to capture the early, attention-related LPP before the sustained portion of the LPP begins around 1000 ms [see also refs 9 and 18]. To understand the timing of memory and meaning-making processes, we submitted the sustained portion of the LPP to a 5 (Time: 1–2 s, 2–3 s, 3–4 s, 4–5 s, and 5–6 s) X 2 (Valence: Neutral and Negative) X 2 (Self-Talk Strategy: First-Person and Third-Person) X 2 (Hemisphere: Left and Right) X 2 (Anterior and Posterior) X 2 (Superior and Inferior) rANOVA [see also refs 9 and 18].
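The electrode clustering can be expressed as a lookup table plus an average. In the sketch below the cluster membership and time windows are copied from the text; the function name and the demonstration values are illustrative only.

```python
CLUSTERS = {
    "Left-Anterior-Superior": ["AF3", "F1", "F3", "FC1", "FC3"],
    "Right-Anterior-Superior": ["AF4", "F2", "F4", "FC2", "FC4"],
    "Left-Anterior-Inferior": ["AF7", "F5", "F7", "FC5", "FT7"],
    "Right-Anterior-Inferior": ["AF8", "F6", "F8", "FC6", "FT8"],
    "Left-Posterior-Superior": ["CP1", "CP3", "P1", "P3", "PO3"],
    "Right-Posterior-Superior": ["CP2", "CP4", "P2", "P4", "PO4"],
    "Left-Posterior-Inferior": ["CP5", "P5", "P7", "PO7", "TP7"],
    "Right-Posterior-Inferior": ["CP6", "P6", "P8", "PO8", "TP8"],
}

# Analysis windows in ms after picture onset, as described in the text.
EARLY_LPP_WINDOWS_MS = [(400, 700), (700, 1000)]
SUSTAINED_LPP_WINDOWS_MS = [(1000, 2000), (2000, 3000), (3000, 4000),
                            (4000, 5000), (5000, 6000)]

def cluster_average(amplitudes, clusters=CLUSTERS):
    """Average per-electrode amplitudes (dict: electrode -> microvolts) per cluster."""
    return {
        name: sum(amplitudes[e] for e in electrodes) / len(electrodes)
        for name, electrodes in clusters.items()
    }

# Made-up amplitudes for demonstration only.
demo = {e: 0.0 for electrodes in CLUSTERS.values() for e in electrodes}
demo.update(zip(CLUSTERS["Left-Anterior-Superior"], [1.0, 2.0, 3.0, 4.0, 5.0]))
demo_means = cluster_average(demo)
```

Each cluster name encodes one level of the Hemisphere, Anterior/Posterior, and Superior/Inferior factors, so splitting the name on hyphens recovers the three spatial factors entered into the rANOVA.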

The SPN was identified and quantified at fronto-central electrodes (F1, Fz, F2, FC1, FCz, FC2) as in previous work9, 14. Previous work indicates that early enhancement of the SPN reflects orienting to the preceding cue whereas later increases reflect anticipation of and preparation to act on the upcoming imperative stimulus14. Therefore, consistent with past research9, 14, the early SPN was defined as the average voltage in the 300–2300 ms time window post-cue onset and the late SPN was defined as the average voltage in the 2300–3000 ms time window post-cue onset, the latter corresponding to the 700 ms immediately preceding picture onset during which time participants viewed the blank screen and fixation cross separating the cue and the picture. The early and late SPN were submitted to separate 2 (Valence: Neutral and Negative) X 2 (Self-talk Strategy: First-Person and Third-Person) rANOVAs.
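Mean amplitude in the SPN windows amounts to converting millisecond boundaries into sample indices and averaging. The sketch below assumes a cue-locked epoch sampled at 512 Hz that begins 500 ms before cue onset and ends 3 s after it, as described in the data reduction section. The helper name and the synthetic epoch are illustrative.

```python
FS_HZ = 512
EPOCH_START_MS = -500  # epochs begin 500 ms before stimulus onset

SPN_WINDOWS_MS = {"early": (300, 2300), "late": (2300, 3000)}

def window_mean(epoch, start_ms, end_ms, fs=FS_HZ, epoch_start_ms=EPOCH_START_MS):
    """Mean voltage between start_ms and end_ms relative to stimulus onset."""
    i0 = round((start_ms - epoch_start_ms) * fs / 1000)
    i1 = round((end_ms - epoch_start_ms) * fs / 1000)
    segment = epoch[i0:i1]
    return sum(segment) / len(segment)

# Synthetic cue-locked epoch (-500 to 3000 ms = 1792 samples):
# 1 microvolt until 2300 ms post-cue, 3 microvolts afterwards.
n_samples = round(3500 * FS_HZ / 1000)
switch = round((2300 - EPOCH_START_MS) * FS_HZ / 1000)
epoch = [1.0] * switch + [3.0] * (n_samples - switch)

spn = {name: window_mean(epoch, *bounds) for name, bounds in SPN_WINDOWS_MS.items()}
```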

Additional analyses of the LPP and SPN effects are available in the supplemental materials.

Study 2

Participants

Fifty-two individuals (32 females) were recruited for participation. Participants were recruited via flyers and advertisements posted on Facebook and Craig’s List. The sample consisted of 71.15% Caucasian, 15.38% Asian, 7.69% African American, 1.92% Native American, and 3.85% other. The mean age was 20.19 (SD = 2.66). All participants were right-handed native English language speakers and received up to $50 for their participation. Informed consent was obtained from all participants. All procedures were performed in accordance with the relevant guidelines and regulations and approved by the University of Michigan’s Institutional Review Board.

Data from two participants were excluded from all analyses due to technical malfunctions during the fMRI task. Specifically, one participant did not see their name appear during the fMRI task, and the scan data from another participant were not properly saved. Thus, all of the descriptive data we report from here on pertain to the subsample of fifty participants who were included in our analyses.

Screening Session: Stimuli Harvesting

Similar to prior studies that have used script-driven methods, cue phrases were used to trigger the recall of negative autobiographical memories in the scanner. Following protocols implemented in prior research32, 33, we obtained memory cues by asking participants to recall and then describe in writing eight highly arousing negative autobiographical experiences during a screening held before the scan session (M days = 10.81; SD days = 7.87). To qualify for the study, participants had to have experienced eight unique negative events which led them to feel intensely distressed each time they thought about them. To ensure that this criterion was met, a memory was considered “eligible” if participants rated it above the midpoint of a 1 (not intense) to 9 (extremely intense) arousal scale (M = 7.49, SD = 1.07) and below the midpoint on a 1 (extremely negative) to 9 (extremely positive) valence scale (M = 2.05; SD = 0.97).
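The eligibility rule can be stated compactly. The sketch below encodes the two midpoint criteria on the 1 to 9 scales and the requirement of eight eligible memories; the function names and the dictionary layout are hypothetical.

```python
def is_eligible(arousal, valence, scale_min=1, scale_max=9):
    """A memory is eligible if arousal is above and valence below the scale midpoint."""
    midpoint = (scale_min + scale_max) / 2  # 5 on a 1-9 scale
    return arousal > midpoint and valence < midpoint

def qualifies(memories, required=8):
    """A participant qualifies with at least `required` eligible memories."""
    n_eligible = sum(1 for m in memories if is_eligible(m["arousal"], m["valence"]))
    return n_eligible >= required

# Illustrative ratings only (not participant data).
example = [{"arousal": 8, "valence": 2}] * 7 + [{"arousal": 5, "valence": 2}]
print(qualifies(example))  # the eighth memory sits at the arousal midpoint
```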

fMRI Negative Emotion Elicitation Task

The Negative Emotion Elicitation task was modeled after prior fMRI and behavioral research indicating that cueing people to recall autobiographical negative experiences is an effective way of reactivating intense idiosyncratic negative emotion e.g.32, 33. The stimuli for this task consisted of cue phrases (e.g., rejected by Marc; party with Ted) that appeared in the center of each screen and directed participants to focus on a specific negative past experience. Participants generated these cue phrases on their own, before the day of scanning, using a procedure developed in prior research32, 33. Specifically, they first wrote about each of their experiences. Subsequently, they were asked to create a cue phrase that captured the gist of their experience (e.g., barking dog). On the day of scanning, they were reminded of the cues they had generated and the experiences the cues referred to, following established procedures.

Task Training

Before scanning, participants were told that each trial would begin with a fixation cross, which they were asked to stare directly at. Next, they were told that they would see a linguistic cue, which would instruct them how to introspect (i.e., using I or their name) during the trial. Next, they were told that they would see another fixation cross followed by a memory cue phrase that they had generated during the previous session. When they saw the memory cue, they were asked to reflect on that memory using the part of speech (I or their own name) that was previously presented to them. To ensure that they used the correct part of speech when introspecting about each memory, the same linguistic cue that was presented earlier during the trial appeared beneath the memory cue at the bottom of the screen. Subsequently, participants were told they would have 3 s to rate how they felt using a five-point scale (1 = not at all negative; 5 = very negative; see Fig. 2 for an illustration of the task).

Functional MRI Acquisition and Analysis

Whole-brain functional data were acquired on a GE Signa 3-Tesla scanner. A spiral sequence with 40 contiguous slices with 3.44 × 3.44 × 3 mm voxels (repetition time (TR) = 2000 ms; echo time (TE) = 30 ms; flip angle = 90°; field of view (FOV) = 22 cm) was used to acquire functional T2*-weighted images. Structural data were acquired with a T1-weighted gradient echo anatomical overlay acquired using the same FOV and slices (TR = 250 ms, TE = 5.7 ms, flip angle = 90°). We also collected a 124-slice high-resolution T1-weighted anatomical image using spoiled-gradient-recalled acquisition (SPGR) in steady-state imaging (TR = 9 ms, TE = 1.8 ms, flip angle = 15°, FOV = 25–26 cm, slice thickness = 1.2 mm).

Functional images were corrected for differences in slice timing using 4-point sinc-interpolation34 and were corrected for head movement using MCFLIRT35. Each SPGR anatomical image was corrected for signal inhomogeneity and skull-stripped using FSL’s Brain Extraction Tool36. These images were then segmented with SPM8 (Wellcome Department of Cognitive Neurology, London) into gray matter, white matter and cerebrospinal fluid, and normalization parameters for warping into MNI space were recorded. These normalization parameters were applied to the functional images, maintaining their original 3.44 × 3.44 × 3 mm resolution.

Functional scans were physio-corrected using RETROICOR37. The functional scans were further preprocessed with SPM8’s slice-time correction and realignment processing functions. The segmented normalization parameters were then applied to the functional scans to warp them into MNI space. Finally, the functional images were spatially smoothed using an 8-mm full-width at half-maximum Gaussian kernel.

Statistical analyses were conducted using the general linear model framework implemented in SPM8. Boxcar regressors, convolved with the canonical hemodynamic response function, modeled periods for the 2-s linguistic cue (“I” or their own name), 4-s fixation cross, 15-s memory cue, and 3-s affect rating. The fixation-cross epoch was used as an implicit baseline.
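The logic of a boxcar regressor convolved with a hemodynamic response function can be illustrated without SPM. The sketch below is not SPM8 code. It assumes a double-gamma response with shape parameters 6 and 16 and an undershoot ratio of 1/6, which are commonly cited defaults rather than values given in the text. The 6 s onset is hypothetical; only the 15 s memory-cue duration and the 2 s TR come from the source.

```python
import math

TR_S = 2.0  # repetition time from the acquisition parameters

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma hemodynamic response at time t (s); parameter values are assumed."""
    if t <= 0:
        return 0.0
    def gamma_pdf(shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)
    return gamma_pdf(peak_shape) - ratio * gamma_pdf(undershoot_shape)

def boxcar(onsets_s, duration_s, n_scans, tr=TR_S, dt=0.1):
    """High-resolution boxcar: 1 during each event, 0 elsewhere."""
    n = int(round(n_scans * tr / dt))
    series = [0.0] * n
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = min(n, int(round((onset + duration_s) / dt)))
        for i in range(start, stop):
            series[i] = 1.0
    return series

def convolve_and_sample(series, tr=TR_S, dt=0.1, hrf_length_s=32.0):
    """Convolve with the HRF and keep one value per scan."""
    kernel = [double_gamma_hrf(k * dt) * dt for k in range(int(round(hrf_length_s / dt)))]
    step = int(round(tr / dt))
    regressor = []
    for scan_start in range(0, len(series), step):
        acc = 0.0
        for k, h in enumerate(kernel):
            if scan_start - k < 0:
                break
            acc += series[scan_start - k] * h
        regressor.append(acc)
    return regressor

# One 15 s memory-cue epoch starting at a hypothetical onset of 6 s, 30 scans.
memory_regressor = convolve_and_sample(boxcar([6.0], 15.0, n_scans=30))
peak_scan = memory_regressor.index(max(memory_regressor))
```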

Whole Brain Analysis

Voxelwise statistical parametric maps summarizing differences between trial types were calculated for each participant and then entered into random-effects group analyses, with statistical maps thresholded at P < 0.05 FWER-corrected for multiple comparisons across gray and white matter. This correction entailed a primary threshold of P < 0.005 with an extent threshold of 146 voxels, which was determined using a Monte Carlo simulation method and was calculated using 3dClustSim38. This technique controls for the FWER by simulating null datasets with the same spatial autocorrelation found in the residual images and creating a frequency distribution of different cluster sizes. Clusters larger than the minimum size corresponding to the a priori chosen FWER are then retained for additional analysis. This cluster-based method of thresholding is often more sensitive to activation when one can reasonably expect multiple contiguous activated voxels38, 39, and is widely used in fMRI research. Our principal analyses contrasted the hemodynamic response on trials on which participants recalled negative autobiographical experiences and then analyzed their feelings surrounding the events using I or their own name.
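Once the extent threshold is known, applying it amounts to finding connected sets of suprathreshold voxels and discarding the small ones. The sketch below illustrates that final step only; it does not reproduce the Monte Carlo simulation itself. It assumes face-adjacent (6-neighbour) connectivity and treats clusters of at least 146 voxels as surviving, neither of which is specified in the text.

```python
from collections import deque

FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def clusters_above_extent(suprathreshold, min_extent=146):
    """Group voxels passing the primary threshold into clusters and keep large ones.

    suprathreshold: iterable of (x, y, z) integer voxel indices.
    """
    remaining = set(suprathreshold)
    kept = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:  # breadth-first search over face-adjacent voxels
            x, y, z = queue.popleft()
            for dx, dy, dz in FACE_NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        if len(cluster) >= min_extent:
            kept.append(cluster)
    return kept

# Toy map: one 6 x 6 x 5 block (180 voxels) and one distant 2 x 2 x 2 block (8 voxels).
large = {(x, y, z) for x in range(6) for y in range(6) for z in range(5)}
small = {(x, y, z) for x in range(20, 22) for y in range(20, 22) for z in range(20, 22)}
surviving = clusters_above_extent(large | small)
```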

Regions of Interest (ROIs)

We performed region of interest analyses on three sets of brain regions: (a) brain regions that support self-referential processing (e.g., medial prefrontal cortex), (b) brain regions involved in emotional reactivity (e.g., the amygdala), and (c) brain regions that support the effortful, cognitive control of emotion (e.g., posterior dorsomedial prefrontal cortex, bilateral dorsolateral PFC and ventrolateral PFC, and posterior parietal cortex).

The coordinates for each set of ROIs were derived from meta-analyses. Specifically, the self-referential processing ROIs were obtained from a meta-analysis based on data from 28 neuroimaging studies that contrasted the evaluation of the self (i.e., thinking about one’s own traits) versus close and distant others (i.e., thinking about others’ traits)18. To conservatively select the ROIs of interest for this analysis, only regions that were active for self vs. both close and distant others were included, as the subject’s own name does not neatly fit into either category of the other. The ROIs corresponding to brain regions that are involved in emotional reactivity and that support the implementation of cognitive emotion regulation were obtained from a recent meta-analysis based on data from 48 neuroimaging studies of reappraisal, the most commonly studied form of cognitive emotion regulation20. We used Robert Welsh’s simpleroibuilder to create spherical ROIs around the peak voxels from prior work. The spherical ROIs were generated to have the same volume as those from the original studies. All ROIs were small volume corrected to an FWER equivalent to p < .05.
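Matching a spherical ROI to a reported cluster volume reduces to solving V = (4/3)πr³ for the radius and collecting the voxels within that distance of the peak. The sketch below is not the simpleroibuilder implementation. The peak coordinate and the 2000 mm³ volume are hypothetical, and the voxel grid is assumed to be centred on the peak for simplicity.

```python
import math

VOXEL_MM = (3.44, 3.44, 3.0)  # functional voxel size from the acquisition section

def radius_for_volume(volume_mm3):
    """Radius (mm) of a sphere with the given volume."""
    return (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0)

def sphere_voxels(peak_mm, radius_mm, voxel_mm=VOXEL_MM):
    """Voxel centres (mm) lying within radius_mm of the peak coordinate."""
    reach = [int(radius_mm // size) for size in voxel_mm]
    voxels = []
    for i in range(-reach[0], reach[0] + 1):
        for j in range(-reach[1], reach[1] + 1):
            for k in range(-reach[2], reach[2] + 1):
                offset = (i * voxel_mm[0], j * voxel_mm[1], k * voxel_mm[2])
                if sum(o * o for o in offset) <= radius_mm ** 2:
                    voxels.append(tuple(p + o for p, o in zip(peak_mm, offset)))
    return voxels

# Hypothetical peak and cluster volume, for illustration only.
example_radius = radius_for_volume(2000.0)
example_roi = sphere_voxels((0.0, 0.0, 0.0), example_radius)
```

Because voxels are discrete, the volume of the resulting ROI only approximates the target volume; the approximation improves as the sphere grows relative to the voxel size.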