Participants

Sixty healthy participants (20 per group, 50% male, 18–34 years) were recruited. Potential participants underwent a physical examination and an electrocardiogram, provided detailed information on current and lifetime drug use, and were screened by trained clinical psychologists using a semistructured psychiatric interview on average 17 weeks before the first experimental session. Exclusion criteria included any current Axis I DSM-IV disorder (including substance dependence), smoking >5 cigarettes per day, history of psychosis or mania, less than a high school education, lack of English fluency, a body mass index outside 19–30 kg/m², high blood pressure (>140/90), abnormal electrocardiogram, daily use of any medication other than birth control, pregnancy, or lactation. Participants were eligible if they reported 4–40 past uses of MDMA with no adverse events. Women not taking hormonal contraceptives were tested during their follicular phase because hormonal fluctuations can influence responses to stimulants (White et al, 2002). There were no group differences in participant demographics (Table 1).

Table 1 Demographic Data of Groups

Qualifying participants attended an orientation session to give consent and practice study tasks. To minimize expectancy, participants were informed that they could receive a stimulant, sedative, cannabinoid, or placebo. Participants were instructed to consume their normal amounts of caffeine and nicotine before sessions but to abstain from alcohol, prescription drugs (except contraceptives), and over-the-counter drugs for 24 h before the sessions, from marijuana for 72 h before the sessions, and from other illicit drugs for 48 h before the sessions (a shorter window than for marijuana because these drugs clear faster). Participants were notified that there would be drug tests, that they would be rescheduled if they tested positive for any recent drug use at the first session, and that they would be withdrawn from the study with only partial compensation if they tested positive at the second session. Participants were advised to get their normal amounts of sleep and not to eat for 2 h before each session. Following completion of the study, participants were fully debriefed and monetarily compensated. The study took place at the University of Chicago Medical Center and was approved by the Institutional Review Board.

Drug

MDMA (1.0 mg/kg) was prepared for each participant by the hospital pharmacist. The powder form of the drug was obtained from Dr David Nichols of Purdue University and placed in opaque size 00 capsules with dextrose filler. Placebo capsules contained only dextrose. The 1.0 mg/kg dose is moderate relative to doses of MDMA that previously affected memory (75 mg in Kuypers and Ramaekers, 2005).

Design

Participants were randomly assigned to one of three groups: one that received MDMA during encoding and placebo during retrieval (Encoding), one that received MDMA during retrieval and placebo during encoding (Retrieval), and one that received placebo during both phases (Placebo). All participants attended two sessions separated by 48 h: an encoding session for studying stimuli and a retrieval session for testing memory. Apart from the drug manipulation, the procedure was identical for all groups and was double-blinded. All sessions began in the morning.

Stimuli

Stimuli consisted of 180 images from the International Affective Picture System (IAPS; Lang et al, 2008) and two- to three-word labels (eg, ‘dirty toilet’, ‘box of tissues’, ‘chocolate candy bar’) describing these images. The images included emotionally negative, neutral, and positive pictures and were split into two comparable sets for counterbalancing studied and nonstudied items across participants. These pictures had the following mean (SD) normed valences and arousals, respectively. Set A: negative 3.09 (0.51) and 5.21 (0.66), neutral 5.15 (0.43) and 3.51 (0.68), positive 7.10 (0.52) and 5.00 (0.79). Set B: negative 3.14 (0.50) and 5.21 (0.65), neutral 5.10 (0.55) and 4.12 (0.90), positive 7.08 (0.53) and 5.17 (0.77).

Procedure

On the morning of experimental sessions, participants first completed compliance measures including breath alcohol level (Alco-sensor III; Intoximeters, St Louis, MO), a urine drug test (ToxCup, Branan Medical, Irvine, CA), and a pregnancy test (females only; Aimstrip, Craig Medical, Vista, CA), as well as baseline cardiovascular and mood measures. Participants then consumed a capsule and completed cardiovascular and mood measures every 30 min for the next 90 min. Participants were provided with magazines and music in furnished rooms. They were not allowed to eat, sleep, or work, and they had no access to cell phones or Internet. Upon completing tasks, participants watched a movie.

During the encoding session, 90 min post-capsule ingestion, participants viewed all 180 labels, half of which were followed by the corresponding picture. For each label, participants rated on a five-point scale how much they would like to see the corresponding picture. When a picture was presented, participants rated its positivity and negativity on a 5 × 5 grid with positivity and negativity on orthogonal axes. After this valence rating, they rated the picture’s arousal on a five-point scale. This phase was self-paced and lasted approximately 30 min. There were no group differences in liking ratings of labels or in valence/arousal ratings of images, so these are not reported.

During the retrieval session, 90 min post-capsule ingestion, participants were given two surprise memory tests, a cued recollection test and a picture recognition test. For the cued recollection test, participants were presented with each label and asked whether they had seen the corresponding picture. Afterward, they rated their confidence on a five-point scale and were encouraged to use the entire scale. After the cued recollection test, participants were presented with each picture and had to decide if it had been seen. When a picture was recognized, they were asked if they ‘remember’ the picture or they simply ‘know’ it was presented (Yonelinas, 2002). Participants were instructed that they should give a ‘remember’ response when they could recollect associated details from the event, such as thoughts during its presentation, and they should give a ‘know’ response when they simply knew that a picture had been presented without recollecting specific details. Both memory tests were self-paced and together lasted approximately 45 min.

Dependent Measures

Several measures were obtained to monitor expected drug effects (Table 2). Heart rate and blood pressure were measured using a portable blood pressure monitor (A&D Medical/Life Source, San Jose, CA). Mood measures included the Profile of Mood States (McNair et al, 1971), the Visual Analog Scales (Folstein and Luria, 1973), the Drug Effects Questionnaire (Morean et al, 2013), and an End of Session Questionnaire. See Supplementary Online Materials (SOM) for descriptions of each scale and statistics.

Table 2 Physiological and Mood Measures

For the cued recollection test (Table 3), hit and false alarm rates were calculated for each valence in each subject. False alarms were subtracted from hits to compute memory accuracy. Finally, high confidence hits, false alarms, and accuracy were calculated by only including responses with the top two levels of confidence (SOM).
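The scoring described above can be sketched as follows. This is only an illustration: the trial layout (a tuple of studied flag, yes/no response, and 1–5 confidence) and the function name are hypothetical, and the high-confidence variant assumes the denominators remain all studied and all nonstudied items, which the SOM may define differently.

```python
def cued_recollection_scores(trials, min_confidence=1):
    """Return (hit_rate, false_alarm_rate, accuracy) for one subject and valence.

    trials: iterable of (studied, said_yes, confidence) tuples (hypothetical layout).
    min_confidence: only 'yes' responses at or above this confidence are counted.
    """
    trials = list(trials)
    n_studied = sum(1 for studied, _, _ in trials if studied)
    n_new = sum(1 for studied, _, _ in trials if not studied)
    hits = sum(1 for s, yes, c in trials if s and yes and c >= min_confidence)
    fas = sum(1 for s, yes, c in trials if not s and yes and c >= min_confidence)
    hit_rate = hits / n_studied
    fa_rate = fas / n_new
    # Accuracy is hits minus false alarms, as in the text.
    return hit_rate, fa_rate, hit_rate - fa_rate


# Toy data: four studied and four nonstudied labels.
example_trials = [
    (True, True, 5), (True, True, 2), (True, False, 4), (True, True, 4),
    (False, True, 1), (False, False, 5), (False, False, 3), (False, True, 4),
]
all_scores = cued_recollection_scores(example_trials)
# Top two confidence levels on a five-point scale are 4 and 5.
high_conf_scores = cued_recollection_scores(example_trials, min_confidence=4)
```

In the study, such scores would be computed separately for negative, neutral, and positive items in each participant.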

Table 3 Cued Recollection Data from Session 2 for Negative, Neutral, and Positive Images

To estimate recollection and familiarity, confidence data were submitted to a dual process signal detection (DPSD) analysis (Yonelinas, 2002) using the ROC Toolbox for MATLAB (Koen et al, 2016). Confidence ratings for ‘yes’ and ‘no’ responses were combined to create a 10-point scale. The cumulative proportion of hits was plotted against the cumulative proportion of false alarms from the most stringent criterion (ie, the proportion of hits and false alarms at the highest level of confidence) to the most liberal criterion, ending at (1,1). A receiver operating characteristic (ROC) curve was then fit to these points using maximum likelihood estimation. The DPSD model assumes that a threshold process (recollection) can take place on some proportion of trials, which is reflected by the y-intercept (measured as a probability). In contrast, familiarity is thought to be a signal detection process, reflected in the curvilinearity of the function (measured in z score units).
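A minimal sketch of how the empirical ROC points could be built is given below. It is not the ROC Toolbox code: the function names are hypothetical, the mapping of yes/no plus confidence onto 1–10 is one plausible reading of the text, and `dpsd_point` uses a simple equal-variance parameterization (noise centered at 0, signal at d′) chosen for illustration. Maximum likelihood fitting is omitted.

```python
from statistics import NormalDist


def to_ten_point(said_yes, confidence):
    """Map yes/no plus 1-5 confidence onto 1 (sure 'no') .. 10 (sure 'yes')."""
    return 5 + confidence if said_yes else 6 - confidence


def roc_points(old_ratings, new_ratings, n_levels=10):
    """Cumulative (false_alarm, hit) proportions, strictest criterion first."""
    points = []
    for criterion in range(n_levels, 0, -1):
        hit = sum(r >= criterion for r in old_ratings) / len(old_ratings)
        fa = sum(r >= criterion for r in new_ratings) / len(new_ratings)
        points.append((fa, hit))
    return points  # the final point is always (1.0, 1.0)


def dpsd_point(recollection, d_prime, criterion):
    """Predicted (false_alarm, hit) for one criterion under the assumed model.

    Recollection lifts the hit rate by a fixed probability; familiarity adds
    an equal-variance signal detection component on the remaining trials.
    """
    phi = NormalDist().cdf
    fa = phi(-criterion)
    hit = recollection + (1 - recollection) * phi(d_prime - criterion)
    return fa, hit


# Toy ratings on the combined 10-point scale.
points = roc_points(old_ratings=[10, 10, 8, 3], new_ratings=[9, 2, 1, 1])
```

As the criterion becomes very strict, the predicted false alarm rate goes to 0 while the predicted hit rate approaches the recollection parameter, which is why recollection appears as the y-intercept.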

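For the recognition test (Table 4), hits, false alarms, and accuracy were calculated, and recollection and familiarity estimates were derived from the independence remember/know (IRK) procedure (Yonelinas, 2002). Recollection accuracy was measured by

$$\text{Recollection} = R_{\text{studied}} - R_{\text{nonstudied}},$$

where $R_{\text{studied}}$ and $R_{\text{nonstudied}}$ are the proportions of ‘remember’ responses to studied and nonstudied pictures, respectively.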

Table 4 Recognition Data from Session 2 for Negative, Neutral, and Positive Images

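Because the proportion of ‘know’ responses reflects the probability of familiarity in the absence of recollection, a correction was made to avoid underestimating familiarity. Familiarity accuracy was measured as

$$\text{Familiarity} = \frac{K_{\text{studied}}}{1 - R_{\text{studied}}} - \frac{K_{\text{nonstudied}}}{1 - R_{\text{nonstudied}}},$$

where $K$ and $R$ are the proportions of ‘know’ and ‘remember’ responses to studied and nonstudied pictures.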

To avoid negative familiarity estimates and division by 0, floor and ceiling hits and false alarms were replaced by 0.5/N and 1 − 0.5/N, respectively, where N is the maximum number of hits and false alarms that could be made (Macmillan and Creelman, 1991). Note that each of these estimates corrects for subjective responses to nonstudied items, thereby estimating recollection and familiarity unique to items studied in the encoding phase.
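A minimal sketch of these estimates follows, assuming the standard IRK formulas (recollection as the ‘remember’ rate, familiarity as the ‘know’ rate divided by one minus the ‘remember’ rate, each corrected by subtracting the same quantity for nonstudied items). The function names are hypothetical, and applying the floor/ceiling replacement to every ‘remember’ and ‘know’ rate is an assumption about how the correction was used.

```python
def corrected_rate(count, n):
    """Proportion count/n, with 0 and 1 replaced by 0.5/n and 1 - 0.5/n."""
    if count <= 0:
        return 0.5 / n
    if count >= n:
        return 1 - 0.5 / n
    return count / n


def irk_estimates(rem_old, know_old, rem_new, know_new, n_old, n_new):
    """Return (recollection, familiarity) accuracy from response counts.

    rem_* / know_*: counts of 'remember' and 'know' responses to studied
    (old) and nonstudied (new) pictures; n_old / n_new: items per class.
    """
    r_old = corrected_rate(rem_old, n_old)
    k_old = corrected_rate(know_old, n_old)
    r_new = corrected_rate(rem_new, n_new)
    k_new = corrected_rate(know_new, n_new)
    recollection = r_old - r_new
    # Dividing by (1 - R) rescales 'know' to the trials without recollection.
    familiarity = k_old / (1 - r_old) - k_new / (1 - r_new)
    return recollection, familiarity


# Toy counts for 30 studied and 30 nonstudied pictures of one valence.
rec, fam = irk_estimates(rem_old=18, know_old=6, rem_new=3, know_new=6,
                         n_old=30, n_new=30)
```

With these toy counts, recollection is 0.6 − 0.1 = 0.5 and familiarity is 0.2/0.4 − 0.2/0.9 ≈ 0.28. A participant who gave ‘remember’ to all 30 studied pictures would otherwise produce a division by 0, which the replacement in `corrected_rate` prevents.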

Statistical Analysis

Encoding and Retrieval groups were compared separately to the Placebo group. Cued recollection hits, false alarms, accuracy, and high confidence measures were submitted to 2 (group) × 3 (valence) ANOVAs. Recognition hits, false alarms, accuracy, recollection estimates, and familiarity estimates were also submitted to 2 × 3 ANOVAs. When sphericity was violated, a Greenhouse–Geisser correction was applied to the degrees of freedom. Pairwise comparisons were conducted with t tests.

Estimates of recollection and familiarity derived from ROC curves can be calculated individually for each participant (eg, Koen et al, 2013). However, because the number of studied and nonstudied items per condition was low (ie, 30 each compared to 60–150 in Koen et al), confidence data were collapsed across participants to generate aggregate ROC curves. Parameter reliability was assessed via non-parametric bootstrapping. For each condition, distributions of recollection and familiarity estimates were generated by randomly sampling 20 subjects with replacement and running an ROC analysis (10 000 iterations). Pairwise comparisons were made by subtracting distributions and calculating what proportion of the difference distribution lay above 0. Confidence intervals for the difference of two means were obtained from the 2.5 and 97.5% quantiles of the difference distributions.
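The bootstrap procedure can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, and the `estimator` argument stands in for the aggregate ROC analysis (here replaced by a simple mean of made-up per-subject scores), since the actual fitting was done with the ROC Toolbox in MATLAB.

```python
import random
from statistics import mean, quantiles


def bootstrap_estimates(subject_data, estimator, n_iter=10_000, rng=None):
    """Resample subjects with replacement and re-estimate on each sample."""
    rng = rng or random.Random()
    n = len(subject_data)
    return [estimator(rng.choices(subject_data, k=n)) for _ in range(n_iter)]


def compare_conditions(boot_a, boot_b):
    """Proportion of the difference distribution above 0, plus a 95% interval."""
    diffs = [a - b for a, b in zip(boot_a, boot_b)]
    prop_above_zero = sum(d > 0 for d in diffs) / len(diffs)
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(diffs, n=40)
    return prop_above_zero, (cuts[0], cuts[-1])


# Toy per-subject scores for two conditions of 20 subjects each (made up).
rng = random.Random(0)
cond_a = [0.60 + 0.01 * i for i in range(20)]
cond_b = [0.10 + 0.01 * i for i in range(20)]
# 1000 iterations keep the example fast; the study used 10 000.
boot_a = bootstrap_estimates(cond_a, mean, n_iter=1000, rng=rng)
boot_b = bootstrap_estimates(cond_b, mean, n_iter=1000, rng=rng)
prop, (ci_low, ci_high) = compare_conditions(boot_a, boot_b)
```

Because every score in the first toy condition exceeds every score in the second, the whole difference distribution lies above 0 here; with real data the proportion falls between 0 and 1.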