Participants

Twelve volunteers (7 females, mean age 32.1 ± 7.5 years) took part in this open-label, within-subjects, longitudinal pilot study. Volunteers were included if they were right-handed, between 18 and 45 years of age, medically healthy (as determined by medical history, physical examination, an electrocardiogram, blood analysis, and urine testing for common drugs of abuse), and psychiatrically healthy (as determined by the Structured Clinical Interview for DSM-IV). Individuals were excluded for MRI contraindications (including past head trauma, claustrophobia, presence of certain implants, and/or non-removable ferrous metals) as well as potential psilocybin contraindications (personal or family history of psychotic or bipolar disorder, moderate or severe substance use disorder within the past 5 years, and use of medications with a psychoactive or CNS effect). A urine pregnancy test (for females) and a urine test for common drugs of abuse (for all participants) were required to be negative at screening and on the morning of drug administration.

The sample was racially homogeneous (100% Caucasian). More than half of the participants (58.3%) were married at the time of their participation, 83.3% had earned a Bachelor’s degree or higher, and all reported limited lifetime use of hallucinogens (median of 1, range 1–4 uses), with the most recent use occurring an average of 8.3 years earlier. This study was registered at ClinicalTrials.gov (NCT02971605, registered on November 23, 2016). All participants provided informed consent in accordance with the Common Rule and the Declaration of Helsinki. All procedures were approved by the Johns Hopkins University School of Medicine Institutional Review Board, and participants were compensated a total of $240 upon completion of the study.

Study procedures

Upon enrollment, participants underwent preparation, acute care, and aftercare for psilocybin administration sessions following published safety guidelines91. Participants were assigned two session monitors with whom they met during two preparatory meetings before drug administration, for a total of roughly eight hours of preparation time. During preparatory meetings, participants recounted their life history and important lifetime events, received training on and practiced each of the three emotion tasks that would be performed during MRI assessments (see “Affective Tasks” below), and were instructed by the monitors on the range of experiences that may be encountered during acute drug effects. Emotion task practice sessions were included to ensure that participants were familiar with all tasks before MRI procedures commenced, and to minimize initial learning effects on these tasks. Participants then completed a single psilocybin administration session that lasted roughly 7 hours and followed established procedures91 based on several previous and ongoing studies with healthy participants89,92,93,94,95 and clinical populations1,4. Participants returned the day after their psilocybin session to meet with study staff and review the previous day’s session.

Psilocybin session

Participants consumed a small low-fat breakfast more than 1 hour before arriving at the Behavioral Pharmacology Research Unit at the Johns Hopkins Bayview Medical Center. After ingesting a capsule containing a high dose of psilocybin (25 mg/70 kg) that was prepared by our research pharmacy, participants remained recumbent on a couch under the supportive supervision of two study staff. Blood pressure, heart rate, and staff ratings of participant behavior were assessed as safety measures at 0, 30, 60, 120, 180, 240, 300, and 360 minutes after capsule administration.

Questionnaires

A battery of questionnaires was completed one day before, one week after, and one month after psilocybin administration to assess emotional function. The Positive and Negative Affect Schedule - Expanded Form (PANAS-X)48 is a 60-item adjective rating scale with a 5-point response format (0 – very slightly or not at all, 1 – a little, 2 – moderately, 3 – quite a bit, 4 – extremely) that is scored into general positive and negative affect sub-scales, as well as a number of facets of positive and negative affect. Participants were asked to indicate the degree to which they generally feel (“that is, how you feel on the average”) the different feelings and emotions described by each adjective. The Profile of Mood States (POMS)46 is a 65-item rating scale with a 5-point response format (0 – not at all, 1 – a little, 2 – moderately, 3 – quite a bit, 4 – extremely) that is scored into seven sub-scales (tension, depression, anger, fatigue, confusion, vigor, and total mood disturbance). Participants were asked to indicate the degree to which each item described how they had been feeling during the past week including today. The Dispositional Positive Emotions Scale (DPES)50 is a 38-item Likert scale with a 7-point response format (with response anchors at 1 “Strongly disagree”, 4 “Neither agree nor disagree”, and 7 “Strongly agree”) that is scored into seven sub-scales (joy, content, pride, love, compassion, amusement, and awe). Participants were asked to think about each statement and decide how much they agree or disagree with it. The Depression Anxiety Stress Scale (DASS)49 is a 21-item rating scale with a 4-point response format (0 – did not apply to me at all, 1 – applied to me to some degree, or some of the time, 2 – applied to me to a considerable degree, or a good part of the time, 3 – applied to me very much, or most of the time) that is scored into three sub-scales (depression, anxiety, and stress). Participants were asked to indicate how much each statement in the DASS applied to them over the past week. The State-Trait Anxiety Inventory (STAI)47 is a 40-item rating scale with a 4-point response format (0 – almost never, 1 – sometimes, 2 – often, 3 – almost always) that is scored into two sub-scales (state anxiety and trait anxiety). For “state” anxiety questions, participants were asked to select the response for each item that best describes how they feel “right now, that is, at this moment”. For “trait” anxiety questions, participants were asked to select the response that best describes how they “generally feel, that is, most of the time”.

Participants also completed measures of personality at screening and again one month after psilocybin. The Big Five Inventory (BFI)51 is a 44-item Likert scale with a 5-point response format (1 – Disagree strongly, 2 – Disagree a little, 3 – Neither agree nor disagree, 4 – Agree a little, 5 – Agree strongly) that is scored into five sub-scales (extraversion, neuroticism, agreeableness, conscientiousness, openness). The Tellegen Absorption Scale (TAS)52 is a 34-item rating scale with a 4-point response format (with response anchors at 0 “Never” and 3 “Always”) that is scored into a single total score for absorption.

MRI Assessments

One day before, one week after, and one month after psilocybin administration, participants completed the emotion discrimination, emotion recognition, and emotional conflict Stroop tasks in that order, with an 8-minute eyes-open resting-state scan between each pair of tasks (16 total minutes of resting scans for each visit), during the measurement of blood-oxygenation level-dependent (BOLD) signal using echo-planar imaging (EPI; TR/TE = 2200/30 ms, flip angle = 75°, voxel size = 3 mm³, 37 axial slices collected in an interleaved fashion with a 1 mm slice gap, with SENSE acceleration factor = 2). All scanning procedures were performed on a Philips 3T MRI scanner equipped with a 32-channel head coil at the F.M. Kirby Research Center for Functional Brain Imaging at the Kennedy Krieger Institute in Baltimore, MD. Each scanning session lasted 60 minutes.

Task performance during MRI sessions began with a short practice task in the scanner before MRI measurement, followed by the full task performance during MRI measurement. All facial emotional stimuli were selected from the NimStim Emotional Facial Expression database96, and balanced within task and between conditions to the degree possible based on sex, race, and the frequency of mouth opened vs closed in each stimulus. Visual stimuli were projected onto a frosted Plexiglas shield at the open end of the scanner bore, which was viewed through a mirror placed on the head coil. Participants made responses using a fiber-optic, MR-safe response device. Stimuli and responses were presented and recorded using Presentation Software (Neurobehavioral Systems, Inc. Berkeley, CA).

Emotion discrimination. During this task, participants viewed an array of three images (one at the top of the screen and two at the bottom of the screen) containing either three emotional (fearful or angry) facial expressions or three geometric shapes (vertically or horizontally oriented ellipsoids)16,17,28. Participants were instructed to press a button (either in the right or left hand) to indicate the image on the bottom of the screen (either the right or the left image) that matched the image at the top of the screen. Participants completed four 30 s blocks of face-matching trials interleaved between five 30 s blocks of shape-matching trials. Each block began with a 3 s cue (“match faces” or “match shapes”) followed by 6 trials (4.5 s per trial) and a 500 ms inter-stimulus interval (total task time: 4 m 57 s).

Emotion recognition. In this task, participants were presented with a series of happy, sad, fearful, angry, and neutral facial expressions and were instructed to press a button to identify the emotion expressed on each face53,97,98. Sixty stimuli (12 stimuli for each emotion) were presented one at a time for 4 seconds each, with a jittered ISI averaging 3 s, and 15 s of rest at the beginning and end of the task, for a total task time of 7 minutes and 30 seconds. An equal number of male and female faces were presented for each emotion. The order of emotions was pseudorandomized according to a genetic algorithm to maximize the statistical separation of each condition99, but within each emotion condition, the order of actual stimuli was randomized.
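
As a quick check on the stated durations, the timing parameters of the two tasks above can be summed directly. The sketch below is illustrative only. It assumes that a 500 ms inter-stimulus interval follows every trial in the emotion discrimination task and that the emotion recognition task has one jittered interval per stimulus; neither detail is stated explicitly above. All variable and function names are hypothetical.

```python
# Illustrative timing check; variable names are hypothetical.

# Emotion discrimination: 4 face blocks + 5 shape blocks.
n_blocks = 4 + 5
cue_s = 3.0
trials_per_block = 6
trial_s = 4.5
isi_s = 0.5  # assumption: one 500 ms interval after every trial

block_s = cue_s + trials_per_block * (trial_s + isi_s)
discrimination_total_s = n_blocks * block_s

# Emotion recognition: 60 stimuli of 4 s, mean ISI of 3 s, 15 s rest at each end.
n_stimuli = 60
stimulus_s = 4.0
mean_isi_s = 3.0  # assumption: one jittered interval per stimulus
rest_s = 15.0

recognition_total_s = n_stimuli * (stimulus_s + mean_isi_s) + 2 * rest_s


def as_min_sec(seconds):
    """Convert a duration in seconds to a (minutes, seconds) tuple."""
    minutes, secs = divmod(round(seconds), 60)
    return minutes, secs


print(as_min_sec(discrimination_total_s))  # (4, 57)
print(as_min_sec(recognition_total_s))     # (7, 30)
```

Under these assumptions both totals agree with the reported task times. The cue and six trials alone account for the stated 30 s per block, and the inter-stimulus intervals add 3 s to each block.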

Emotional conflict Stroop. This task required participants to identify the valence of emotional facial expressions (targets) with overlaid emotional words (distractors)22,54. Emotional facial expressions consisted of 18 happy and 18 sad emotional faces (9 male and 9 female each), matched between emotional conditions on strength of emotional valence, and presented in pseudorandom order. Emotional words consisted of 18 positively valenced and 18 negatively valenced emotional words from the Affective Norms for English Words (ANEW)100 that were matched between valence conditions on the intensity of valence (degree of pleasure vs displeasure), arousal, dominance, and word length (in characters), and paired in pseudorandom order with a facial emotional stimulus. A given target and distractor pair could have congruent or incongruent emotional valence. Stimuli were pseudorandomized to control for the order of congruent (C) and incongruent (I) stimuli, balancing for order effects for the following sequences across gender and emotional valence of the target stimulus: congruent trials that follow a previous congruent trial (CC), congruent trials that follow a previous incongruent trial (IC), incongruent trials that follow a previous incongruent trial (II), and incongruent trials that follow a previous congruent trial (CI).
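
The four first-order sequence types can be derived from an ordered list of trial types by pairing each trial with its predecessor. The following is a minimal sketch of that labeling, not the stimulus-ordering code used in the study. The function name and the example trial order are hypothetical, and the first trial is left unlabeled because it has no predecessor.

```python
from collections import Counter


def label_sequences(trials):
    """Label each trial as previous type + current type (e.g. "CI").

    `trials` is a list of "C" (congruent) or "I" (incongruent) codes.
    The first trial has no predecessor and is labeled None.
    """
    if not trials:
        return []
    labels = [None]
    for prev, cur in zip(trials, trials[1:]):
        labels.append(prev + cur)
    return labels


# Hypothetical trial order, for illustration only.
example = ["C", "C", "I", "I", "C", "I", "C", "C", "I"]
labels = label_sequences(example)
counts = Counter(label for label in labels if label is not None)
print(labels)
print(counts)
```

A count of each label, as in `counts`, is one simple way to check whether a candidate trial order is balanced across CC, IC, II, and CI.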

Analysis

Self-report questionnaire analysis

Mixed-effects, repeated measures one-way ANOVAs were used to determine the persisting effects of psilocybin on self-report affect measures, comparing each measure between each time point (baseline, 1-week, and 1-month post-psilocybin). Where a significant main effect was observed, we followed up with post-hoc comparisons between each pair of time points, corrected for multiple comparisons using Tukey’s method for all pairwise comparisons of means101. Paired t-tests were used to test for changes in personality measures between screening and 1-month post-psilocybin.
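
For readers unfamiliar with the paired t-test used for the personality measures, the statistic is the mean within-subject change divided by its standard error. The sketch below uses only the Python standard library and made-up scores; it is not the analysis code used in the study, and the function and variable names are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up scores for illustration; these are not study data.
screening = [3.0, 3.5, 2.5, 4.0, 3.0, 3.5]
one_month = [3.5, 3.5, 3.0, 4.5, 3.0, 4.0]
t_stat, df = paired_t(screening, one_month)
print(round(t_stat, 3), df)
```

The p-value would then be read from a t distribution with n - 1 degrees of freedom. A statistics package can do both steps at once, for example `scipy.stats.ttest_rel`.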

Preprocessing and analysis of task-based BOLD data

All task-based BOLD data underwent preprocessing, region of interest (ROI) extraction, and ROI analysis to determine the response of the left and right amygdala and left and right ACC to task conditions in each fMRI task. Preprocessing steps consisted of slice timing correction, realignment/motion correction, normalization to an EPI template registered to MNI space102, and smoothing using a 6 mm FWHM kernel. The first eigenvariate of all voxels within each of four ROIs (left and right amygdala and left and right ACC) was extracted for each subject and each scan and submitted to separate subject-level general linear model (GLM) analyses for each affective task at each time point (baseline, 1 week post-psilocybin, and 1 month post-psilocybin).

Subject-level GLM design matrices consisted of six motion parameters from realignment, a motion censoring or “scrubbing” regressor generated using outlier detection and intermediate settings (global-signal z-value threshold = 5, subject-motion mm threshold = 0.9) in the ART toolbox103, the mean signal within each run, a linear term to model signal drift, a regressor to model all button-presses made by the participant, and regressors of interest for each task. The design matrix for the emotion discrimination task included regressors of interest for face blocks and shape blocks, and a [faces > shapes] contrast was fit as the contrast of interest for each subject and time point. The design matrix for the emotion recognition task included a regressor indicating the onset of every stimulus, and separate regressors of interest for each emotional face condition (happy, angry, sad, fearful, and neutral). An emotion-greater-than-all-stimuli contrast was fit for each emotional condition ([happy > all stimuli], [angry > all stimuli], etc). The design matrix for the emotional conflict Stroop task included regressors of interest for each of the four first-order sequence types (congruent trials that follow a congruent trial, or CC, incongruent trials that follow a congruent trial, or CI, incongruent trials that follow an incongruent trial, or II, and congruent trials that follow an incongruent trial, or IC). Two contrasts of interest were fit: one for all incongruent greater than all congruent trials ([CI & II > CC & IC]), and one for high-demand incongruent greater than low-demand congruent trials ([CI > CC]).
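
The two Stroop contrasts can be written as weight vectors over the four regressors of interest. The sketch below is one conventional way to build such vectors and is not taken from the study's SPM batch files. The column order, the helper name, and the choice to scale weights so that each side sums to 1 are assumptions; nuisance regressors would simply receive zero weights.

```python
# Column order of the regressors of interest (an assumption for this sketch).
conditions = ["CC", "CI", "II", "IC"]


def contrast(positive, negative, columns):
    """Build a zero-sum contrast vector: positive conditions minus negative ones."""
    weights = []
    for name in columns:
        if name in positive:
            weights.append(1.0 / len(positive))
        elif name in negative:
            weights.append(-1.0 / len(negative))
        else:
            weights.append(0.0)
    return weights


# All incongruent greater than all congruent trials.
incongruent_vs_congruent = contrast({"CI", "II"}, {"CC", "IC"}, conditions)
# High-demand incongruent greater than low-demand congruent trials.
high_vs_low_demand = contrast({"CI"}, {"CC"}, conditions)

print(incongruent_vs_congruent)  # [-0.5, 0.5, 0.5, -0.5]
print(high_vs_low_demand)        # [-1.0, 1.0, 0.0, 0.0]
```

Scaling the weights leaves the test statistic unchanged and only affects the units of the contrast estimate.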

SPM12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm12/) was used to preprocess all data, and SPM12, MaRSBaR (http://marsbar.sourceforge.net), and MATLAB (R2017a, version 9.2.0.556344) were used to conduct GLM analyses. A one-way ANOVA was fit to subject-level ROI contrasts to determine a main effect of time point on BOLD response in each ROI for each task. Post-hoc comparisons were conducted using t-tests, corrected for multiple comparisons using the Holm-Bonferroni method104. Analyses were repeated as exploratory whole-brain voxel-wise general linear models to investigate potential effects outside of hypothesized areas. Whole-brain analyses were thresholded at p < 0.0005 (uncorrected), with a cluster-forming threshold of p < 0.05 (uncorrected).
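
The Holm-Bonferroni method is a step-down procedure: p-values are sorted in ascending order and the k-th smallest (counting from zero) is compared with α / (m − k), stopping at the first comparison that fails. The following is a minimal sketch of that procedure with made-up p-values. It is not the MATLAB code used in the study, and the function name and α = 0.05 default are assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


# Made-up p-values for three pairwise time-point comparisons.
p_vals = [0.030, 0.010, 0.040]
print(holm_bonferroni(p_vals))  # [False, True, False]
```

In this example only the smallest p-value survives: 0.010 is below 0.05/3, but 0.030 exceeds 0.05/2, so testing stops there.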

Resting state fMRI analysis

Resting-state data were preprocessed in the same way as the task-based data and then submitted to simultaneous105,106 bandpass filtering (0.009–0.08 Hz) and regression of nuisance parameters. Nuisance parameters consisted of linear trends, the first 5 eigenvectors of cerebrospinal fluid and white matter signal (identified using masks derived from segmented and normalized T1-weighted structural images), 6 motion parameters from realignment, and a motion censoring or “scrubbing” regressor generated using outlier detection and intermediate settings (global-signal z-value threshold = 5, subject-motion mm threshold = 0.9) in the ART toolbox103. Preprocessed and nuisance-regressed data were then parcellated using the Shen 268-node functional brain atlas56. Voxels within each node were averaged at each acquisition to produce 268 time series (one for each node) for each participant. One subject was excluded from resting-state analysis because resting-state data were missing from the 1-week time point.

Static functional connectivity for each edge (each pair-wise set of nodes from the Shen atlas) was calculated using Pearson correlations. These values and all other correlations were Fisher z-transformed for all statistics. To explore differences in whole brain static connectivity, significant edges (negative and positive) were identified using separate one-sample t tests across participants for each edge and time point, thresholded using Bonferroni correction for all 35,778 edges. Although statistically conservative, this procedure yields the most reliable edges across our relatively small sample. All edges that survived this thresholding for at least one time point were then contrasted between time points (baseline vs. one week and baseline vs. one month) using paired t tests (α = 0.05).
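
The edge count follows from the number of unique node pairs in a 268-node atlas, and the Fisher z-transform is the inverse hyperbolic tangent of the correlation. The short sketch below reproduces both quantities along with the implied per-edge Bonferroni threshold. The family-wise α of 0.05 for the Bonferroni step is an assumption, since the text gives α only for the paired t tests, and the names are hypothetical.

```python
import math

n_nodes = 268
n_edges = n_nodes * (n_nodes - 1) // 2  # number of unique node pairs

alpha = 0.05  # assumption: family-wise alpha for the Bonferroni step
bonferroni_alpha = alpha / n_edges


def fisher_z(r):
    """Fisher z-transform of a Pearson correlation (requires |r| < 1)."""
    return math.atanh(r)


print(n_edges)  # 35778
print(bonferroni_alpha)
print(fisher_z(0.5))
```

Under this assumption each one-sample t test would need p below roughly 1.4 × 10⁻⁶ to survive correction, which is why the procedure is described as statistically conservative.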

Two resting-state scans were collected at each MRI visit, and all resting-state dependent variables were averaged within-subject at each time point and each edge before analysis. Nodes of the Shen atlas cluster into eight canonical functional networks: medial frontal, frontoparietal, default mode, subcortical-cerebellum (including salience), motor, visual I (medial), visual II (occipital pole), and visual association (lateral), yielding 8 additional within-network observations and 28 between-network observations for each outcome measure (static functional connectivity, DCC, and entropy). To explore within- and between-network differences, all edges within each network or between each pair of networks were averaged and compared across time points (via t test). Visual analysis of the matrix of t-values was used to identify obvious patterns in connectivity change, but these patterns should be interpreted with caution.
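
The counts of 8 within-network and 28 between-network observations follow from the eight networks, and the network-level averaging can be expressed as a mean over the edges whose two nodes fall in the relevant networks. The sketch below illustrates both with a toy four-node example. The data structures, the function name, and the toy values are hypothetical, and this is not the analysis code used in the study.

```python
from itertools import combinations

networks = [
    "medial frontal", "frontoparietal", "default mode",
    "subcortical-cerebellum", "motor", "visual I",
    "visual II", "visual association",
]

within_pairs = [(name, name) for name in networks]
between_pairs = list(combinations(networks, 2))
print(len(within_pairs), len(between_pairs))  # 8 28


def network_mean(edge_values, node_network, net_a, net_b):
    """Average edge values over all edges linking net_a and net_b.

    edge_values: dict mapping (node_i, node_j) with i < j to a value.
    node_network: dict mapping node index to network name.
    Passing the same name twice gives the within-network average.
    """
    selected = [
        value for (i, j), value in edge_values.items()
        if {node_network[i], node_network[j]} == {net_a, net_b}
    ]
    return sum(selected) / len(selected) if selected else float("nan")


# Toy example: nodes 0 and 1 in "motor", nodes 2 and 3 in "visual I".
toy_networks = {0: "motor", 1: "motor", 2: "visual I", 3: "visual I"}
toy_edges = {(0, 1): 0.4, (2, 3): 0.6,
             (0, 2): 0.1, (0, 3): 0.2, (1, 2): 0.3, (1, 3): 0.2}
print(network_mean(toy_edges, toy_networks, "motor", "motor"))
print(network_mean(toy_edges, toy_networks, "motor", "visual I"))
```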