Methods

Participants with a potentially life-threatening cancer diagnosis and a DSM-IV diagnosis that included anxiety and/or mood symptoms were recruited through flyers, internet, and physician referral. Of 566 individuals who were screened by telephone, 56 were randomized. Figure 1 shows a CONSORT flow diagram. Table 1 shows demographics for the 51 participants who completed at least one session. The two randomized groups did not significantly differ demographically. All 51 participants had a potentially life-threatening cancer diagnosis, with 65% having recurrent or metastatic disease. Types of cancer included breast (13 participants), upper aerodigestive (7), gastrointestinal (4), genitourinary (18), hematologic malignancies (8), other (1). All had a DSM-IV diagnosis: chronic adjustment disorder with anxiety (11 participants), chronic adjustment disorder with mixed anxiety and depressed mood (11), dysthymic disorder (5), generalized anxiety disorder (GAD) (5), major depressive disorder (MDD) (14), or a dual diagnosis of GAD and MDD (4), or GAD and dysthymic disorder (1). Detailed inclusion/exclusion criteria are in the online Supplementary material. The Johns Hopkins IRB approved the study. Written informed consent was obtained from participants.

A two-session, double-blind cross-over design compared the effects of a low versus high psilocybin dose on measures of depressed mood, anxiety, and quality of life, as well as measures of short-term and enduring changes in attitudes and behavior. Participants were randomly assigned to one of two groups. The Low-Dose-1st Group received the low dose of psilocybin on the first session and the high dose on the second session, whereas the High-Dose-1st Group received the high dose on the first session and the low dose on the second session. The duration of each participant’s participation was approximately 9 months (mean 275 days). Psilocybin session 1 occurred, on average, approximately 1 month after study enrollment (mean 28 days), with session 2 occurring approximately 5 weeks later (mean 38 days). Data assessments occurred: (1) immediately after study enrollment (Baseline assessment); (2) on both session days (during and at the end of the session); (3) approximately 5 weeks (mean 37 days) after each session (Post-session 1 and Post-session 2 assessments); (4) approximately 6 months (mean 211 days) after Session 2 (6-month follow-up).

The study compared a high psilocybin dose (22 or 30 mg/70 kg) with a low dose (1 or 3 mg/70 kg) administered in identically appearing capsules. When this study was designed, we had little past experience with a range of psilocybin doses. We decreased the high dose from 30 to 22 mg/70 kg after two of the first three participants who received a high dose of 30 mg/70 kg were discontinued from the study (one because of vomiting shortly after capsule administration and one for personal reasons). Related to this decision, preliminary data from a dose-effect study in healthy participants suggested that rates of psychologically challenging experiences were substantially greater at 30 than at 20 mg/70 kg (Griffiths et al., 2011). The low dose of psilocybin was decreased from 3 to 1 mg/70 kg after 12 participants because data from the same dose-effect study showed significant psilocybin effects at 5 mg/70 kg, which raised concern that 3 mg/70 kg might not serve as an inactive placebo.
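The "mg/70 kg" notation can be read as a weight-adjusted dose. The following Python sketch shows that arithmetic under the assumption of simple linear scaling by body weight; the function name and the 84 kg example weight are hypothetical and are not taken from the study.

```python
def scaled_dose_mg(dose_per_70kg_mg: float, body_weight_kg: float) -> float:
    """Scale a dose stated per 70 kg of body weight to one person's weight.

    Assumes linear scaling by body weight, which is the usual reading of
    'mg/70 kg'; the study text does not spell out the calculation.
    """
    if dose_per_70kg_mg < 0 or body_weight_kg <= 0:
        raise ValueError("dose must be non-negative and weight must be positive")
    return dose_per_70kg_mg * body_weight_kg / 70.0


# Hypothetical 84 kg participant at the high (22 mg/70 kg) and low (1 mg/70 kg) doses.
print(round(scaled_dose_mg(22, 84), 2))
print(round(scaled_dose_mg(1, 84), 2))
```

A 70 kg participant would receive exactly the nominal dose, and heavier or lighter participants would receive proportionally more or less.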

Expectancies, on the part of both participants and monitors, are believed to play a large role in the qualitative effects of psilocybin-like drugs (Griffiths et al., 2006; Metzner et al., 1965). Although double-blind methods are usually used to protect against such effects, expectancy is likely to be significantly operative in a standard drug versus placebo design when the drug being evaluated produces highly discriminable effects and participants and staff know the specific drug conditions to be tested. For these reasons, in the present study a low dose of psilocybin was compared with a high dose of psilocybin, and participants and monitors were given instructions that obscured the actual dose conditions to be tested. Specifically, they were told that psilocybin would be administered in both sessions, the psilocybin doses administered in the two sessions might range anywhere from very low to high, the doses in the two sessions might or might not be the same, sensitivity to psilocybin dose varies widely across individuals, and that at least one dose would be moderate to high. Participants and monitors were further strongly encouraged to try to attain maximal therapeutic and personal benefit from each session.

Drug sessions were conducted in an aesthetic living-room-like environment with two monitors present. Participants were instructed to consume a low-fat breakfast before coming to the research unit. A urine sample was taken to verify abstinence from common drugs of abuse (cocaine, benzodiazepines, and opioids including methadone). Participants who reported use of cannabis or dronabinol were instructed not to use them for at least 24 h before sessions. Psilocybin doses were administered in identically appearing opaque, size 0 gelatin capsules, with lactose as the inactive capsule filler. For most of the time during the session, participants were encouraged to lie down on the couch, use an eye mask to block external visual distraction, and use headphones through which a music program was played. The same music program was played for all participants in both sessions. Participants were encouraged to focus their attention on their inner experiences throughout the session. Thus, there was no explicit instruction for participants to focus on their attitudes, ideas, or emotions related to their cancer. A more detailed description of the study room and procedures followed on session days is provided elsewhere (Griffiths et al., 2006; Johnson et al., 2008).

A description of session monitor roles and the content and rationale for meetings between participants and monitors is provided elsewhere (Johnson et al., 2008). Briefly, preparation meetings before the first session, which included discussion of meaningful aspects of the participant’s life, served to establish rapport and prepare the participant for the psilocybin sessions. During sessions, monitors were nondirective and supportive, and they encouraged participants to “trust, let go and be open” to the experience. Meetings after sessions generally focused on novel thoughts and feelings that arose during sessions. Session monitors were study staff originally trained by William Richards PhD, a clinical psychologist with extensive experience conducting studies with classic hallucinogens. Monitor education varied from college graduate to PhD. Formal clinical training varied from none to clinical psychologist. Monitors were selected as having significant human relations skills and self-described experience with altered states of consciousness induced by means such as meditation, yogic breathing, or relaxation techniques.

After study enrollment and assessment of baseline measures, and before the first psilocybin session, each participant met with the two session monitors (staff who would be present during session days) on two or more occasions (mean of 3.0 occasions for a mean total of 7.9 hours). The day after each psilocybin session participants met with the session monitors (mean 1.2 hours). Participants met with monitors on two or more occasions between the first and second psilocybin session (mean of 2.7 occasions for a mean total of 3.4 hours) and on two or more occasions between the second session and 6-month follow-up (mean of 2.5 occasions for a mean total of 2.4 hours). Preparation meetings, the first meeting following each session, and the last meeting before the second session were always in person. For the 37 participants (73%) who did not reside within commuting distance of the research facility, 49% of the Post-session 1 meetings with monitors occurred via telephone or video calls.

The questionnaire included three final questions (see Griffiths et al. 2006 for more specific wording): (1) How personally meaningful was the experience? (rated from 1 to 8, with 1 = no more than routine, everyday experiences; 7 = among the five most meaningful experiences of my life; and 8 = the single most meaningful experience of my life). (2) Indicate the degree to which the experience was spiritually significant to you? (rated from 1 to 6, with 1 = not at all; 5 = among the five most spiritually significant experiences of my life; 6 = the single most spiritually significant experience of my life). (3) Do you believe that the experience and your contemplation of that experience have led to change in your current sense of personal well-being or life satisfaction? (rated from +3 = increased very much; +2 = increased moderately; 0 = no change; –3 = decreased very much).

The Persisting Effects Questionnaire assessed self-rated positive and negative changes in attitudes, moods, behavior, and spiritual experience attributed to the most recent psilocybin session (Griffiths et al., 2006, 2011). At the 6-month follow-up, the questionnaire was completed on the basis of the high-dose session, which was identified as the session in which the participant experienced the most pronounced changes in their ordinary mental processes. Twelve subscales (described in Table 8) were scored.

Three measures of spirituality were assessed at three time-points (Baseline, 5 weeks after session 2, and the 6-month follow-up): the FACIT-Sp, a self-rated measure of the spiritual dimension of quality of life in chronic illness (Peterman et al., 2002), assessed on how the participant felt “on average”; the Spiritual-Religious Outcome Scale, a three-item measure used to assess spiritual and religious changes during illness (Pargament et al., 2004); and the Faith Maturity Scale, a 12-item scale assessing the degree to which a person’s priorities and perspectives align with “mainline” Protestant traditions (Benson et al., 1993).

Structured telephone interviews with community observers (e.g. family members, friends, or work colleagues) provided ratings of participant attitudes and behavior reflecting healthy psychosocial functioning (Griffiths et al., 2011). The interviewer provided no information to the rater about the participant or the nature of the research study. The structured interview (Community Observer Questionnaire) consisted of asking the rater to rate the participant’s behavior and attitudes using a 10-point scale (from 1 = not at all, to 10 = extremely) on 13 items reflecting healthy psychosocial functioning: inner peace; patience; good-natured humor/playfulness; mental flexibility; optimism; anxiety (scored negatively); interpersonal perceptiveness and caring; negative expression of anger (scored negatively); compassion/social concern; expression of positive emotions (e.g. joy, love, appreciation); self-confidence; forgiveness of others; and forgiveness of self. On the first rating occasion, which occurred soon after acceptance into the study, raters were instructed to base their ratings on observations of and conversations with the participant over the past 3 months. On two subsequent assessments, raters were told their previous ratings and were instructed to rate the participant based on interactions over the last month (post-session 2 assessment) or since beginning the study (6-month follow-up). Data from each interview with each rater were calculated as a total score. Changes in each participant’s behavior and attitudes after drug sessions were expressed as a mean change score (i.e. difference score) from the baseline rating across the raters. Of 438 scheduled ratings by community observers, 25 (<6%) were missed due to failure to return calls or to the rater not having contact with the participant over the rating period.
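The mean change score across raters can be illustrated with a short Python sketch. It assumes that each rater's interview has already been reduced to a total score and that a rater contributes only when both the baseline and the later rating exist; the handling of missed ratings, the function name, and the example values are assumptions, not details reported in the study.

```python
from statistics import mean


def mean_change_score(baseline_totals: dict, later_totals: dict) -> float:
    """Average (later total - baseline total) over raters with both ratings.

    Keys identify raters; values are total scores from one interview.
    Dropping raters with a missing rating is an assumption of this sketch.
    """
    shared_raters = sorted(set(baseline_totals) & set(later_totals))
    if not shared_raters:
        raise ValueError("no rater has both a baseline and a later rating")
    return mean(later_totals[r] - baseline_totals[r] for r in shared_raters)


# Hypothetical totals for one participant; the colleague missed the later call.
baseline = {"spouse": 80, "friend": 90, "colleague": 85}
post_session_2 = {"spouse": 92, "friend": 96}
print(mean_change_score(baseline, post_session_2))
```

A positive value indicates that, on average, the observers rated the participant higher on psychosocial functioning than they did at baseline.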

The two primary therapeutic outcome measures were the widely used clinician-rated measures of depression, GRID-HAM-D-17 (ISCDD, 2003) and anxiety, HAM-A assessed with the SIGH-A (Shear et al., 2001). For these clinician-rated measures, a clinically significant response was defined as ⩾50% decrease in measure relative to Baseline; symptom remission was defined as ⩾50% decrease in measure relative to Baseline and a score of ⩽7 on the GRID-HAMD or HAM-A (Gao et al., 2014; Matza et al., 2010).
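The response and remission definitions above translate directly into a small rule. The Python sketch below applies them to a single participant's scores; the function name and the example scores are hypothetical, while the 50% and 7-point thresholds come from the definitions in the text.

```python
def classify_outcome(baseline: float, follow_up: float,
                     remission_cutoff: float = 7.0) -> dict:
    """Classify one participant on a clinician-rated scale.

    response : decrease of at least 50% relative to Baseline
    remission: response AND a follow-up score at or below the cutoff
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive to compute a percentage")
    pct_decrease = 100.0 * (baseline - follow_up) / baseline
    response = pct_decrease >= 50.0
    remission = response and follow_up <= remission_cutoff
    return {"pct_decrease": pct_decrease, "response": response, "remission": remission}


# Hypothetical scores: baseline 24 falling to 10 (response only) or to 6 (remission).
print(classify_outcome(24, 10))
print(classify_outcome(24, 6))
```

Note that, under these definitions, a low follow-up score alone is not sufficient for remission: a participant who moves from 12 to 7 has decreased by less than 50% and is classified as neither a responder nor a remitter.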

Seventeen measures focused on mood states, attitudes, disposition, and behaviors thought to be therapeutically relevant in psychologically distressed cancer patients were assessed at four time-points over the study: immediately after study enrollment (Baseline assessment), about 5 weeks (mean 37 days) after each session (Post-session 1 and 2 assessments), and about 6 months (mean 211 days) after session 2 (6-month follow-up).

Ten minutes before and 30, 60, 90, 120, 180, 240, 300, and 360 min after capsule administration, blood pressure, heart rate, and monitor ratings were obtained as described previously (Griffiths et al., 2006). The two session monitors completed the Monitor Rating Questionnaire, which involved rating or scoring several dimensions of the participant’s behavior or mood. The dimensions, which are expressed as peak scores in Table 2, were rated on a 5-point scale from 0 to 4. Data were the mean of the two monitor ratings at each time-point.

Statistical analysis

Differences in demographic data between the two dose sequence groups were examined with t-tests and chi-square tests with continuous and categorical variables, respectively.

Data analyses were conducted to demonstrate the appropriateness of combining data for the 1 and 3 mg/70 kg doses in the low-dose condition and for including data for the one participant who received 30 mg/70 kg. To determine if the two different psilocybin doses differed in the low-dose condition, t-tests were used to compare participants who received 3 mg/70 kg (n = 12) with those who received 1 mg/70 kg (n = 38) on participant ratings of peak intensity of effect (HRS intensity item completed 7 h after administration) and peak monitor ratings of overall drug effect across the session. Because neither of these measures differed significantly between the two doses, data from the 1 and 3 mg/70 kg doses were combined in the low-dose condition for all analyses.

Of the 50 participants who completed the high-dose condition, one received 30 mg/70 kg and 49 received 22 mg/70 kg. To determine if inclusion of the data from the one participant who received 30 mg/70 kg affected conclusions about the most therapeutically relevant outcome measures, the analyses for the 17 measures shown in Tables 4 and 5 were conducted with and without that participant. Because there were few differences in significance (72 of 75 tests remained the same), that participant’s data were included in all the analyses.

To examine acute drug effects from sessions, the drug dose conditions were collapsed across the two dose sequence groups. The appropriateness of this approach was supported by an absence of any significant group effects and any group-by-dose interactions on the cardiovascular measures (peak systolic and diastolic pressures and heart rate) and on several key monitor- and participant-rated measures: peak monitor ratings of drug strength and joy/intense happiness, and end-of-session participant ratings on the Mysticism Scale.

Six participants reported initiating medication treatment with an anxiolytic (2 participants), antidepressant (3), or both (1) between the Post-session 2 and the 6-month follow-up assessments. To determine if inclusion of these participants affected statistical outcomes in the analyses of the 6-month assessment, the analyses summarized in Tables 4, 5, 6, 7 and 8 were conducted with and without these six participants. All statistical outcomes remained identical. Thus, data from these six participants were retained in the data analyses.

For cardiovascular measures and monitor ratings assessed repeatedly during sessions, repeated measures regressions were conducted in SAS PROC MIXED using an AR(1) covariance structure and fixed effects of dose and time. Planned comparison t-tests were used to assess differences between the high- and low-dose condition at each time-point.
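To make the AR(1) covariance structure concrete, the following Python sketch builds the covariance matrix it implies for the repeated measurements within a session. This is an illustration only, not the SAS code used in the study; the function name and the variance and correlation values are hypothetical. Under AR(1), the covariance between two measurement occasions is the common variance multiplied by rho raised to the number of occasions separating them, so adjacent time-points are the most strongly correlated.

```python
from typing import List


def ar1_covariance(n_times: int, variance: float, rho: float) -> List[List[float]]:
    """Covariance matrix for n_times occasions under an AR(1) structure.

    Entry (i, j) equals variance * rho ** |i - j|. Occasions are indexed in
    order; unequal spacing in minutes is ignored in this sketch.
    """
    if n_times < 1 or variance <= 0 or not -1.0 < rho < 1.0:
        raise ValueError("need n_times >= 1, variance > 0 and -1 < rho < 1")
    return [[variance * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


# Nine occasions per session (one pre-capsule, eight post-capsule);
# variance and rho are made-up illustrative values.
cov = ar1_covariance(n_times=9, variance=4.0, rho=0.6)
for row in cov[:3]:
    print([round(v, 3) for v in row[:4]])
```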

Peak scores for cardiovascular measures and monitor ratings during sessions were defined as the maximum value from pre-capsule to 6 h post-capsule. These peak scores and the end-of-session ratings (Tables 2 and 3) were analyzed using repeated measures regressions in SAS PROC MIXED with a CS covariance structure and fixed effects of group and dose.
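The peak-score definition can be sketched in Python as follows. The example assumes monitor ratings are first averaged across the two monitors at each time-point and that the peak is then taken over that averaged series; the ordering of those two steps, the function names, and the rating values are assumptions made for illustration.

```python
from typing import List, Sequence


def monitor_mean_series(monitor_a: Sequence[float],
                        monitor_b: Sequence[float]) -> List[float]:
    """Average the two monitors' ratings at each time-point."""
    if len(monitor_a) != len(monitor_b):
        raise ValueError("both monitors must rate the same time-points")
    return [(a + b) / 2 for a, b in zip(monitor_a, monitor_b)]


def peak_score(series: Sequence[float]) -> float:
    """Peak score: the maximum value from pre-capsule to 6 h post-capsule."""
    if not series:
        raise ValueError("series must contain at least one value")
    return max(series)


# Hypothetical 0-4 ratings at -10, 30, 60, 90, 120, 180, 240, 300 and 360 min.
ratings_a = [0, 1, 3, 4, 3, 2, 1, 0, 0]
ratings_b = [0, 2, 3, 3, 3, 2, 1, 1, 0]
print(peak_score(monitor_mean_series(ratings_a, ratings_b)))
```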

Table 3. Participant ratings on questionnaires completed 7 hours after psilocybin administration.

For the analyses of continuous measures described below, repeated measures regressions were conducted in SAS PROC MIXED using an AR(1) covariance structure and fixed effects of group and time. Planned comparison t-tests (specified below) from these analyses are reported. For dichotomous measures, Friedman’s Test was conducted in SPSS for both the overall analysis and planned comparisons as specified below. All results are expressed as unadjusted scores.

For the measures that were assessed in the two dose sequence groups at Baseline, Post-session 1, Post-session 2, and 6 months (Tables 4 and 5), the following planned comparisons most relevant to examining the effects of psilocybin dose were conducted: between-group comparisons at Baseline, Post 1, and Post 2; and within-group comparisons of Baseline versus Post 1 in both dose sequence groups, and Post 1 versus Post 2 in the Low-Dose-1st (High-Dose-2nd) Group. A planned comparison between Baseline and 6 months collapsed across groups was also conducted. Effect sizes were calculated using Cohen’s d.
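The text does not state which variant of Cohen's d was used. As one common possibility, the Python sketch below computes d for two independent groups using the pooled standard deviation; the function name and data are hypothetical, and a within-group comparison would typically call for a different denominator.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b) -> float:
    """Cohen's d for two independent groups, using the pooled SD.

    d = (mean_a - mean_b) / sqrt(pooled variance), where the pooled
    variance weights each sample variance by its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up scores for two small groups.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```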

Table 4. Effects of psilocybin on the 11 therapeutically relevant outcome measures assessed at Baseline, Post-session 1 (5 weeks after Session 1), Post-session 2 (5 weeks after Session 2), and 6 months follow-up that fulfilled conservative criteria for demonstrating an effect of psilocybin.

Table 5. Effects of psilocybin on six therapeutically relevant outcome measures assessed at Baseline, Post-session 1 (5 weeks after Session 1), Post-session 2 (5 weeks after Session 2), and 6 months that did not fulfill conservative criteria for demonstrating an effect of psilocybin.

For measures assessed only at Baseline, Post 2, and 6 months (Table 7), between-group planned comparisons were conducted at Baseline, Post 2, and 6 months. Because measures assessed only at these time-points cannot provide information about the psilocybin dose, data were collapsed across the two dose sequence groups and planned comparisons were conducted comparing Baseline with Post 2 and Baseline with 6 months.

For participant ratings of persisting effects attributed to the session (e.g. Table 8), planned comparisons for continuous and dichotomous measures were conducted between: (1) ratings at 5 weeks after the low versus high-dose sessions; (2) ratings of low dose at 5 weeks versus ratings of high dose at the 6-month follow-up; (3) ratings of high dose at 5 weeks versus ratings of high dose at the 6-month follow-up.

As described above, clinician-rated measures of depression (GRID-HAMD) and anxiety (HAM-A) were analyzed as continuous measures. In addition, for both measures, a clinically significant response was defined as ⩾50% decrease in measure relative to Baseline; symptom remission was defined as ⩾50% decrease in measure relative to Baseline and a score of ⩽7. Planned comparisons were conducted via independent z-tests of proportions between the two dose sequence groups at Post-session 1, Post-session 2, and 6 months. To determine if effects were sustained at 6 months, planned comparisons were also conducted via dependent z-tests of proportions comparing Post-session 2 with 6 months in the Low-Dose-1st (High-Dose-2nd) Group, and Post-session 1 with 6 months in the High-Dose-1st (Low-Dose-2nd) Group.
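An independent z-test of proportions, of the kind named above for the between-group comparisons, can be sketched in Python as follows. The sketch uses the pooled-proportion standard error and a two-sided p-value from the normal distribution; whether the study used the pooled or unpooled form is not stated, and the counts in the example are invented rather than taken from the results. The dependent (paired) test is not shown.

```python
from math import erfc, sqrt
from typing import Tuple


def two_proportion_z_test(successes_1: int, n_1: int,
                          successes_2: int, n_2: int) -> Tuple[float, float]:
    """Independent two-sample z-test of proportions (pooled standard error).

    Returns (z, two-sided p-value).
    """
    if n_1 <= 0 or n_2 <= 0:
        raise ValueError("group sizes must be positive")
    p_1, p_2 = successes_1 / n_1, successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_1 - p_2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Invented counts: 15 of 25 responders in one group versus 5 of 25 in the other.
z, p = two_proportion_z_test(15, 25, 5, 25)
print(round(z, 3), round(p, 4))
```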

Exploratory analyses used Pearson’s correlations to examine the relationship between total scores on the Mystical Experience Questionnaire (MEQ30) assessed at the end of session 1 and enduring effects assessed 5 weeks after session 1. The Post-session 1 measures were ratings on three items from the Persisting Effects Questionnaire (meaningfulness, spiritual significance, and life satisfaction) and 17 therapeutically relevant measures assessed at Baseline and Post 1 (Tables 4 and 5) expressed as difference from baseline scores. Significant relationships were further examined using partial correlations to control for end-of-session participant-rated “Intensity” (item 98 from the HRS). To examine MEQ30 scores as a mediator of the effect of psilocybin dose on therapeutic effects, a bootstrap analysis was done using the PROCESS macro (Hayes, 2013) in SPSS. Bootstrapping is a non-parametric method appropriate for small samples, which was used to estimate 95% confidence intervals for the mediation effect. The PROCESS macro also calculated direct effects on outcome for both group effects and MEQ30.
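The partial-correlation step can be illustrated with the standard first-order formula, which removes the linear influence of a single control variable (here, rated intensity) from the correlation between two others. The Python sketch below is a minimal version using made-up data; the function names are hypothetical, and this is not the software used in the study. The bootstrap mediation analysis is not reproduced.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return s_xy / sqrt(s_xx * s_yy)


def partial_r(x, y, control) -> float:
    """Correlation of x and y after controlling for one variable."""
    r_xy = pearson_r(x, y)
    r_xc = pearson_r(x, control)
    r_yc = pearson_r(y, control)
    return (r_xy - r_xc * r_yc) / sqrt((1 - r_xc ** 2) * (1 - r_yc ** 2))


# Made-up values standing in for MEQ30 totals, an outcome change score,
# and rated intensity for five participants.
meq30 = [1, 2, 3, 4, 5]
outcome_change = [2, 1, 3, 5, 4]
intensity = [1, -1, 0, -1, 1]
print(round(partial_r(meq30, outcome_change, intensity), 3))
```

In this constructed example the control variable is uncorrelated with both other variables, so the partial correlation equals the simple correlation; with real data the two generally differ.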