Sample

Data were drawn from the Suffolk County Mental Health Project, a longitudinal first-admission study of psychosis. Between 1989 and 1995, individuals with first-admission psychosis were recruited from the 12 inpatient facilities in Suffolk County, New York (response rate 72%). The Stony Brook University Committee on Research Involving Human Subjects and the review boards of participating hospitals approved the protocol annually. Written consent was obtained from all participants or, for those who were minors at baseline, from their parents. Eligibility criteria were residence in Suffolk County, age 15 to 60 years, ability to speak English, IQ > 70, first admission within the past 6 months, current psychosis, and no apparent medical etiology for the psychotic symptoms.

A total of 628 participants met inclusion criteria. Follow-up interviews were conducted 6 months, 24 months, 48 months, 10 years, and 20 years after baseline. Ninety-two participants died during the follow-up period. Of the survivors, 373 were interviewed at the 20-year follow-up18. DNA was collected from 249 of these 373 participants as part of the Genomic Psychiatry Cohort19 and is accessible through dbGaP.

At the 20-year follow-up, a comparison group of 261 never psychotic adults (205 of whom provided DNA) was recruited by random digit dialing in zip codes where members of the psychosis cohort resided20. The participation rate in this group was 67%. The comparison group was matched to the psychosis cohort on sex and age. Table 1 reports demographic characteristics of both the psychosis and never psychotic cohorts.

Table 1 Demographic characteristics

For genetic assays, DNA was extracted from peripheral lymphocytes and genotyped on the Illumina PsychArray-8 platform, which contains 571,054 markers. Standard quality control procedures excluded SNPs with minor allele frequency (MAF) < 1%, genotyping failure > 5%, or Hardy–Weinberg equilibrium p < 10⁻⁶, as well as individuals whose recorded and genotyped sex did not match and related individuals (π̂ > 0.20; in such pairs, the relative with the lower call rate was dropped). The mean call rate was 99.8%. SNPs were imputed with IMPUTE2 software21 against the full 1000 Genomes Phase 3 reference panel22. The imputed SNPs underwent a second round of quality control in which SNPs with missing data > 5% or an imputation information score < 0.8 were excluded, yielding 6.87 million high-quality biallelic SNPs. Genomic data were analyzed with the SVS software (version 8.7.0)23.
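The SNP-level filters above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes genotypes coded as allele counts (0, 1, 2) with None for failed calls, and it substitutes a 1-df chi-square Hardy–Weinberg test because the text does not say which test was applied. Sample-level steps (sex checks and removal of relatives by π̂) are not shown.

```python
from math import erfc, sqrt

# Thresholds taken from the text; the genotype encoding below is an assumption.
MAF_MIN = 0.01      # drop SNPs with minor allele frequency < 1%
MISSING_MAX = 0.05  # drop SNPs with genotyping failure > 5%
HWE_P_MIN = 1e-6    # drop SNPs with Hardy-Weinberg p < 10^-6


def hwe_chisq_p(n_hom_counted: int, n_het: int, n_hom_other: int) -> float:
    """1-df chi-square Hardy-Weinberg test (a simpler stand-in for an exact test)."""
    n = n_hom_counted + n_het + n_hom_other
    p = (2 * n_hom_counted + n_het) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    observed = (n_hom_counted, n_het, n_hom_other)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


def snp_passes_qc(calls: list) -> bool:
    """calls: per-person allele counts (0, 1, 2) for one SNP; None marks a failed call."""
    called = [g for g in calls if g is not None]
    if not called:
        return False
    if 1 - len(called) / len(calls) > MISSING_MAX:  # genotyping failure rate
        return False
    freq = sum(called) / (2 * len(called))
    if min(freq, 1 - freq) < MAF_MIN:  # minor allele frequency
        return False
    counts = (called.count(2), called.count(1), called.count(0))
    return hwe_chisq_p(*counts) >= HWE_P_MIN


# Toy SNPs: one in exact Hardy-Weinberg proportions, one with no heterozygotes.
print(snp_passes_qc([2] * 25 + [1] * 50 + [0] * 25))  # True
print(snp_passes_qc([2] * 50 + [0] * 50))             # False (HWE violation)
```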

The SZ polygenic risk score (PRS) was calculated for each participant as the weighted sum of the risk alleles they carried, based on the summary statistics (effect alleles and odds ratios) from the clumped PGC-2 GWAS results, which comprise 102,636 SNPs2. The clumped PGC GWAS summary statistics file was downloaded from LD Hub at the Broad Institute (http://ldsc.broadinstitute.org/ldhub/). In clumping, a SNP was assigned to a more significant SNP if the two were in LD (r² ≥ 0.10) within a 500 kb window. PRS were calculated with the PRSice software24 at several p-value thresholds (p ≤ 5 × 10⁻⁸, 0.001, 0.01, 0.05, 0.10, 0.20, and 0.50). Mean PRS at each threshold for each diagnostic group are reported in Supplementary Table 1. The results presented here use PRS based on SNPs with p < 0.01, as this threshold provided the greatest separation between the diagnostic and never psychotic groups. Analyses using scores based on other thresholds yielded similar results (available upon request).
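The thresholded scoring rule can be illustrated with a minimal sketch. It assumes the common convention of weighting each effect-allele dosage by log(odds ratio) and returns an unscaled sum; PRSice and similar tools may additionally rescale scores (for example, by averaging over SNPs), so absolute values would differ. The SNP identifiers and summary statistics are invented.

```python
from math import log

# Hypothetical clumped summary statistics: (SNP, effect allele, odds ratio, p-value).
# Real values would come from the clumped PGC-2 file; these rows are invented.
SUMSTATS = [
    ("rs_a", "A", 1.10, 1e-9),
    ("rs_b", "G", 0.95, 0.004),
    ("rs_c", "T", 1.05, 0.03),
    ("rs_d", "C", 1.02, 0.40),
]
THRESHOLDS = [5e-8, 0.001, 0.01, 0.05, 0.10, 0.20, 0.50]


def prs(dosages: dict, sumstats: list, p_threshold: float) -> float:
    """Sum of effect-allele dosages weighted by log(OR), over SNPs with p <= threshold.

    dosages maps SNP id -> count (0-2) of that SNP's effect allele in one person;
    SNPs absent from dosages are skipped.
    """
    score = 0.0
    for snp, _effect_allele, odds_ratio, p in sumstats:
        if p <= p_threshold and snp in dosages:
            score += log(odds_ratio) * dosages[snp]
    return score


person = {"rs_a": 2, "rs_b": 1, "rs_c": 0, "rs_d": 1}
scores = {t: prs(person, SUMSTATS, t) for t in THRESHOLDS}
for t, s in scores.items():
    print(f"p <= {t:g}: PRS = {s:.4f}")
```

Each more liberal threshold adds SNPs to the sum, which is why scores computed at different thresholds can separate groups to different degrees.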

Measures

Diagnosis

Research diagnoses were made by consensus of study psychiatrists at baseline and again at year 20 using all available longitudinal information, including SCID25 results, interviews with participants’ significant others, medical records, and observations and behavioral ratings by master’s-level interviewers. Diagnoses were made according to DSM criteria. The diagnostic process is described by Bromet11.

Symptoms

Symptom domains were rated using the Scale for the Assessment of Positive Symptoms (SAPS)26 and the Scale for the Assessment of Negative Symptoms (SANS)27. Master’s-level mental health professionals rated symptoms based on their interview of the participant, interviews with significant others, and medical records. SAPS and SANS ratings were used to score four factor-analytically derived subscales developed in a prior publication28. Each subscale was a composite of nonoverlapping items with the highest loadings on the corresponding factor. Internal consistency of the subscales was adequate: reality distortion (delusions/hallucinations; Cronbach’s α = 0.85) and disorganization (α = 0.77) from the SAPS, and avolition (α = 0.87) and inexpressivity (α = 0.90) from the SANS28. Sample sizes at each time point were: 6 months (n = 217), 24 months (n = 209), 48 months (n = 192), 10 years (n = 234), and 20 years (n = 246). Depression in the past month was rated with the Structured Clinical Interview for DSM-III-R at baseline25 and for DSM-IV thereafter29, administered without skip-outs and scored as the sum of nine symptom ratings. Overall illness severity (symptoms plus functional impairment) was rated on the Global Assessment of Functioning (GAF) by consensus of study psychiatrists using all available information.
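Cronbach’s α, used above to summarize subscale internal consistency, can be computed from item and total-score variances. The sketch below is illustrative only: the item assignments for each subscale come from the prior publication28 and are not reproduced here, so the ratings are invented.

```python
from statistics import variance


def cronbach_alpha(items: list) -> float:
    """Cronbach's alpha for k items; items[j][i] is item j's rating for participant i.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
    """
    k = len(items)
    totals = [sum(ratings) for ratings in zip(*items)]  # per-participant subscale total
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Invented ratings for two items across four participants.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

When items move together, the total-score variance grows faster than the sum of item variances and α approaches 1.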

Cognition

The neuropsychological battery at the 24-month and 20-year follow-ups included Verbal Paired Associates and Visual Reproduction (WMS-R, immediate and delayed trials)30, Symbol-Digit Modalities (WAIS-III)31, Trails A and B32, the Controlled Oral Word Association Test (COWAT)33, Vocabulary (WRAT-3)34, and the Stroop Test35. Sample sizes for the first-admission cohort were n = 201 at 24 months and n = 224 at 20 years.

Analyses

Code for all analyses is available from the corresponding author on request.

Attrition

Supplementary Table 2 reports the sample sizes and means available for each outcome measure at each time point, as well as contrasts between participants who did and did not provide DNA. Those who provided DNA were slightly younger (Cohen’s d = −0.18, p < 0.05); were more likely to be prescribed antipsychotic medications at 48 months (Cramér’s V = 0.17, p < 0.05), 10 years (Cramér’s V = 0.10, p < 0.05), and 20 years (Cohen’s d = 0.17, p < 0.05); had higher SANS avolition ratings at baseline (d = 0.18, p < 0.05) and lower ratings at 48 months (d = −0.23, p < 0.05); and had better COWAT scores (d = 0.60, p < 0.05). All analyses used full information maximum likelihood estimation, which uses all available data, including partial cases, and yields unbiased parameter estimates when data are missing at random.
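The two effect-size measures reported in these attrition contrasts can be computed as below. This sketch uses the usual textbook definitions (pooled-SD Cohen’s d; Cramér’s V from an uncorrected chi-square); the text does not state which variants were used, and the example data are invented. A negative d means the first group (here, those who provided DNA) scored lower.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list, group_b: list) -> float:
    """Standardized mean difference (a - b) using the pooled sample SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def cramers_v(table: list) -> float:
    """Cramer's V for an r x c table of counts (chi-square without continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Invented example: ages of participants with vs. without DNA, and a 2 x 2 table
# of [on antipsychotic, not on antipsychotic] counts for the two groups.
print(cohens_d([30, 34, 29, 41], [33, 36, 38, 31]))
print(cramers_v([[40, 60], [25, 75]]))
```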

Baseline associations

The baseline time point was qualitatively different from the follow-up time points because all participants were selected for being actively psychotic at baseline. For this reason, baseline symptoms and trajectories from 6 months to 20 years were analyzed separately. Associations of the SZ PRS with baseline symptoms and illness severity were tested with linear regression adjusting for age.
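The age-adjusted association amounts to an ordinary least-squares regression of each baseline outcome on the PRS with age as a covariate. The sketch below solves the normal equations directly; it is illustrative only, uses invented noise-free data, and omits the ten genetic principal components that all analyses also included as covariates (these could be passed as additional predictors).

```python
def solve(a: list, b: list) -> list:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(y: list, *predictors: list) -> list:
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations X'X b = X'y."""
    X = [[1.0, *row] for row in zip(*predictors)]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data: PRS (standardized), age, and a baseline symptom score.
prs = [0.1, -0.3, 0.5, 0.0, 0.8, -0.6]
age = [20, 35, 28, 41, 19, 50]
symptoms = [2 + 3 * p + 0.5 * a for p, a in zip(prs, age)]
intercept, b_prs, b_age = ols(symptoms, prs, age)
print(f"PRS coefficient adjusted for age: {b_prs:.3f}")
```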

Trajectories

Multilevel spline regression models were used to estimate trajectories of symptoms and illness severity. To allow for nonlinear trajectories, we identified the point at which the average trajectory changed direction. The change point was located by placing it, in turn, at each 1-year interval from baseline to the 20-year follow-up and comparing the fit of these competing models with the Bayesian Information Criterion (BIC).
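The change-point search can be sketched as a grid search over knot locations in a linear spline, scoring each candidate by BIC. This simplified version is a stand-in under stated assumptions: it fits ordinary least squares to pooled observations and ignores the person-level random effects of the multilevel models, and the data are simulated. Because every candidate has the same number of parameters, BIC here ranks knots purely by residual fit; in a mixed model, BIC would instead come from that model’s likelihood.

```python
import math
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def spline_rss(times, y, knot):
    """Residual sum of squares of y ~ b0 + b1*t + b2*max(0, t - knot)."""
    X = [[1.0, t, max(0.0, t - knot)] for t in times]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def best_change_point(times, y, candidates):
    """Return the knot with the lowest Gaussian BIC, plus all BIC values."""
    n = len(y)
    k = 3  # intercept, pre-knot slope, change in slope
    bic = {c: n * math.log(spline_rss(times, y, c) / n) + k * math.log(n) for c in candidates}
    return min(bic, key=bic.get), bic


# Simulated cohort: 50 people assessed at 0.5, 2, 4, 10 and 20 years; the mean
# trajectory declines until year 4 and then rises slightly (invented values).
rng = random.Random(1)
times, y = [], []
for _person in range(50):
    for t in (0.5, 2, 4, 10, 20):
        times.append(t)
        y.append(10 - 2 * t + 2.1 * max(0, t - 4) + rng.gauss(0, 0.5))

knot, bic = best_change_point(times, y, range(1, 20))
print("Selected change point (years):", knot)
```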

Cognitive change

Parallel analysis was used to determine how many factors underlay the cognitive tests administered at the 24-month and 20-year follow-ups; at both time points, it indicated a single cognitive factor. One-factor confirmatory measurement models were therefore fit to the cognitive tests at each time point. In both models, all test loadings on the general factor were > 0.3 and statistically significant. Residual covariances were included between subscales of tests with more than one subscale. Model fit was good at 24 months (CFI = 0.94; RMSEA = 0.04) and excellent at 20 years (CFI = 1.00; RMSEA = 0.01). At each time point, the latent cognition factor was regressed on the SZ PRS and age.
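Horn’s parallel analysis retains factors whose observed correlation-matrix eigenvalues exceed those expected from random data of the same size. The sketch below uses the mean of eigenvalues from simulated normal data as the reference (some implementations use the 95th percentile instead) and a small Jacobi eigenvalue routine to stay dependency-free; the number of iterations, the seed, and the simulated test battery are assumptions, not the study’s settings.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlations between columns of data (rows = participants)."""
    n = len(data)
    z = []
    for col in zip(*data):
        m = sum(col) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        z.append([(x - m) / sd for x in col])
    p = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)] for i in range(p)]


def eigenvalues(a, sweeps=50):
    """Cyclic Jacobi rotations for a small symmetric matrix; largest eigenvalue first."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-18:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(data, n_iter=20, seed=0):
    """Count leading observed eigenvalues that exceed the mean random-data eigenvalues."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = eigenvalues(correlation_matrix(data))
    reference = [0.0] * p
    for _ in range(n_iter):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for i, ev in enumerate(eigenvalues(correlation_matrix(noise))):
            reference[i] += ev / n_iter
    n_factors = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_factors += 1
    return n_factors


# Simulated battery: six tests that all load on one common factor.
rng = random.Random(42)
data = []
for _ in range(300):
    g = rng.gauss(0, 1)
    data.append([g + 0.5 * rng.gauss(0, 1) for _ in range(6)])
print("Factors retained:", parallel_analysis(data))
```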

Diagnosis and diagnostic shifts

Supplementary Table 3 reports Cohen’s d and R² of the SZ PRS for each diagnostic group at baseline and at the 20-year follow-up relative to never psychotic adults. R² is Lee’s37 coefficient of determination on the liability scale, corrected for case–control ascertainment. Baseline and 20-year diagnoses were dichotomized into affective psychosis (AP) and non-affective psychosis (NAP). The AP category included psychotic bipolar disorder and psychotic major depression. The NAP category included schizophrenia, schizoaffective disorder, substance-induced psychosis, and other psychoses. This categorization was based on findings reported by Kotov36, which showed that patients who experienced 10 or more days of psychosis outside of a mood episode had worse outcomes at the 10-year follow-up. All participants with schizophrenia, as well as the large majority of those with substance-induced psychosis and other psychoses, met this criterion and were therefore included in the NAP group. This grouping was also consistent with the similarity of PRS among the specific diagnoses.
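The liability-scale conversion can be sketched as follows, using the formula as we understand it from Lee37: an observed-scale R² (from a linear regression of 0/1 case status on the PRS) is rescaled using the population prevalence K and the proportion of cases in the sample P. The values K = 1% and P = 0.5 in the example are placeholders, not the ones used for Supplementary Table 3.

```python
from statistics import NormalDist


def liability_r2(r2_observed: float, prevalence: float, case_fraction: float) -> float:
    """Observed-scale R^2 -> liability-scale R^2, adjusted for case-control ascertainment.

    prevalence (K): assumed population risk of the disorder.
    case_fraction (P): proportion of cases in the analyzed sample.
    """
    K, P = prevalence, case_fraction
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1 - K)  # liability threshold
    z = std_normal.pdf(t)          # normal density at the threshold
    m = z / K                      # mean liability of cases
    c = (K * (1 - K) / z ** 2) * (K * (1 - K) / (P * (1 - P)))
    theta = m * (P - K) / (1 - K) * (m * (P - K) / (1 - K) - t)
    return r2_observed * c / (1 + r2_observed * theta * c)


print(liability_r2(0.05, prevalence=0.01, case_fraction=0.5))
```

When the sample case fraction equals the prevalence (P = K), the ascertainment terms drop out and the expression reduces to the familiar K(1 − K)/z² rescaling.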

Predictive modeling of shifts from AP to NAP between baseline and 20 years was performed by regressing diagnostic shift group membership on the statistically significant clinical predictors from Table 3 of Bromet11. Jackknife resampling was used to assess the stability of the model estimates, and leave-one-out cross-validation to estimate prediction error. Sensitivity analyses excluded patients with a diagnosis of substance-induced psychosis or other psychoses at either baseline or 20-year follow-up to confirm that results were not driven by these potentially ambiguous cases. Results of these analyses are reported under “Sensitivity Analyses” in the Supplementary Material.
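Jackknife resampling and leave-one-out cross-validation both rest on refitting the model n times, each time without one participant. The sketch below shows that shared loop for a logistic regression fit by Newton–Raphson. It assumes a binary outcome (for example, shifted from AP to NAP or not), uses simulated data with a single made-up predictor, and summarizes prediction error as the Brier score, a metric the text does not specify.

```python
import math
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def predict(beta, row):
    """Predicted probability from a logistic model."""
    return 1 / (1 + math.exp(-sum(b * x for b, x in zip(beta, row))))


def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson; each row of X starts with a 1 (intercept)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [predict(beta, row) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        # A tiny value on the diagonal only guards against a singular Hessian.
        hess = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 + (1e-9 if i == j else 0.0) for j in range(k)] for i in range(k)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def leave_one_out(X, y):
    """Jackknife SEs of the coefficients and the LOOCV Brier score."""
    n, k = len(y), len(X[0])
    betas, sq_err = [], []
    for i in range(n):
        beta = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        betas.append(beta)
        sq_err.append((y[i] - predict(beta, X[i])) ** 2)  # error on the held-out case
    means = [sum(b[j] for b in betas) / n for j in range(k)]
    se = [math.sqrt((n - 1) / n * sum((b[j] - means[j]) ** 2 for b in betas)) for j in range(k)]
    return se, sum(sq_err) / n


# Simulated data: one hypothetical clinical predictor and a binary shift outcome.
rng = random.Random(7)
X, y = [], []
for _ in range(80):
    x = rng.gauss(0, 1)
    X.append([1.0, x])
    y.append(1 if rng.random() < 1 / (1 + math.exp(-(-0.5 + 1.0 * x))) else 0)

se, brier = leave_one_out(X, y)
print("Jackknife SEs:", [round(s, 3) for s in se], "LOOCV Brier score:", round(brier, 3))
```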

Population stratification

To control for population stratification due to ancestry, all analyses included the first ten principal components of genetic covariance as covariates38. Because the PGC-2 SZ PRS was derived from samples of largely European ancestry, it is less accurate in non-European samples39. For this reason, we conducted sensitivity analyses in the subsample of participants within three standard deviations of the mean on each of the first four principal components of genetic covariance (n = 235). These results are reported in Supplementary Table 4 and under “Sensitivity Analyses” in the Supplementary Material.
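The ancestry filter for this sensitivity analysis can be sketched as below. We assume a single pass in which means and SDs are computed on the full genotyped sample and a participant must fall within the band on every one of the first four PCs; the text does not specify these details, and the PC scores are invented.

```python
from statistics import mean, stdev


def within_sd_band(pcs: list, n_pcs: int = 4, n_sd: float = 3.0) -> list:
    """Indices of participants within n_sd SDs of the mean on each of the first n_pcs PCs.

    pcs[i][j] is participant i's score on principal component j + 1.
    Means and SDs are computed once on the full sample (no iterative re-trimming).
    """
    keep = set(range(len(pcs)))
    for j in range(n_pcs):
        scores = [row[j] for row in pcs]
        m, s = mean(scores), stdev(scores)
        keep &= {i for i, v in enumerate(scores) if abs(v - m) <= n_sd * s}
    return sorted(keep)


# Invented PC scores: 50 tightly clustered participants plus one outlier on PC1.
pcs = [[0.01 * ((i * 7) % 11 - 5), 0.01 * ((i * 3) % 7 - 3),
        0.01 * ((i * 5) % 9 - 4), 0.01 * ((i * 2) % 5 - 2)] for i in range(50)]
pcs.append([1.0, 0.0, 0.0, 0.0])
kept = within_sd_band(pcs)
print(len(kept), "of", len(pcs), "participants retained")
```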

Multiple comparisons

To limit Type I errors, we used the Benjamini–Hochberg procedure40 to control the false-discovery rate (FDR) at q = 0.10. Of the 46 contrasts tested in these analyses, all reported p-values remained significant after FDR correction, with one exception noted in Footnote 2.
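The Benjamini–Hochberg step-up rule can be sketched in a few lines: sort the m p-values, find the largest rank k with p(k) ≤ (k/m)·q, and reject the k hypotheses with the smallest p-values. The p-values in the example are invented; q = 0.10 matches the text.

```python
def benjamini_hochberg(pvalues: list, q: float = 0.10) -> list:
    """Return True for each hypothesis rejected by the BH step-up procedure at FDR q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank  # keep the largest rank that meets its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


pvals = [0.041, 0.001, 0.205, 0.039, 0.008, 0.074, 0.060, 0.042]
print(benjamini_hochberg(pvals))  # [True, True, False, True, True, True, True, True]
```

Here 0.039 exceeds its own rank-3 threshold (0.0375) but is still rejected, because a larger p-value further down the sorted list meets its threshold; this step-up property distinguishes BH from testing each p-value against a fixed cutoff.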