Sample description and variant selection

We collected coding sequence variation in 1,042 schizophrenia patients and 961 controls using the Illumina HumanExome BeadChip array. From a total of 100,857 variants observed in our sample, 75,837 are protein-coding SNVs that were scored by ISUB. From these, 16,262 occurred only in cases, of which 6,424 are predicted to be deleterious. For controls, 12,964 were set-unique non-synonymous, with a total of 5,130 predicted to be deleterious (see Fig. 1). The number of observations ranges from 1 to 12 per SNV with a mean of 1.43 (median=1) in cases and a mean of 1.39 (median=1) in controls. The mean MAF of the set-unique variants included in our study is 0.05% in the population based on individuals of European Ancestry included in ESP6500.

Figure 1: Set-unique SNVs. Diagram depicting the set-unique variants (not to scale, cases n=1,002, controls n=931). The set NS represents all variants that are scored by ISUB, and the set DEL includes only those variants that are predicted to be deleterious. In particular, the set DEL includes stop-altering and splice site variants. Full size image

Increased burden of rare variants in schizophrenia

The genome-wide ISUB for deleterious variants is increased significantly in cases versus controls (empirical P=0.018) (see Table 1, Supplementary Fig. 1 and Supplementary Table 1). Although a similar difference is observed for the set of all non-synonymous variants (NS, empirical P=0.033), the signal is driven primarily by deleterious variants (DEL, empirical P=0.018) and not by the complementary set of non-synonymous variants predicted to be non-deleterious (NS—DEL, empirical P=0.492), as indicated by the information in Table 1 (Methods). Given the differences in rare variants between cases and controls, we investigate the nature, robustness and polygenicity of this observed increase. To illustrate, the increased burden may be caused by a larger number of variants per individual or by an increased predicted deleteriousness per SNV, or both. To this end, we characterized the quantitative/qualitative nature of the observed increase. Although not completely independent, our results suggest that the increased individual burden is primarily due to a larger number of rare deleterious SNVs (empirical P=0.015), rather than the SNVs in patients being more deleterious (empirical P=0.079, Supplementary Table 1).

Table 1 Individual set-unique burden (ISUB) analysis. Full size table

In an independent replication sample of 5,585 schizophrenia cases and 8,103 controls of European ancestry we observe an increased frequency of the deleterious variants identified through ISUB: cases have an average of 3.4 of the total of 6,424 deleterious variants observed only in cases, compared with 3.3 in controls (empirical P=0.035).

These results are unlikely to represent an artefact caused by population stratification or cryptic relatedness for several reasons: (1) we performed stringent quality control (Supplementary Methods, and Supplementary Fig. 2); (2) ISUB score is a significant predictor of case–control status, even after including the first ten MDS components into the model (Supplementary Methods); and (3), we observe the same increased frequency in two independent samples of European Ancestry.

Polygenic nature of the increased burden

To assess the polygenicity of the observed increase in burden as well as identify the contributions of the different types of variants, we next analysed the number of genes affected by the different types of variants. The variants we tested are (a) all deleterious variants, (b) splice site variants, (c) stop-altering variants and (d) so-called ‘double hits’. The latter includes genes with two or more non-synonymous SNVs within one individual, where we require at least one to be predicted to be deleterious. As summarized in Table 2, a greater number of genes are affected by deleterious rare variants in cases versus controls (4,533 genes in cases versus 3,795 in controls, empirical P=0.008, Table 2). More striking is that in cases an increased number of genes is affected by splice site variants (380 genes in cases versus 276 in controls, empirical P=0.016) as well as stop-altering stop-altering variants (350 and 253 genes respectively empirical P=0.004). We observe a trend in the number of genes with two or more non-synonymous SNVs (179 genes in cases, 103 in controls, empirical P=0.047), but this difference does not survive correction for multiple testing. To further investigate the polygenicity of rare variant burden, we excluded the variants in a set of genes shown in previous work to be most prominently enriched for rare disruptive mutations in schizophrenia (n=1,796)6. In this analysis, the observed difference in deleterious ISUB remains significant (empirical P=0.006) (Supplementary Methods).

Table 2 Set-unique burden of different variant types. Full size table

Functional classification of rare deleterious variants

To classify the genes implicated by ISUB analysis at a functional level, we examined tissue expression enrichment and pathway analysis using DAVID (database for annotation, visualization and integrated discovery)8. To restrict the total set of genes to a number suitable for DAVID analysis (n<3,000), we selected the genes with increased load of rare variants based on Fisher exact P-values (see Methods and Supplementary Methods for the precise definition of these genes with strongest evidence). This approach not only reduces the set of genes to a usable size (from 4,533 to 698 in cases), it also reduces noise from highly polymorphic genes with large number of non-synonymous variants seen in both cases and controls (for example, NEB in which 12 controls and 9 cases have at least one deleterious SNV, and MUC16 with 6 controls and 12 cases9). Interestingly, we observed significant enrichment for genes expressed in fetal brain (uncorrected P=0.001, Benjamini corrected P=0.033, hypergeometric overlap test). To control for potential confounders such as gene length, we performed the same analysis in controls, and while some enrichments were observed, fetal brain genes were not significantly enriched (Supplementary Table 2). Pathway analysis points to the extracellular matrix (ECM) receptor interaction as the only significantly enriched pathway in cases (P=5.87E−05, Benjamini corrected P=8.30E−03, hypergeometric overlap test), whereas no pathway enrichment was observed in controls. The above results were confirmed by a permutation analysis sampling an equal number of cases and controls (931 individuals for both groups, Supplementary Table 3, Supplementary Methods).

Relationship between common and rare susceptibility alleles

Given the increased individual burden of rare coding variants in schizophrenia, we next examined the relationship between common and rare susceptibility alleles. We tested the overlap between genes located in schizophrenia associated intervals from the GWAS study of the Psychiatric Genomics Consortium1 (ED Supplementary Table 3) and the genes implicated in our study. No significant overlap was observed in cases (83 overlap the 4,533 genes with deleterious alleles in cases versus 62/3,795 in controls, empirical P=0.070, Supplementary Table 2, Supplementary Methods). We also did not observe a correlation between ISUB score and sex (Supplementary Methods).

Quantifying the contribution of these variants to disease susceptibility, by odds ratios and similar statistics, is not suitable for low frequency alleles. For this reason, we measured the relative effect size by reduction in Nagelkerke’s R2 from logistic regression, and compared it with the relative impact of common SNVs using polygenic risk scores of the most recent GWAS results1. These regressions showed relative effect sizes of 10.7% for GWAS common alleles compared with 0.6% for rare deleterious alleles identified through ISUB (Supplementary Methods). Thus, common alleles contribute an order-of-magnitude more to disease susceptibility than the rare deleterious variants, in line with evidence presented by Purcell, et al.6. When comparing the polygenic risk score of cases with ISUB scores directly, no correlation was observed (Supplementary Methods). This suggests that common and rare disease risk alleles may be independently and additively enriched in cases versus controls.

Finally, we tested the overlap between our gene sets and the set of genes containing previously reported de novo mutations in schizophrenia2,3,10,11. While the difference between cases and controls is not significant (empirical P=0.070), we observe a highly significant overlap within each group (P<2.2e−16 for cases and P=4.44e−16 for controls, Supplementary Methods, hypergeometric overlap test).