We obtained genome-wide genotype data from which we constructed 49 ancestry matched, non-overlapping case-control samples (46 of European and three of east Asian ancestry, 34,241 cases and 45,604 controls) and 3 family-based samples of European ancestry (1,235 parent affected-offspring trios) (Supplementary Table 1 and Supplementary Methods). These comprise the primary PGC GWAS data set. We processed the genotypes from all studies using unified quality control procedures followed by imputation of SNPs and insertion-deletions using the 1000 Genomes Project reference panel25. In each sample, association testing was conducted using imputed marker dosages and principal components (PCs) to control for population stratification. The results were combined using an inverse-variance weighted fixed effects model26. After quality control (imputation INFO score ≥ 0.6, MAF ≥ 0.01, and successfully imputed in ≥ 20 samples), we considered around 9.5 million variants. The results are summarized in Fig. 1. To enable acquisition of large samples, some groups ascertained cases via clinician diagnosis rather than a research-based assessment and provided evidence of the validity of this approach (Supplementary Information)11,13. Post hoc analyses revealed the pattern of effect sizes for associated loci was similar across different assessment methods and modes of ascertainment (Extended Data Fig. 1), supporting our a priori decision to include samples of this nature.
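The inverse-variance weighted fixed-effects combination described above can be sketched as follows. This is a minimal illustration and not the consortium's pipeline: the function name `ivw_fixed_effects` and its inputs (per-sample log odds ratios and their standard errors for one variant) are assumptions introduced for the example.

```python
import math

def ivw_fixed_effects(betas, ses):
    """Combine per-sample effect estimates for one variant.

    betas: per-sample effect sizes (e.g. log odds ratios); hypothetical input.
    ses:   matching standard errors.
    Returns (pooled beta, pooled standard error, z score, two-sided P).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each sample is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

Samples with smaller standard errors (typically the larger ones) dominate the pooled estimate, which is the intended behaviour under a fixed-effects model that assumes one shared true effect across samples.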

Figure 1: Manhattan plot showing schizophrenia associations. Manhattan plot of the discovery genome-wide association meta-analysis of 49 case control samples (34,241 cases and 45,604 controls) and 3 family based association studies (1,235 parent affected-offspring trios). The x axis is chromosomal position and the y axis is the significance (−log₁₀ P; 2-tailed) of association derived by logistic regression. The red line shows the genome-wide significance level (5 × 10⁻⁸). SNPs in green are in linkage disequilibrium with the index SNPs (diamonds), which represent independent genome-wide significant associations.

For the subset of linkage-disequilibrium-independent single nucleotide polymorphisms (SNPs) with P < 1 × 10⁻⁶ in the meta-analysis, we next obtained results from deCODE genetics (1,513 cases and 66,236 controls of European ancestry). We define linkage-disequilibrium-independent SNPs as those with low linkage disequilibrium (r² < 0.1) to a more significantly associated SNP within a 500-kb window. Given that high linkage disequilibrium in the extended major histocompatibility complex (MHC) region spans ∼8 Mb, we conservatively include only a single MHC SNP to represent this locus. The deCODE data were then combined with those from the primary GWAS to give a data set of 36,989 cases and 113,075 controls. In this final analysis, 128 linkage-disequilibrium-independent SNPs exceeded genome-wide significance (P ≤ 5 × 10⁻⁸) (Supplementary Table 2).
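The definition of linkage-disequilibrium-independent SNPs can be illustrated with a greedy selection in the style of clumping. This is a sketch under stated assumptions: the `Snp` record, the `r2` lookup callable and the interpretation of the 500-kb window as a maximum distance on either side are all hypothetical, and the text does not say whether each SNP was compared against every more significant SNP or only against retained index SNPs (the sketch does the latter).

```python
from typing import Callable, Iterable, List, NamedTuple

class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int      # base-pair position
    p: float      # association P value

def ld_independent(
    snps: Iterable[Snp],
    r2: Callable[[str, str], float],  # hypothetical pairwise LD lookup
    r2_max: float = 0.1,
    window_bp: int = 500_000,         # assumed to mean distance on either side
) -> List[Snp]:
    """Return SNPs not in LD (r2 >= r2_max) with a more significant kept SNP."""
    kept: List[Snp] = []
    # Visit SNPs from most to least significant.
    for snp in sorted(snps, key=lambda s: s.p):
        tagged = any(
            k.chrom == snp.chrom
            and abs(k.pos - snp.pos) <= window_bp
            and r2(k.snp_id, snp.snp_id) >= r2_max
            for k in kept
        )
        if not tagged:
            kept.append(snp)
    return kept
```

Collapsing the extended MHC region to a single representative SNP, as the text describes, would be a separate step applied on top of this selection.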

As in meta-analyses of other complex traits which identified large numbers of common risk variants27,28, the test statistic distribution from our GWAS deviates from the null (Extended Data Fig. 2). This is consistent with the previously documented polygenic contribution to schizophrenia10,11. The deviation in the test statistics from the null (λ_GC = 1.47, λ_1000 = 1.01) is only slightly less than expected (λ_GC = 1.56) under a polygenic model given fully informative genotypes, the current sample size, and the lifetime risk and heritability of schizophrenia29.
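For readers unfamiliar with the two inflation statistics, the sketch below shows one common way they are computed. The formulas are the conventional ones (median chi-square ratio for the genomic control factor, and rescaling to a reference study of 1,000 cases and 1,000 controls); the text does not spell out the authors' exact calculation, so the function names and the choice of sample counts are assumptions.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def lambda_gc(chisq_stats):
    """Genomic inflation factor: observed median statistic over the null median."""
    return statistics.median(chisq_stats) / CHI2_1DF_MEDIAN

def lambda_1000(lam, n_cases, n_controls, ref_cases=1000, ref_controls=1000):
    """Rescale an inflation factor to a reference study size."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / ref_cases + 1.0 / ref_controls)
    return 1.0 + (lam - 1.0) * scale

# Assumption: using only the case-control counts from the primary GWAS.
rescaled = lambda_1000(1.47, 34_241, 45_604)
print(round(rescaled, 2))
```

With these inputs the rescaled value rounds to the reported 1.01, which illustrates why a large raw inflation factor in a very large sample need not indicate confounding.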

Additional lines of evidence allow us to conclude that the deviation between the observed and null distributions in our primary GWAS indicates a true polygenic contribution to schizophrenia. First, applying a novel method30 that uses linkage disequilibrium information to distinguish between the major potential sources of test statistic inflation, we found our results are consistent with polygenic architecture but not population stratification (Extended Data Fig. 3). Second, the schizophrenia-associated alleles at 78% of 234 linkage-disequilibrium-independent SNPs exceeding P < 1 × 10⁻⁶ in the case-control GWAS were again overrepresented in cases in the independent samples from deCODE. This degree of consistency between the case-control GWAS and the replication data is highly unlikely to occur by chance (P = 6 × 10⁻¹⁹). The tested alleles surpassed the P < 10⁻⁶ threshold in our GWAS before we added either the trios or deCODE data to the meta-analysis. This trend test is therefore independent of the primary case-control GWAS. Third, analysing the 1,235 parent-proband trios, we again found excess transmission of the schizophrenia-associated allele at 69% of the 263 linkage-disequilibrium-independent SNPs with P < 1 × 10⁻⁶ in the case-control GWAS. This is again unlikely to occur by chance (P = 1 × 10⁻⁹) and additionally excludes population stratification as fully explaining the associations reaching our threshold for seeking replication. Fourth, we used the trios trend data to estimate the expected proportion of true associations at P < 1 × 10⁻⁶ in the discovery GWAS, allowing for the fact that half of the index SNPs are expected to show the same allelic trend in the trios by chance, and that some true associations will show opposite trends given the limited number of trio samples (Supplementary Methods). Given the observed trend test results, around 67% (95% confidence interval: 64–73%) or n = 176 of the associations in the scan at P < 1 × 10⁻⁶ are expected to be true, and therefore the number of associations that will ultimately be validated from this set of SNPs will be considerably more than those that now meet genome-wide significance. Taken together, these analyses indicate that the observed deviation of test statistics from the null primarily represents polygenic association signal and the considerable excess of associations at the tail of extreme significance largely corresponds to true associations.
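The direction-of-effect comparisons in the second and third points are sign tests: under the null, each index SNP has an even chance of showing the same allelic trend in the independent data. A minimal exact version is sketched below. The concordant counts are assumptions obtained by rounding the reported percentages (78% of 234 is taken as 183, and 69% of 263 as 181), and the text does not state whether the reported P values are one-sided or two-sided, so the output is only expected to be of a similar order of magnitude.

```python
from math import comb

def sign_test_one_sided(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), using integer arithmetic."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    tail = sum(comb(n, i) for i in range(k, n + 1))
    return tail / 2 ** n

# Assumed counts, rounded from the percentages quoted in the text.
p_decode = sign_test_one_sided(183, 234)
p_trios = sign_test_one_sided(181, 263)
print(f"deCODE: {p_decode:.1e}, trios: {p_trios:.1e}")
```

Summing exact binomial coefficients avoids the loss of accuracy that a normal approximation would suffer this far into the tail.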

Independently associated SNPs do not translate to well-bounded chromosomal regions. Nevertheless, it is useful to define physical boundaries for the SNP associations to identify candidate risk genes. We defined an associated locus as the physical region containing all SNPs correlated at r² > 0.6 with each of the 128 index SNPs. Associated loci within 250 kb of each other were merged. This resulted in 108 physically distinct associated loci, 83 of which have not been previously implicated in schizophrenia and therefore harbour potential new biological insights into disease aetiology (Supplementary Table 3; regional plots in Supplementary Fig. 1). The significant regions include all but 5 loci previously reported to be genome-wide significant in large samples (Supplementary Table 3).
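The merging step that reduces 128 index SNPs to 108 loci is an interval merge with a distance tolerance. The sketch below assumes each locus is given as a hypothetical `(chromosome, start, end)` tuple already spanning the SNPs at r² > 0.6 with its index SNP, and that "within 250 kb" refers to the gap between the end of one locus and the start of the next; neither detail is specified in the text.

```python
def merge_loci(loci, max_gap_bp=250_000):
    """Merge same-chromosome intervals whose gap is at most max_gap_bp.

    loci: iterable of (chrom, start, end) tuples in base pairs; hypothetical input.
    Returns a list of merged (chrom, start, end) tuples sorted by chromosome and start.
    """
    merged = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap_bp:
            prev_chrom, prev_start, prev_end = merged[-1]
            # Extend the previous locus; max() handles nested intervals.
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Because the comparison is always made against the growing merged interval, a chain of loci each within 250 kb of its neighbour collapses into one region even if the first and last are far apart.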