Samples

Summary statistics were obtained from GWAS meta-analyses of intelligence (n = 78,308) [16], and education (n = 329,417) [25]. To maximise sample size in our intelligence dataset, four additional GWASs were performed on the verbal-numerical reasoning (VNR) test in UK Biobank. The VNR test consists of 13 items, 6 verbal and 7 numerical questions, all of which are multiple choice. An individual’s verbal numerical reasoning score was measured by summing the number of correct responses given within a 2 minute time period. Participants performed the VNR test either online or at a UK Biobank assessment centre, with some participants taking the VNR test at multiple time points. If participants took the VNR test at multiple time points, only the earliest was used, leading to four GWASs being performed on VNR (time 1, N = 76,051, time 2, N = 9266, time 3, N = 2552, online, N = 33,065). UK Biobank participants who were included in the GWAS from Sniekers et al. [16]. were omitted from the current GWASs. Following quality control, a total of 120,934 new UK Biobank participants were available for GWAS. Ethical approval for UK Biobank was received from the Research Ethics Committee (REC reference 11/NW/0382). This work was conducted under UK Biobank application 10279.

In order to derive genetic correlations between intelligence (and the proxy phenotype of education, as well as the final meta-analytic sample) and health-related and other traits, we used summary statistics from 29 GWAS datasets. Supplementary Table 1 shows the datasets used and provides a reference and sample size for each dataset used.

UK biobank genotyping

Full details of the UK Biobank genotyping procedure are available elsewhere [30]. Briefly, two custom genotyping arrays were used to genotype 49,950 participants (UK BiLEVE Axiom Array) and 438,427 participants (UK Biobank Axiom Array) [30]. Genotype data on 805,426 markers were available for 488,377 of the individuals in UK Biobank. Imputation was carried out using a combination of the Haplotype Reference Consortium (HRC) reference panel, 1000 genomes, and UK10k. Here, we restrict the analysis to the HRC panel, as advised by UK Biobank. This led to 39,131,578 autosomal SNPs being available for the 120,934 participants who had taken the VNR test [30]. Allele frequency checks [31] were performed against the HRC [32] and 1000G [33] site lists, and variants were removed if the allele frequencies differed from the reference set by more than ±0.2.

Additional quality control steps were implemented in the present study and included the removal of participants with non-British ancestry (identified by Bycroft et al. [30]. by performing a principal component analysis on the genotyped SNP data to remove ethnic outliers from a subset of the UK Biobank participants who self-identified as White British) as well as those with extreme scores based on heterozygosity (extreme scores were defined as those with a principal component-adjusted heterozygosity score above 0.19 as shown by Bycroft et al. [30].) and >5% missingness. Individuals whose reported sex was inconsistent with genetically inferred sex were also removed, as well as individuals with neither XX nor XY chromosomes. Finally, those individuals with >10 putative third degree relatives, identified by Bycroft et al. [30] by estimating the kinship coefficients for all pairs of samples using the software KING [34], were removed. This left 408,095 individuals. Using GCTA [9] on 131,790 reportedly-related participants one from each pair of related individuals was removed, based on a genetic relationship threshold of 0.025, leaving 332,050 individuals. Finally, individuals whose genetic and VNR data were available for analysis in the first wave of genetic data release from UK Biobank were removed as these individuals were already a part of the Sniekers et al. [16] dataset. Following these quality control steps, a sample size of 120,934 individuals was available for the VNR test. SNPs with a minor allele frequency (MAF) < 0.0005, and an imputation quality score < 0.1 were removed along with non-bi-allelic SNPs, resulting in 18,485,882 autosomal SNPs.

Statistical analysis

Association analysis

VNR was analysed separately at one of four time points: three of these were at an assessment centre, VNR 1, VNR 2, VNR MRI, and one was online, VNR Online. All VNR scores were adjusted for age, sex, assessment centre, genotype batch, array, and 40 principal components. Association analysis was performed using an additive model implemented using BGENIE [30].

Meta-analysis

MTAG results are susceptible to bias and a large false discovery rate when analyzing sets of GWAS summary statistics where some sets are much more highly-powered than others [26]. In order to improve the statistical power to detect association in the Sniekers [16] data, we first meta-analysed the Sniekers dataset with the four GWASs performed on UK Biobank’s VNR test. This lead to the inclusion of 120,934 new participants. The summary statistics from the four UK Biobank VNR GWASs were meta-analysed with the summary statistics available from Sniekers et al. [16] using a sample size weighted meta-analysis conducted with METAL [35]. This resulted in an intelligence dataset containing 199,242 participants.

Multi-trait analysis of genome-wide association studies (MTAG)

MTAG [26] allows the meta-analysis of different traits that are genetically correlated with each other in order to increase power to detect loci in any one of the traits. Only summary data are required in order to carry out MTAG and, as bivariate LD score regression is carried out as part of an MTAG analysis to account for (possibly unknown) sample overlap between the GWAS results, these summary statistics need not come from independent samples. Our goal was to increase the power to detect loci associated with intelligence, and so the meta-analytic results of the GWAS on intelligence by Sniekers et al. [16] and the new participants from UK Biobank were used as our primary GWAS data sets. In order to add power to this combined intelligence dataset, the genetically-correlated proxy phenotype of years of education [25] (n = 329,417) was included. MTAG was run using the default settings.

Identification of independent genomic loci and functional annotation

Using the meta-analytic dataset produced by MTAG, genetic loci related to intelligence were identified using FUMA [36]. First, independent significant SNPs were identified. Independent significant SNPs were selected on the basis of their P-value being genome wide significant (P < 5 × 10−8) and being independent from each other (r2 < 0.6) within a 1 mb window. Secondly, SNPs that were in LD of the independent lead SNPs (r2 ≥ 0.6) within a 1 mb window, and within 1000 genomes reference panel with a MAF of greater than 0.01 were included for further annotation. Thirdly, lead SNPs were identified as a subset of the independent significant SNPs (defined as above). Lead SNPs were defined as independent significant SNPs that were in LD with each other at r2 < 0.1, again with a 1 mb window. Fourthly, genomic risk loci were identified by merging lead SNPs if they were closer than 250 kb apart, meaning that a genomic risk locus could contain multiple independent significant SNPs and multiple lead SNPs. Finally, all SNPs in LD of r2 ≥ 0.6 with one of the independent significant SNPs formed the border or edge of the genomic risk loci. To map LD, the 1000 genomes phase 3 was used [33].

Functional annotation was carried out in FUMA [36] using all SNPs found within the independent genomic loci which were in LD of r2 ≥ 0.6, were nominally significant, and had a MAF of 0.01. To gauge the functional consequences of genetic variation at these SNPs they were first matched based on chromosome, base pair position, reference, and non-reference alleles to a database containing functional annotations including the ANNOVAR categories [37], combined annotation dependent depletion (CADD) scores [38], Regulome DB (RDB) scores [39], and chromatin states [40,41,42].

The ANNOVAR [37] categories were used to identify the function of the SNP, and to locate its position within the genome. CADD scores are a continuous measurement used to determine how deleterious genetic variation at the SNP is to protein structure and function. Higher scores are indicative of a more deleterious variant, with scores of greater than 12.37 providing evidence of pathogenicity [38]. A Regulome DB score is a categorical measurement based on data from expression quantitative trait loci (eQTLs) as well as chromatin marks. The RDB score ranges from 1a to 7 with lower scores given to the variants with the greatest evidence for having regulatory function.

Chromatin states indicate the level of accessibility of genomic regions. This level of accessibility was described using a 15 point scale predicated for each variant using a hidden Markov model based on five chromatin marks for 127 epigenomes in the Roadmap Epigenomics Project [41]. The lower the chromatin score the greater the level of accessibility to the genome at this site, with scores of less than 8 indicative of an open chromatin region. The minimum chromatic state across tissues was used.

Gene-based GWAS

Gene-based analysis was conducted using multi-marker analysis of genomic annotation (MAGMA) [43]. SNPs that were located within protein coding genes were used to derive a P-value describing the association found with intelligence. Gene locations and boundaries were used from the NCBI build 37 and LD was controlled for using the 1000 genomes phase 3 release [44]. A Bonferroni correction was applied to control for the multiple tests performed on the 18,199 autosomal genes available for analysis.

Tissue type gene expression

In order to identify the importance of particular tissue types relevant to individual differences in intelligence, a gene property analysis was conducted using MAGMA. The goal of this analysis was to determine if, in 30 broad tissue types, and 53 specific tissues, tissue-specific differential expression levels were predictive of the association of a gene with intelligence. Tissue types were taken from the GTEx v6 RNA-seq database [45] with expression values being log2 transformed with a pseudocount of 1 after winsorising at 50 with the average expression value being taken from each tissue. Multiple testing was controlled for using Bonferroni correction for 30, and 53 tests.

Gene-set analysis

Gene-set analysis was conducted using MAGMA [43] using competitive testing. A total of 10,891 gene-sets (sourced from Gene Ontology [27], Reactome [28], and, MSigDB [29]) were examined for enrichment in intelligence. A Bonferroni correction was applied to control for the multiple tests performed on the 10,891 gene sets available for analysis.

Genetic correlations

In order to address whether the genetic architecture of the meta-analysis of correlated traits conducted here produced a phenotype with the same genetic architecture as intelligence, we derived genetic correlations across 29 cognitive, socio-economic status (SES), mental health, metabolic, anthropometric, reproductive, and health and wellbeing phenotypes. We used linkage disequilibrium score regression [12] to test whether each dataset had sufficient evidence of a polygenic signal, indicated by a heritability Z-score [12] of >4 and a mean χ2 statistic of >1.02 [12]. A minor allele frequency cut-off of <0.01 was applied. Only SNPs that were in HapMap 3 with MAF > 0.05 in the 1000 Genomes EUR reference sample were included. Next, Indels and structural variants were removed as were strand ambiguous variants. SNPs whose alleles did not match those in the 1000 Genomes were also removed. The presence of outliers can increase the standard error in LD score regression [12], and so SNPs where χ2 > 80 were removed. LD scores and weights for use with European populations were downloaded from (http://www.broadinstitute.org/~bulik/eur_ldscores/). False discovery rate (FDR) was controlled for using Benjamini-Hochberg [46] procedure to control for the 30 tests performed (Alzheimer’s disease was included twice) against each phenotype. The corrected alpha level corresponding to an FDR of 5% was 0.0374 for education, 0.0305 for intelligence in the Sniekers et al. [16]. dataset, and 0.0168 for the final meta-analytic dataset on intelligence [46]. In the case of Alzheimer’s disease, a region encompassing 500 kb on each side of APOE was removed and the analysis re-run in order to ensure that the large effects in this region did not bias the regression.

Partitioned heritability

Partitioned heritability was carried out using stratified linkage disequilibrium score regression [47]. The goal of the partitioned heritability analysis was to determine if SNPs that explain variance in intelligence cluster in functional regions of the genome. A full description of how this method works can be found in Finucane et al. [47]. Firstly, heritability for each of the functional groupings is derived. Secondly, this heritability estimate is used to derive an enrichment metric defined as the proportion (Pr) of heritability captured by the functional annotation, over the proportion of SNPs contained within it (Pr(h2)/Pr(SNPs)). This ratio describes whether a functional annotation contains a greater or lesser proportion of the heritability than would be predicted by the proportion of SNPs it contains, Pr(h2)/Pr(SNPs) = 1. The proportion of the heritability of each category is used as the numerator, rather than the heritability of each category. Stratified LD Scores were calculated from the European-ancestry samples in the 1000 Genomes project (1000G) and only included the HapMap 3 SNPs with a minor allele frequency (MAF) of >0.05. A model was derived using 52 overlapping, functional categories. Correction for multiple testing was performed using a Bonferroni test on the 52 functional categories (α = 0.00096).

Genetic prediction

The three smaller samples of individuals who performed the VNR test and had their genetic data released in the second release of the UK Biobank genetic data were used. The METAL meta-analysis and MTAG meta-analysis with education were re-run leaving out one of the groups who performed the VNR test. The group that was left out were then used as the target sample for polygenic prediction. Using our meta-analytic dataset on intelligence, polygenic risk scores were derived for intelligence in each of the VNR groups using PRSice [48]. SNPs that were strand ambiguous and those with a MAF of <0.01 were removed prior to deriving the polygenic risk scores. SNPs were clumped using the binary.ped files from the participants in the UK Biobank as a reference (r2 < 0.25, 250 kb window). Polygenic scores were then derived for each participant as the sum of alleles associated with intelligence, weighted by the effect size from our meta-analytic intelligence dataset. A total of five polygenic risk scores were derived using the following P-value cut offs: 0.01, 0.05, 0.1, 0.5, and 1.