Participants and cognitive phenotypes

The present study includes 300,486 individuals of European ancestry from 57 population-based cohorts brought together by the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE), the Cognitive Genomics Consortium (COGENT) consortia, and UK Biobank (Supplementary Note 2). All individuals were aged between 16 and 102 years. Exclusion criteria included clinical stroke (including self-reported stroke) or prevalent dementia (Supplementary Data 18).

General cognitive function, unlike height for example, is not measured the same way in all samples. Here, this was mitigated by applying a consistent method of extracting a general cognitive function component from cognitive test data in the cohorts of the CHARGE and COGENT consortia; all individuals were of European ancestry (Supplementary Note 1).

For each of the CHARGE and COGENT cohorts, a general cognitive function component phenotype was constructed from a number of cognitive tasks. Each cohort was required to have tasks that tested at least three different cognitive domains. We avoided taking more than one score from any individual cognitive test. Principal component analysis was applied to the cognitive test scores to derive a measure of general cognitive function. Principal component analysis results for the CHARGE cohorts were checked by one author (IJD) to establish the presence of a single component. The scree slope was examined, the percentage of variance accounted for by the first unrotated principal component was noted, and it was checked that all tests had sufficient loading on the first unrotated principal component. Scores on the first unrotated component were used as the cognitive phenotype (general cognitive function). Principal component analyses for the COGENT cohorts are described in Trampush et al. (pp. 337–338, and Supplementary Table 1)64.
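The extraction of a first unrotated principal component can be illustrated with a minimal sketch. It assumes a complete (no missing data) matrix of test scores with one column per cognitive test, and it runs the analysis on the correlation matrix; the function name, the sign convention, and the handling of standardisation are illustrative assumptions, not the cohort-specific procedures.

```python
import numpy as np

def first_unrotated_component(scores):
    """Return PC1 scores, loadings, variance share and eigenvalues.

    scores: (n_individuals, n_tests) array, one column per cognitive test.
    """
    x = np.asarray(scores, dtype=float)
    # Standardise each test so the PCA operates on the correlation matrix.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    v = eigvecs[:, 0]
    # The sign of an eigenvector is arbitrary; orient it so tests load positively.
    if v.sum() < 0:
        v = -v
    loadings = v * np.sqrt(eigvals[0])           # test-by-component correlations
    variance_explained = eigvals[0] / eigvals.sum()
    component_scores = z @ v                     # used as the phenotype
    return component_scores, loadings, variance_explained, eigvals
```

The returned eigenvalues correspond to what would be inspected on a scree plot, and the loadings to the check that every test loads sufficiently on the first component.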

UK Biobank participants were asked 13 multiple-choice questions that assessed verbal and numerical reasoning (VNR: UK Biobank calls this the ‘fluid’ cognitive test). The VNR score was the number of questions answered correctly in 2 min. Four samples of UK Biobank participants with verbal-numerical reasoning scores were used in the current analyses. The first sample (VNR Assessment Centre) consists of UK Biobank participants who completed the verbal-numerical reasoning test at baseline in assessment centres (n = 107,586). The second UK Biobank sample (VNR T2) consists of participants who did not complete the verbal-numerical reasoning test at baseline but did complete this test at the first repeat assessment visit in assessment centres (n = 11,123). The third UK Biobank sample (VNR MRI) consists of participants who did not complete the verbal-numerical reasoning test at a previous testing occasion but did complete the test at the imaging visit in assessment centres (n = 3002). The fourth UK Biobank sample (VNR Web-Based) consists of participants who did not complete the verbal-numerical reasoning test at any assessment centre visit, but did complete this test during the web-based cognitive assessment online (n = 46,322). Details of the cognitive phenotypes for all cohorts can be found in Supplementary Note 1.

At the baseline UK Biobank assessment, 496,790 participants completed the reaction time test. Details of the test can be found in Supplementary Note 1. A sample of 330,069 UK Biobank participants with both reaction time scores and genotyping data was used in this study.

Genome-wide association analyses

Genotype–phenotype association analyses were performed within each cohort, using an additive model, on imputed SNP dosage scores. Adjustments for age, sex, and population stratification were included in the model for each cohort. Cohort-specific covariates—for example, site or familial relationships—were also fitted as required. Cohort-specific quality control procedures, imputation methods, and covariates are described in Supplementary Data 19. Quality control of the cohort-level summary statistics was performed using the EasyQC software65, which implemented the exclusion of SNPs with imputation quality <0.6 and minor allele count <25.

General cognitive function meta-analysis

A meta-analysis including all the CHARGE-COGENT and UK Biobank summary results was performed using the METAL package with a sample-size weighted model implemented (http://www.sph.umich.edu/csg/abecasis/Metal).
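As an illustration of a sample-size weighted scheme, the sketch below combines per-cohort Z-scores for a single SNP, weighting each cohort by the square root of its sample size. The function names are hypothetical, and the sketch assumes the Z-scores have already been signed with respect to the same effect allele; it is not a substitute for the METAL software.

```python
import math

def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-cohort Z-scores for one SNP.

    Each cohort receives weight sqrt(N_i); the combined statistic is
    sum(w_i * z_i) / sqrt(sum(w_i ** 2)).
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided P-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With two equally sized cohorts that each report Z = 2, the combined statistic is 4/√2 ≈ 2.83, which shows how concordant evidence accumulates across cohorts.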

Reaction time genome-wide association analysis

The GWAS of reaction time from the UK Biobank sample was performed using the BGENIE v1.2 analysis package (https://jmarchini.org/bgenie/). A linear SNP association model was tested which accounted for genotype uncertainty. Reaction time was adjusted for the following covariates: age, sex, genotyping batch, genotyping array, assessment centre, and 40 principal components.

Genomic risk loci characterization using FUMA

Genomic risk loci were defined from the SNP-based association results, using FUnctional Mapping and Annotation of genetic associations (FUMA)23. Firstly, independent significant SNPs were identified using the SNP2GENE function and defined as SNPs with a P-value of ≤5 × 10−8 and independent of other genome wide significant SNPs at r2 < 0.6. Using these independent significant SNPs, tagged SNPs to be used in subsequent annotations were identified as all SNPs that had a MAF ≥ 0.0005 and were in LD of r2 ≥ 0.6 with at least one of the independent significant SNPs. These tagged SNPs included those from the 1000 genomes reference panel and need not have been included in the GWAS performed in the current study. Genomic risk loci that were 250 kb or closer were merged into a single locus. Lead SNPs were also identified using the independent significant SNPs and were defined as those that were independent from each other at r2 < 0.1.
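The merging step can be sketched as an interval merge. The example assumes each locus is represented as a (chromosome, start, end) tuple in base pairs and that the distance between loci is measured from the end of one to the start of the next; both the representation and the function name are illustrative assumptions, and FUMA's own boundary definitions may differ.

```python
def merge_loci(loci, max_gap=250_000):
    """Merge loci on the same chromosome that lie within max_gap bp.

    loci: iterable of (chrom, start, end) tuples.
    Returns a sorted list of merged (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous locus: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```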

Comparison with previous findings

Previous evidence of association for each of the 148 genetic loci identified herein as being associated with general cognitive function was sought in the largest published GWASs of general cognitive function16,17 and education24. We performed look-ups on all tagged SNPs (r2 > 0.6) within each locus, including all 1000 genomes SNPs, and classed any tagged SNP previously reported as genome-wide significant as a replication. Details of these findings are presented in Supplementary Data 3.

Gene-based analysis implemented in FUMA

Gene-based analysis has been shown to increase the power to detect genotype-phenotype association because the multiple testing burden is reduced and the effects of multiple SNPs are combined66. Gene-based analysis was conducted using MAGMA67. The test carried out using MAGMA, as implemented in FUMA, was the default SNP-wise test using the mean χ2 statistic derived on a per gene basis. SNPs were mapped to genes based on genomic location. All SNPs that were located within the gene-body were used to derive a P-value describing the association found with general cognitive function and reaction time. NCBI build 37 was used to determine the location and boundaries of 18,199 autosomal genes. Linkage disequilibrium within and between each gene was gauged using the 1000 genomes phase 3 release68. A Bonferroni correction was applied to control for multiple testing; the genome-wide significance threshold was P < 2.75 × 10−6.
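The reported threshold can be reproduced with a one-line calculation. The sketch assumes a family-wise α of 0.05, which the text does not state explicitly but which yields the reported value when divided by the 18,199 genes tested.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction.

    alpha=0.05 is an assumed family-wise error rate.
    """
    return alpha / n_tests

gene_threshold = bonferroni_threshold(18_199)
print(f"{gene_threshold:.2e}")
```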

Estimation of SNP-based heritability

The proportion of variance explained by all common SNPs was estimated using univariate GCTA-GREML analyses69 in four of the largest individual cohorts: ELSA, Understanding Society, UK Biobank, and Generation Scotland. Sample sizes for all of the GCTA analyses in these cohorts differed from the association analyses, because one individual was excluded from any pair of individuals who had an estimated coefficient of relatedness of >0.025 to ensure that effects due to shared environment were not included. The same covariates were included in all GCTA-GREML analyses as for the SNP-based association analyses.

Univariate Linkage Disequilibrium Score regression

Univariate LDSC regression was performed on the summary statistics from the GWAS on general cognitive function and reaction time. The heritability Z-score provides a measure of the polygenic signal found in each data set. Values greater than four indicate that the data are suitable for use with bivariate LDSC regression70. The mean χ2 statistic indicates the inflation of the GWAS test statistics; under the null hypothesis of no association (i.e., no inflation of test statistics), it would be one. An inflation in the test statistics can indicate population stratification, cryptic relatedness, or the presence of many alleles each with a small effect. The intercept of the LDSC regression can distinguish between inflation due to stratification and cryptic relatedness, and inflation due to a polygenic signal. This is because the inflation in test statistics attributable to stratification, drift, and cryptic relatedness will not correlate with LD, whereas inflation due to polygenicity will. The LDSC regression intercept, therefore, captures the inflation in the χ2 statistics that is due to stratification or other confounds rather than to a polygenic signal.

For each GWAS, an LD score regression was carried out by regressing the GWA test statistics (χ2) on to each SNP’s LD score, which is the sum of squared correlations between the minor allele count of a SNP and the minor allele count of every other SNP. This regression allows for the estimation of heritability from the slope, and provides a means to detect residual confounders using the intercept. For general cognitive function, we report an LD score regression intercept of 1.058 (SE = 0.011) and a ratio of 0.0659; this indicates that only 6.6% of the inflation observed can be ascribed to causes other than a polygenic signal. For reaction time, we report an LD score regression intercept of 1.02 (SE = 0.009) and a ratio of 0.0475; this indicates that only 4.75% of the inflation observed can be ascribed to causes other than a polygenic signal.
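The relationship between slope, intercept, and ratio can be illustrated with a simplified sketch. It assumes the standard model in which the expected χ2 of a SNP equals intercept + (N·h2/M)·LD score, with N the sample size and M the number of SNPs, and it defines the ratio as (intercept − 1)/(mean χ2 − 1). The sketch uses ordinary unweighted least squares for brevity; the LDSC software uses a weighted regression with jackknife standard errors, so this is not a reimplementation of it, and the function name is hypothetical.

```python
import numpy as np

def ld_score_regression_sketch(chi2, ld_scores, n_samples, n_snps):
    """Unweighted regression of chi-square statistics on LD scores.

    Returns (intercept, h2, ratio), where
      h2    = slope * n_snps / n_samples
      ratio = (intercept - 1) / (mean chi2 - 1)
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld = np.asarray(ld_scores, dtype=float)
    design = np.column_stack([np.ones_like(ld), ld])
    coef = np.linalg.lstsq(design, chi2, rcond=None)[0]
    intercept, slope = coef[0], coef[1]
    h2 = slope * n_snps / n_samples
    ratio = (intercept - 1.0) / (chi2.mean() - 1.0)
    return intercept, h2, ratio
```

A ratio close to zero indicates that nearly all of the inflation in the mean χ2 tracks LD, as expected under polygenicity.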

LD scores and weights were downloaded from (http://www.broadinstitute.org/~bulik/eur_ldscores/) for use with European populations. A minor allele frequency cut-off of >0.1 and an imputation quality score of >0.9 were applied to the GWAS summary statistics. Following this, SNPs were retained if they were found in HapMap 3 with MAF >0.05 in the 1000 Genomes EUR reference sample. Indels and structural variants were then removed, along with strand ambiguous variants. SNPs whose alleles did not match those in the 1000 Genomes were also removed. As the presence of outliers can increase the standard error in LDSC regression70, SNPs where χ2 > 80 were also removed.

Genetic correlations

Genetic correlations were estimated using two methods, bivariate GCTA-GREML71 and LDSC70. Bivariate GCTA was used to calculate genetic correlations between phenotypes and cohorts where the genotyping data were available. This method was used to calculate the genetic correlations between different cohorts for the general cognitive function phenotype. It was also employed to investigate the genetic contribution to the stability of the same UK Biobank participants’ verbal-numerical reasoning test scores in the assessment centre and then in web-based, online testing. In cases where only GWA summary results were available, bivariate LDSC was used to estimate genetic correlations between two traits. This was used to estimate the degree of overlap between the polygenic architectures of the traits. Bivariate LDSC regression was used to estimate genetic correlations between general cognitive function, reaction time, and the following health outcomes: ADHD, age at menarche, age at menopause, Alzheimer’s disease, anorexia nervosa, bipolar disorder, BMI, bone density femoral neck, bone density lumbar spine, coronary artery disease, HbA1c, HDL cholesterol, hippocampal volume, intracranial volume, LDL cholesterol, longevity, lung cancer, major depression, neuroticism, schizophrenia, smoking status, triglycerides, type 2 diabetes, waist-hip ratio, autism spectrum disorder, birth weight, depressive symptoms, hypertension, pulse wave arterial stiffness, angina, heart attack, parental longevity, forced expiratory volume in 1-second (FEV1), hand grip strength, happiness, health satisfaction, heel bone mineral density, osteoarthritis, overall health rating, wearing of glasses or contact lenses, long-sightedness, short-sightedness, sleep duration, sleeplessness/insomnia, and subjective wellbeing. For Alzheimer’s disease, a 500-kb region surrounding APOE was excluded and the analysis re-run (Alzheimer’s disease (500 kb)). Supplementary Data 20 provides further details on the sources of the GWAS summary statistics.

Polygenic prediction

Polygenic profile score analyses were used to predict cognitive test performance in Generation Scotland, the English Longitudinal Study of Ageing, and Understanding Society. Polygenic profiles were created in PRSice72 using results of a general cognitive function meta-analysis that excluded the Generation Scotland, the English Longitudinal Study of Ageing, and Understanding Society cohorts. Polygenic profiles were also created in these cohorts based on the UK Biobank GWA reaction time results. SNPs with a MAF < 0.01 were removed prior to creating the polygenic profiles. Clumping was used to obtain SNPs in linkage disequilibrium with an r2 < 0.25 within a 250 kb window. Polygenic profile scores were created at P-value thresholds of 0.01, 0.05, 0.1, 0.5, and 1 (all SNPs), based on the significance of the association in the general cognitive function and reaction time GWAS. Linear regression models were used to examine the associations between the polygenic profile and cognitive ability in GS, ELSA, and US, adjusting for age at measurement, sex, and the first 10 (GS), 15 (ELSA), and 20 (US) genetic principal components to adjust for population stratification. The false discovery rate (FDR) method was used to correct for multiple testing across the polygenic profiles at all five thresholds73.
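The scoring and multiple-testing steps can be sketched as follows. The example assumes that SNPs have already been filtered on MAF and clumped, that a profile score is the sum of effect-size-weighted allele dosages over SNPs with a GWAS P-value at or below each threshold, and that the FDR correction is the Benjamini-Hochberg procedure. The function names are hypothetical and the sketch does not reproduce PRSice's options.

```python
import numpy as np

P_THRESHOLDS = (0.01, 0.05, 0.1, 0.5, 1.0)

def polygenic_scores(dosages, betas, pvalues, thresholds=P_THRESHOLDS):
    """Weighted allele-count scores at several P-value thresholds.

    dosages: (n_individuals, n_snps) effect-allele dosages in [0, 2].
    betas, pvalues: per-SNP GWAS effect sizes and P-values.
    Returns {threshold: (n_individuals,) score array}.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    scores = {}
    for t in thresholds:
        keep = pvalues <= t                  # SNPs passing this threshold
        scores[t] = dosages[:, keep] @ betas[keep]
    return scores

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted P-values, in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P-value.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out
```

In this setting, the P-values passed to `benjamini_hochberg` would be those from the regression of the cognitive phenotype on the profile score at each of the five thresholds.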

Functional annotation implemented in FUMA23

The independent significant SNPs and those in LD with the independent significant SNPs were annotated for functional consequences on gene functions using ANNOVAR74 and the Ensembl genes build 85. A CADD score75, RegulomeDB score76, and 15-core chromatin states77,78,79 were obtained for each SNP. eQTL information was obtained from the following databases: GTEx (http://www.gtexportal.org/home/), BRAINEAC (http://www.braineac.org/), Blood eQTL Browser (http://genenetwork.nl/bloodeqtlbrowser/), and BIOS QTL browser (http://genenetwork.nl/biosqtlbrowser/). Functionally-annotated SNPs were then mapped to genes based on physical position on the genome, eQTL associations (all tissues) and chromatin interaction mapping (all tissues). Intergenic SNPs were mapped to the two closest up- and down-stream genes which can result in their being assigned to multiple genes.

Gene-set analysis implemented in FUMA

In order to test whether the polygenic signal measured in each of the GWASs clustered in specific biological pathways, a competitive gene-set analysis was performed. Gene-set analysis was conducted in MAGMA67 using competitive testing, which examines whether genes within the gene set are more strongly associated with each of the cognitive phenotypes than other genes. Such competitive tests have been shown to control the Type 1 error rate and to facilitate an understanding of the underlying biology of cognitive differences80,81. A total of 10,891 gene-sets (sourced from Gene Ontology82, Reactome83, and SigDB84) were examined for enrichment of general cognitive function and reaction time. A Bonferroni correction was applied to control for the multiple tests performed on the 10,891 gene sets available for analysis.

Gene-property analysis implemented in FUMA

A gene-property analysis was conducted using MAGMA in order to indicate the role of particular tissue types that influence differences in general cognitive function and reaction time. The goal of this analysis was to test if, in 30 broad tissue types and 53 specific tissues, tissue-specific differential expression levels were predictive of the association of a gene with general cognitive function and reaction time. Tissue types were taken from the GTEx v6 RNA-seq database85 with expression values being log2 transformed with a pseudocount of 1 after winsorising at 50, with the average expression value being taken from each tissue. Multiple testing was controlled for using a Bonferroni correction.
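The expression transformation can be sketched as below. The example assumes that winsorising at 50 means capping expression values above 50 at 50, and that the per-tissue value is the mean of the transformed values across samples from that tissue; the function names and the input layout (genes in rows, samples in columns) are illustrative assumptions.

```python
import numpy as np

def transform_expression(values, cap=50.0):
    """Cap expression at `cap`, then apply log2 with a pseudocount of 1."""
    x = np.asarray(values, dtype=float)
    return np.log2(np.minimum(x, cap) + 1.0)

def average_by_tissue(expression, tissue_labels):
    """Mean transformed expression per gene within each tissue.

    expression: (n_genes, n_samples) raw expression matrix.
    tissue_labels: length-n_samples sequence of tissue names.
    Returns {tissue: (n_genes,) array}.
    """
    transformed = transform_expression(expression)
    labels = np.asarray(tissue_labels)
    return {
        tissue: transformed[:, labels == tissue].mean(axis=1)
        for tissue in np.unique(labels)
    }
```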

Data availability

The GWAS summary results for all significant and suggestive SNPs for general cognitive function and reaction time are available in Supplementary Data 1, 2, 10 and 11. The full GWAS summary results for Reaction Time are available to download here: http://www.ccace.ed.ac.uk/node/335. Access to the full GWAS summary results for general cognitive function can be requested by application to the chairs of the CHARGE and COGENT consortia.