Here, we present a large (n = 107,207) genome-wide association study (GWAS) of general cognitive ability (“g”), further enhanced by combining results with a large-scale GWAS of educational attainment. We identified 70 independent genomic loci associated with general cognitive ability. Results showed significant enrichment for genes causing Mendelian disorders with an intellectual disability phenotype. Competitive pathway analysis implicated the biological processes of neurogenesis and synaptic regulation, as well as the gene targets of two pharmacologic agents: cinnarizine, a T-type calcium channel blocker, and LY97241, a potassium channel inhibitor. Transcriptome-wide and epigenome-wide analysis revealed that the implicated loci were enriched for genes expressed across all brain regions (most strongly in the cerebellum). Enrichment was exclusive to genes expressed in neurons but not oligodendrocytes or astrocytes. Finally, we report genetic correlations between cognitive ability and disparate phenotypes including psychiatric disorders, several autoimmune disorders, longevity, and maternal age at first birth.

In the present study, we first utilized GWAS meta-analysis to combine our prior Cognitive Genomics Consortium (COGENT) GWAS () of psychometrically defined g with the recently reported GWAS (), which relied primarily on a brief measure of fluid intelligence, resulting in a combined cohort of n = 107,207 non-overlapping samples measured for cognitive performance. Next, we utilized MTAG to combine these results with the large-scale GWAS of educational attainment, resulting in further enhanced power. At each step, we performed both allelic and gene-based tests. We then performed downstream analyses on the resulting MTAG summary statistics, including: (1) competitive gene set analyses to identify key biological processes and potential drug targets implicated, (2) stratified linkage disequilibrium score regression (LDSC) to identify differential cell type expression, (3) transcriptome-wide association study (TWAS) methods to identify specific effects of altered gene expression in the brain on cognition, and (4) LDSC to identify genetic correlations with other anthropometric and biomedical phenotypes.

A further approach to enhancing power in cognitive GWASs has focused on educational attainment as a proxy phenotype (). It is acknowledged that this phenotype is “noisy”, as it is influenced by non-cognitive genetic (e.g., personality;) and environmental (e.g., socio-economic;) factors; consequently, observed allelic effect sizes have been even smaller than those obtained for GWASs of g (). However, by utilizing a single-item measure (years of education completed), obtained incidentally in large studies of other phenotypes, this approach has allowed investigators to obtain extremely large sample sizes. A recent study of educational attainment in nearly 300,000 individuals identified 74 independent GWS loci (). Moreover, a new technique called multi-trait analysis of GWAS (MTAG)() has been developed which permits integration of GWAS data across related traits, accounting for the possibility of overlapping samples across studies and requiring only summary statistics. The developers of MTAG demonstrated its accuracy and utility in a study of traits (depression, neuroticism, and subjective well-being) that demonstrate genetic correlations in the range of ∼.70–.75; importantly, the genetic correlation between cognitive performance and educational attainment has been consistently reported to be in the same range (). MTAG is able to quantify the degree of “boost” to the signal of a single-trait GWAS, providing an estimate of observed sample size and providing summary statistics (allelic weights) that can then be utilized in all downstream annotation pipelines available for GWAS output.

Very recently, a cognitive GWAS () was able to leverage a very brief measure of fluid intelligence, highly correlated with psychometrically defined g, obtained in over 50,000 subjects. In combination with several traditional cognitive GWAS cohorts, total sample size was 78,308. This sample size permitted discovery of 18 independent GWS allelic loci, as well as numerous additional loci from gene-based analysis. This report was critical in demonstrating that signal could be enhanced by combining data from cohorts with brief measures of intelligence with data from more traditional cognitive GWASs.

Genome-wide association studies (GWASs) have been highly successful at uncovering hundreds of genetic loci associated with heritable quantitative traits such as height () and weight/body mass index (BMI)(). However, identifying genetic loci underlying cognitive ability has been much more challenging, despite heritability of 0.5 or greater, as determined by both classical twin studies () and molecular genetic studies (). In part, the difficulty with cognitive GWASs may be caused by the relative heterogeneity in the measurement of the cognitive phenotype. Traditionally, general cognitive ability (g) has been defined as a latent trait underlying shared variance across multiple subdomains of cognitive performance, psychometrically obtained as the first principal component of several distinct neuropsychological test scores (). Using this approach, several cognitive GWASs with fewer than 20,000 subjects yielded no genome-wide significant (GWS) effects (), while a few GWS loci were identified in larger GWAS of 35,298 () and 53,949 () subjects, respectively. By contrast, two independent GWASs of height with sample sizes of approximately 30,000 subjects each yielded 20–30 GWS hits (). Allelic effect sizes were ∼2–5 times larger than the largest obtained in cognitive GWASs ().

Cognition appeared to be strongly associated at the genetic level with aging, education, personality, neuropsychiatric disorders, reproductive behavior, and smoking behavior. Strong association with parental age at death was observed for both the GWAS meta-analysis and MTAG results. Meanwhile, moderate associations with anthropometric traits were observed, although associations with brain volumes were surprisingly modest, except for total intracranial volume (rg for MTAG results = 0.31, p = 7.37E−19). While many of these correlations have been described previously (), two results observed in the present study were not reported in those prior publications. First, we report a strong positive genetic correlation between cognitive performance and maternal age at first birth (rg for MTAG results = 0.63, p = 2.36E−163) and an inverse correlation with parental number of children ever born (rg for MTAG results = −0.22; p = 6.91E−13). It is possible that these effects are mediated by years of higher education, insofar as correlations were even stronger with educational attainment (rg for parental age at first birth = 0.72, p = 2.24E−244; rg for number of children = −0.26, p = 3.34E−18). As with any other regression relationship, a role for unmeasured mediators, such as propensity for delayed gratification, cannot be ruled out. Second, we observed modest, yet nominally significant, inverse correlations between cognition and autoimmune diseases such as eczema and Crohn’s disease, attaining Bonferroni significance for rheumatoid arthritis (rg for MTAG results = −0.2086; p = 1.60E−08). There was also a Bonferroni-significant positive genetic correlation with celiac disease (rg for MTAG results = 0.1922; p = 0.0001).
While results of cross-trait analyses were largely consistent using either the GWAS results, the MTAG results, or the previously published educational attainment datasets, there were notable divergences in correlations with psychiatric phenotypes, especially schizophrenia and bipolar disorder.

LD score regression was carried out across 89 traits in 15 broad phenotypic categories in LD Hub (): (1) aging, (2) anthropometric, (3) autoimmune, (4) brain volume, (5) cardiometabolic, (6) education, (7) glycemic, (8) lipids, (9) lung function, (10) neurological, (11) personality, (12) psychiatric, (13) reproductive behavior, (14) sleep, and (15) smoking behavior (Figure 5; Table S14). We performed LD score regression separately for the results of our initial meta-analysis and for the MTAG results. For comparison, we also present LD score regression results for the educational attainment GWAS of. It should be noted that only 14 phenotypes were examined for genetic correlation in that publication.

Genetic correlations (rg) between cognitive phenotypes and other publicly available GWAS results, based on LD score regression. The first and second columns (labeled METAL and MTAG, respectively) refer to results of the cognitive meta-analyses in the present report. The third column displays correlations for the educational attainment GWAS of

Second, we applied a Bayesian fine-mapping approach (CAVIAR-BF;) to identify putative causal SNPs within each associated locus, as defined in Table S9. CAVIAR-BF revealed strong evidence (BF = 3.71E+2) for at least 1 causal SNP within each of the 70 independent MTAG loci. There was also evidence for at least 2 causal SNPs in 65 of the loci (BF = 3E+6) and at least 3 causal SNPs in 47 of the loci (BF = 2.86E+6). In the extended region analysis, there was evidence for at least 1 causal SNP (BF = 3.45E+2) in 70 loci and 2 causal SNPs (BF = 2.89E+6) in 63 loci. Model search revealed 386 putative causal SNPs within the 70 independent loci (Table S10). Lookups of these SNPs in two brain expression quantitative trait loci (eQTL) databases (BrainEAC [] and CommonMind []) revealed several additional SNP-eQTL relationships that can explain variance in the cognitive phenotype (Tables S11 and S12). The most notable eQTL effect was observed for rs3809912 on chromosome 18. This SNP, which was GWS in the MTAG results (p = 7.06E−09), was a strong eQTL for CEP192 (p = 5.1E−38, FDR < 0.01). This eQTL was confirmed in the CommonMind database (FDR < 0.01), which demonstrated that expression of 44 independent transcripts in the frontal cortex was significantly associated with MTAG SNPs at the FDR < 0.01 level. Combining annotation information from the Mendelian gene analysis, MetaXcan TWAS, BrainEAC, and CommonMind databases, we found supporting functional evidence for 112 of the 350 candidate genes nominated by MTAG (Table S13). The remaining 238 genes without functional support had statistical evidence for association with cognition but are considered “candidate genes” requiring further functional or experimental support.

In order to derive specific biological insights from the broad association loci implicated by MTAG, we performed a series of analyses designed to identify individual gene expression changes associated with cognition. First, we performed transcriptome-wide analysis (TWAS) using MetaXcan () on MTAG SNP results in order to identify transcripts for which upregulation or downregulation in specific neural compartments was associated with cognition. Note that TWAS follows a similar logic to imputation, in that an external reference (in this case, publicly available GTEx eQTL data for 10 brain regions) is utilized to link SNP-based summary statistics to tissue-based expression levels. As shown in Figure 4B (and detailed in Table S8), most of the significant TWAS results involved genes expressed across all neural tissues, such as AMIGO3, RNF123, and RBM6. Moreover, no individual tissue compartment was much more strongly enriched for associations than the others. However, a few strong transcriptomic associations were specific to individual brain regions. For example, the strongest result in hippocampus was with DAG1. TWAS demonstrated that greater expression of this gene in the hippocampus was associated with higher cognitive scores. However, this gene was not expressed in other neural tissue types in the Genotype-Tissue Expression (GTEx) database. Similarly, lower levels of ACTR1A were significantly associated with better cognition, but this transcript was observed only in the frontal cortex.

Stratified LD score regression () also demonstrated an enrichment of cell type expression for neuronal tissues only. Notably, genes found in the neuronal expression list of the CAHOY brain cell-type expression data were significantly enriched (p = 0.0129; Bonferroni-corrected p = 0.0386), whereas negative results were obtained for genes expressed in oligodendrocytes (p = 0.4997) and astrocytes (p = 0.9057). Additionally, using Roadmap annotations, epigenetic enrichment was strongest in fetal brain tissue DNase sites and H3K4me1 primed enhancers, followed by adult cortical H3K27ac active enhancer sites (see Table S7 for further details). No enrichment was observed for any non-neuronal tissue. Again, results were not substantively changed when the three large loci were removed from these analyses.

Competitive pathway analysis for drug pathways () revealed that the gene targets of two drugs were significantly enriched in the MTAG results (Table 2, bottom): cinnarizine, a T-type calcium channel blocker, and LY97241, a potassium channel inhibitor. L-type calcium channel blockers and anti-inflammatories also showed suggestive evidence of enrichment. In a related analysis of drug classes, significant enrichment was observed for voltage-gated calcium channel subunits (p = 9.28E−06, Bonferroni-corrected p = 5.38E−04).

Downstream MAGMA expression profiles and competitive pathway analysis were conducted as part of the FUMA pipeline. MAGMA tissue expression profile analysis revealed that genes emerging from the MTAG analysis were significantly enriched for expression in nearly all central nervous system tissues (except for substantia nigra and spinal cord) and that this enrichment was exclusive to neural tissues (Figure 4A). Notably, the strongest enrichment was observed for genes expressed in the cerebellum, followed by the cortex, with slightly weaker (but still strongly significant) enrichment in subcortical and limbic structures. Competitive pathway analysis (based on gene ontology categories) for GWS MAGMA genes identified by MTAG revealed significant enrichment of neuronal and synaptic cellular components, as well as the biological processes of neurogenesis and regulation of synapse organization (Table 2, top). Because three MTAG loci (at chromosomes 3p21.31, 16p11.2, and 17q21.31) were unusually large, each containing ≥15 genes that may have disproportionately impacted enrichment results, we re-ran the above tissue expression and pathway analyses excluding these three regions. Results were substantively unchanged: all of the same neural tissues remained significantly enriched, in the same order of significance as shown in Figure 4A, and all of the same pathways remained significant (Bonferroni-corrected p < 0.05) as shown in Table 2, except for the cellular component “dendrite” (Bonferroni-corrected p = 0.089).

(B) Circular Manhattan plot for MetaXcan results based on MTAG of cognitive performance with educational attainment. From inner circle out, GTEx tissue order is as follows: ACC, Anterior Cingulate Cortex; CDBG, Caudate – Basal Ganglia; CRBHM, Cerebellar Hemisphere; CRBLM, Cerebellum; CRTX, Cortex; FCTX, Frontal Cortex; HIPP, Hippocampus; HYPO, Hypothalamus; NACMB, Nucleus Accumbens; PUTM, Putamen. GWAS threshold is set at Bonferroni-corrected p < 0.05.

(A) Tissue expression profile analysis for genome-wide significant genes (as defined by MAGMA) emerging from the MTAG analysis. Gene results were significantly enriched for expression in nearly all central nervous system tissues (except for substantia nigra and spinal cord) but no tissues outside the central nervous system.

We compared the list of 350 genes emerging from MTAG with a list of 621 genes known to cause autosomal dominant or autosomal recessive Mendelian disorders featuring intellectual disability (). As shown in Table 1, a total of 23 genes identified by MTAG appeared on this list, representing a 2-fold enrichment over chance (hypergeometric probability p = 0.001). Examining autosomal dominant and autosomal recessive Mendelian genes separately demonstrated a somewhat stronger enrichment for autosomal dominant genes (p = 0.0017) than for autosomal recessive genes (p = 0.054).

As a formal validation that the MTAG methodology successfully predicts phenotypic variance for cognitive performance, the MTAG analysis was re-run excluding the COGENT cohorts (i.e., the IQ GWAS was combined with the educational attainment GWAS). The ASPIS (Athens Study of Psychosis Proneness and Incidence of Schizophrenia) and GCAP (NIMH Genes, Cognition and Psychosis Program) datasets were held out as target cohorts for polygenic risk score modeling of “g.” Despite the relatively small size of these hold-out cohorts, results show strongly significant polygenic prediction of “g” using MTAG-derived allele weights (Figure 3A and 3C), accounting for more than 4% of the variance in the GCAP cohort. For both cohorts, polygenic prediction began to drop at P-value thresholds above 0.05, suggesting that there may be some degree of saturation of signal beyond the nominal 0.05 significance level at these sample sizes. Additional comparisons were made with IQ-only and education-only predictions (weights derived from the corresponding single-trait summary statistics) for the same hold-out cohorts (Figure 3B and 3D). In the ASPIS cohort, the MTAG-derived weights showed a 3.5-fold and a 3-fold improvement in variance explained (R²) relative to IQ and education, respectively. For the GCAP cohort, there was a 5.1-fold to 96-fold improvement in R² relative to IQ or education alone.

MTAG analysis combining the cognitive performance results obtained above with the large educational attainment GWAS previously reported () resulted in a 75% boost in effective sample size, from the original n = 107,207 to a GWAS-equivalent n = 187,812. Default clumping procedures revealed that 70 independent genomic loci reached genome-wide significance, with 82 independent SNPs (Figure 1B). Similar to the GWAS results above, the PP plot (Figure S3) demonstrated polygenicity without evidence for artifactual inflation of statistical tests (λGC = 1.28; λ1000 = 1.001; LD score intercept = 0.91), and overall SNP heritability was 0.336. Of the 70 GWS loci, 34 were not previously reported as GWS in published studies of cognitive or educational phenotypes (Figure 2; Table S1). All but two of the 30 loci identified in the meta-analysis remained genome-wide significant in the MTAG results. Even these two loci showed the same direction of allelic effects between cognitive meta-analytic GWASs and the educational GWASs. The majority of the 13,549 SNPs reaching a nominal significance threshold in the MTAG analysis were intergenic or intronic (Table S2; Figure S4). GWAS catalog annotations are listed in Table S3. Within the GWS loci, 265 protein-coding genes were identified (Table S4). Additionally, 256 genes were significant in MAGMA gene-based tests (Table S6). Of these genes, 85 were non-overlapping with the 265 genes within SNP GWS loci, resulting in a total of 350 genes receiving GWS support from the MTAG results.

Venn diagram depicting overlap and independence of genome-wide significant SNP loci observed in three studies: the MTAG analysis of the present report, the cognitive performance GWAS reported by, and the educational attainment GWAS of

The significant loci harbored 88 known protein-coding genes (Table S4), about half of which were in three large regions (Figure S2), including two well-characterized regions: the distal 16p11.2 region, in which deletions have been associated with schizophrenia and other neuropsychiatric phenotypes (), and the 17q21 region, in which inversions have been associated with neuropsychiatric disorders (). Using MAGMA (Multi-marker Analysis of GenoMic Annotation;) gene-based tests, 73 genes were genome-wide significant (Table S5), of which 39 overlapped with the 88 genes noted above, resulting in a total of 122 candidate genes with statistical evidence of association with cognitive performance.

Meta-analysis of all non-overlapping cohorts from the two GWASs of cognitive performance (total n = 107,207) identified 28 independent genomic loci reaching genome-wide significance (GWS, p < 5E−8) using default clumping parameters from the Functional Mapping and Annotation (FUMA) pipeline (Figure 1A), representing a 55.6% increase in loci compared to the previous GWAS () of cognitive performance. Two of these loci each contained two uncorrelated variants with independent effects, resulting in 30 independent lead SNPs. Evidence for spurious inflation of statistical tests was quite limited for a large study of a highly polygenic trait (λGC = 1.23; λ1000 = 1.001; linkage disequilibrium (LD) score intercept = 1.03; see also PP plot in Figure S1), and overall SNP heritability was 0.168. Of the 28 GWS loci, 12 were not previously reported as GWS in published studies of cognitive or educational phenotypes (Table S1). The majority of the 5,610 markers reaching a nominal significance threshold were intronic SNPs, followed by those in intergenic regions (Table S2). As shown in Table S3, several of the GWS loci overlap with loci related to schizophrenia, bipolar disorder, and other neuropsychiatric phenotypes, as well as obesity/BMI and other traits.

It is important to emphasize that uncovering genetic variation underlying general cognitive ability in the healthy population does not have deterministic implications. As has been previously explicated in similar studies (), effect sizes for each allele are extremely small (R² < 0.1% for even the strongest effects), and the combined effects genome-wide predict only a small proportion of the total variance in hold-out samples (Figure 3). Thus, results of the present study do not hold the potential for individual prediction or classification. Nevertheless, the results may still have substantial impact on our understanding of molecular mechanisms underlying cognitive ability.

As noted above, one of the most important aims of GWAS studies is the identification of novel drug targets, and it has been suggested that targets with supporting GWAS evidence may be twice as successful in clinical development compared to those without such evidence (). Our drug set enrichment analysis pointed to several potential nootropic mechanisms. Most notably, the strongest signal was for cinnarizine, a T-type calcium channel inhibitor typically prescribed for seasickness. In the present study, we discovered an association of cognition to CACNA1I, which encodes one component of the voltage-dependent T-Type Cav3.3 channel and has been previously associated with schizophrenia (). While cinnarizine has strong antihistamine activity and may be inappropriate for general cognitive enhancement, a novel agent targeting Cav3.3 has shown nootropic activity in preclinical models (). In addition to gene set results suggesting a potential role for calcium and potassium channel regulation, single-gene results also point toward a potential role for the metabotropic glutamate receptor encoded by GRM3. This gene is also implicated in schizophrenia (), and drugs targeting GRM3 have been suggested as a potential treatment (); however, a large-scale trial of one such agent was unsuccessful in treating psychotic symptoms (). Based on the present results, future studies may seek to examine a role for such compounds in cognitive remediation. It is also noteworthy that the present study identified genome-wide significant evidence implicating three phosphodiesterase genes: PDE1C, PDE2A, and PDE4D. In particular, there is growing interest in PDE2A inhibitors as potential agents for cognitive enhancement (), and evidence suggests that these agents may enhance synaptic plasticity via presynaptic modulation of cAMP hydrolysis (). PDE4D inhibition is also under investigation as a potential therapy for neurodegenerative disease ().

The overlap of 23 genes from our results with known genes for Mendelian disorders characterized by intellectual disability has several implications. First, this statistically significant enrichment provides partial validation of our MTAG results. Second, genes with known mutations of large effect, when combined with our data demonstrating SNPs with smaller regulatory effects on the same phenotype (cognition), can be considered an “allelic series” ()—a natural set of experiments powerfully demonstrating directional information (in the form of a dose-response curve) regarding gene function. Such information can be leveraged for the identification of novel drug targets. Third, converging evidence across the Mendelian and GWAS lists can aid interpretation of specific pathways and molecular processes that are necessary to normal neuronal function and vice versa. For example, two genes on both the Mendelian and GWAS lists (GMPPB and LARGE) are associated with dystroglycanopathies with mental retardation. This information provides context for the observation that DAG1, which encodes dystroglycan 1, is the strongest TWAS result in the hippocampus. DAG1 is necessary for GABAergic signaling in hippocampal interneurons (). While dystroglycanopathies are most prominently characterized by muscular dystrophy and retinal abnormalities, it is possible that all of these genes play a role in hippocampal synapse formation that is relevant to normal cognitive ability.

By utilizing TWAS methodology, we were able to isolate expression effects of specific genes within some of our broad GWAS loci. For example, ACTR1A, which lies near the GWAS peak at chromosome 10q24, encodes a microtubular dynactin protein involved in retrograde axon transport (). Other genes at this locus were not significant in the TWAS analysis (although a role in cognition cannot be ruled out, given the limited sample size in the reference brain expression datasets in GTEx). However, most of the genes implicated by TWAS were clustered in a few “hot” genomic loci, which may represent topologically associated domains (TADs) under the control of a shared three-dimensional chromatin structure (). Whether effects on cognition are driven by all differentially expressed genes within such loci or if specific effects can be disentangled through experimental means remains to be determined.

While synaptic mechanisms were strongly implicated by our results, it is noteworthy that there was no statistical evidence for enrichment of genes expressed in oligodendrocytes or astrocytes. While developmental disorders primarily affecting oligodendrocytes, such as metachromatic leukodystrophy, are marked by cognitive impairment (), it is possible that individual variation in cognitive ability within the normal range is less directly under genetic control via white matter mechanisms. By contrast, strong evidence was provided for the involvement of genes expressed in the cerebellum. Converging evidence from functional imaging studies, lesion studies, structural connectivity, and evolutionary considerations strongly implicate a role for the cerebellum in higher cognitive functions (), possibly through the mechanism of prediction and error-based learning ().

Downstream analysis confirmed an important role for neurodevelopmental processes in cognitive ability, consistent with implications from the education GWAS (). Significant genes were more strongly enriched for expression in fetal brain tissue than adult tissue. Results were also enriched for genes implicated in early neurodevelopmental disorders, and neurogenesis was the most strongly enriched GO biological process. At the same time, it is important to emphasize that adult neural tissues were also strongly represented in the results, and multiple synaptic components were significant in the pathway analysis. In this context, it is noteworthy that many cellular processes necessary for early neurodevelopment are also involved in adult synaptic plasticity. This duality is represented by several significant genes emerging from our analysis. CELSR3 encodes an atypical cadherin plasma membrane protein involved in long-range axon guidance in neurodevelopment through planar cell polarity signaling () but is also necessary for adult formation of hippocampal glutamatergic synapses (). Similarly SEMA3F is a negative regulator of dendritic spine development in adult hippocampus () but embryonically serves as an endogenous chemorepellent, guiding septohippocampal fibers away from non-limbic regions of developing cortex ().

Uncovering the molecular genetic basis of individual differences in cognitive performance can have a significant impact on our understanding of neuropsychiatric disorders, which are both phenotypically () and genetically () correlated with cognition, as well as numerous non-psychiatric health-relevant phenotypes (), which also demonstrate significant genetic correlations with cognitive function. Here, we have presented the largest GWAS of cognition to date, with 107,207 individuals phenotypically characterized for performance on standardized tests measuring general cognitive ability. Results were further enhanced by utilizing a relatively new approach to allow meta-analysis with a large-scale GWAS of educational attainment, which is highly (though not perfectly) correlated with cognitive ability at the genetic level. With this approach, we were able to identify 70 genomic loci significantly associated with cognition, implicating 350 candidate genes underlying cognitive ability. In total, we found that common SNPs were able to account for roughly half of the overall heritability of the phenotype as determined by prior family studies ().

LD score regression allows genetic correlations to be computed across traits (), providing insight into the degree to which genetic architecture is shared across traits. To examine traits that overlap genetically with cognition, LD score regression was conducted on both the cognition meta-analysis results and the MTAG results via the LD Hub pipeline, a centralized trait database (). LD score regression was carried out across 89 traits in 15 broad phenotypic categories: (1) aging, (2) anthropometric, (3) autoimmune, (4) brain volume, (5) cardiometabolic, (6) education, (7) glycemic, (8) lipids, (9) lung function, (10) neurological, (11) personality, (12) psychiatric, (13) reproductive behavior, (14) sleep, and (15) smoking behavior. Very recently reported GWAS summary statistics for attention deficit hyperactivity disorder (ADHD;) and intracranial volume (ICV;) were included as additional phenotypes. For comparison, we also present LD score regression results for the educational attainment GWAS of. It should be noted that only 14 phenotypes were examined for genetic correlation in that publication. The MHC (major histocompatibility complex) region was removed from all datasets prior to LD score regression analysis, as per standard protocol at LD Hub.
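The logic of a genetic correlation estimate can be illustrated with a minimal sketch. It assumes the standard LD score regression relationships, in which the slope of χ² statistics on LD scores is proportional to heritability and the slope of cross-trait z-score products on LD scores is proportional to genetic covariance. This is not the LD Hub implementation: it uses unweighted least squares with no jackknife standard errors, and all function and variable names are hypothetical.

```python
from math import sqrt


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x, with a free intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def genetic_correlation(ld_scores, z1, z2, n1, n2, m):
    """Illustrative r_g from per-SNP z-scores of two traits.

    ld_scores: LD score of each SNP
    z1, z2:    GWAS z-scores of the same SNPs for trait 1 and trait 2
    n1, n2:    GWAS sample sizes
    m:         number of SNPs used to compute the LD scores
    Assumes both heritability estimates come out positive.
    """
    chi2_1 = [z * z for z in z1]
    chi2_2 = [z * z for z in z2]
    cross = [a * b for a, b in zip(z1, z2)]
    # Slope of chi-square on LD score is N * h2 / M
    h2_1 = ols_slope(ld_scores, chi2_1) * m / n1
    h2_2 = ols_slope(ld_scores, chi2_2) * m / n2
    # Slope of z1 * z2 on LD score is sqrt(N1 * N2) * cov_g / M
    cov_g = ols_slope(ld_scores, cross) * m / sqrt(n1 * n2)
    return cov_g / sqrt(h2_1 * h2_2)
```

In this formulation, a trait compared with itself yields a correlation of 1, and flipping the sign of every effect in one trait yields −1.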

To identify potential causal variants in each of the independent loci, CAVIAR-BF was applied to the region ±50 kb around each lead SNP identified in the MTAG analysis. We followed similar procedures, setting the prior effect distribution σ to 0.1 in the model, as recommended for GWAS (https://bitbucket.org/Wenan/caviarbf). The prior probability of being causal for each SNP was set to 1/m, where m is the number of SNPs. Bayes factors were calculated for three model sets per independent locus, allowing for 1, 2, and up to 3 causal SNPs within each region, after which a model search algorithm identified the putative causal SNPs. These SNPs were then annotated using the Ensembl Variant Effect Predictor (). The analysis was repeated for extended regions, taking into account the length of the independent loci identified by earlier FUMA procedures and modeling either 1 or 2 causal SNPs. SNPs identified by the two-stage CAVIAR-BF analysis were then examined for potential effects on gene expression in the BrainEAC () and CommonMind () databases. BrainEAC top-SNP lookups covered the following tissues across n = 134 individuals: aveALL, all areas combined; CRBL, cerebellum; FCTX, frontal cortex; HIPP, hippocampus; MEDU, medulla; OCTX, occipital cortex; PUTM, putamen; SNIG, substantia nigra; TCTX, temporal cortex; THAL, thalamus; and WHMT, white matter. Finally, the prefrontal cortex lookup was included as part of the CommonMind consortium brain expression profile in n = 467 genetically inferred Caucasian samples.
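To make the role of the 1/m prior and the effect-size prior concrete, the following is a minimal sketch for the simplest case of a single causal SNP per region. It is not CAVIAR-BF itself: it substitutes Wakefield's approximate Bayes factor for each SNP, treats the value 0.1 as the prior standard deviation of the effect (an assumption), and ignores LD between SNPs and multi-SNP models. All names are hypothetical.

```python
from math import exp, sqrt


def approx_bayes_factor(beta, se, prior_sd=0.1):
    """Wakefield-style approximate Bayes factor in favor of association.

    beta, se: estimated effect and its standard error for one SNP
    prior_sd: assumed prior standard deviation of the true effect
    """
    v = se * se              # sampling variance of the estimate
    w = prior_sd * prior_sd  # prior variance of the effect
    z = beta / se
    shrink = w / (v + w)
    return sqrt(1.0 - shrink) * exp(0.5 * z * z * shrink)


def posterior_inclusion(betas, ses, prior_sd=0.1):
    """Posterior probability that each SNP is the single causal variant.

    Each of the m SNPs receives prior probability 1/m of being causal.
    """
    m = len(betas)
    prior = 1.0 / m
    weighted = [prior * approx_bayes_factor(b, s, prior_sd)
                for b, s in zip(betas, ses)]
    total = sum(weighted)
    return [x / total for x in weighted]
```

Because the prior is uniform, the ranking of SNPs under this one-causal-variant simplification is driven entirely by their Bayes factors; models with 2 or 3 causal SNPs additionally require the LD matrix of the region.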

Transcriptome-wide analysis was carried out via MetaXcan (), which allows GTEx brain expression data to be integrated with GWAS summary statistics. MetaXcan computes downstream phenotypic associations of genetic regulation of molecular traits, using elastic net, adjustment for model uncertainty, and colocalization of GWAS and eQTL signals (). The GTEx Version 6 brain tissue expression profiles, with their sample sizes, included the anterior cingulate cortex (n = 72); caudate-basal ganglia (n = 100); cerebellar hemisphere (n = 89); cerebellum (n = 103); cortex (n = 96); frontal cortex (n = 92); hippocampus (n = 81); hypothalamus (n = 81); nucleus accumbens (n = 93); and putamen (n = 82).

Functional characterization of GWAS summary statistics was carried out via stratified LD score regression to investigate whether the heritability of cognitive performance is enriched in specific tissues or cell types. Summary statistics were first subjected to baseline partitioned heritability analysis and thereafter passed through a cell-type-specific functional characterization pipeline (). Cell-type characterization included the DEPICT tissue expression database, GTEx tissue expression, ImmGen immune cell types, Cahoy brain cell types, and the Roadmap cell epigenomic marks.

To validate the genetic architecture elucidated via the MTAG methodology, we attempted to predict the phenotypic variance of general cognitive function in two of the independent COGENT cohorts (ASPIS and GCAP). MTAG analysis was conducted as above, but with the COGENT cohorts removed. Polygenic score prediction across multiple p value thresholds was conducted using PRSice (). To evaluate the effectiveness of MTAG, we also conducted polygenic risk prediction using IQ-only and education-only summary statistics. Finally, R² across SNP thresholds was compared to obtain the degree of improvement, expressed as the ratio of MTAG PRS R² values to those of the IQ or education PRS.
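The final comparison reduces to a per-threshold ratio of variance explained. A minimal sketch follows; the function name and all numeric values are hypothetical and are not results from this study.

```python
def r2_improvement(mtag_r2, baseline_r2):
    """Ratio of MTAG PRS R^2 to a baseline PRS R^2 at each p value threshold."""
    ratios = {}
    for threshold, r2 in mtag_r2.items():
        base = baseline_r2.get(threshold)
        if base:  # skip thresholds that are missing or have zero baseline R^2
            ratios[threshold] = r2 / base
    return ratios

# Hypothetical R^2 values keyed by p value threshold, for illustration only
mtag = {0.001: 0.020, 0.05: 0.040, 0.5: 0.050}
iq_only = {0.001: 0.010, 0.05: 0.020, 0.5: 0.025}
ratios = r2_improvement(mtag, iq_only)
```

A ratio above 1 at a given threshold indicates that the MTAG-derived score explains more variance than the single-trait score at that threshold.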

We compared the list of genes resulting from the MTAG analysis (including all genes within GWS SNP loci, as well as GWS genes identified with MAGMA) with a list of 621 genes known to cause autosomal dominant or autosomal recessive Mendelian disorders featuring intellectual disability. This list is primarily derived from a recent comprehensive review (), supplemented by a subsequent large-scale study of consanguineous multiplex families (). A total of 193 autosomal dominant genes were identified, and a total of 413 autosomal recessive genes were identified. Fifteen genes were annotated as causing both autosomal dominant and autosomal recessive disorders with intellectual disability. Statistical significance was determined by probabilities derived according to the hypergeometric distribution. For this purpose, the total pool of autosomal genes was set to 19,011 (per Gencode).
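The hypergeometric test described above can be sketched as an upper-tail probability. The pool size (19,011) and the number of known intellectual disability genes (621) come from the text; the number of MTAG-implicated genes and the size of the overlap used below are hypothetical placeholders, as is the function name.

```python
from math import comb

def hypergeom_enrichment_p(pool, known, drawn, overlap):
    """Upper-tail probability P(X >= overlap) under the hypergeometric distribution.

    pool:    total number of genes considered
    known:   number of genes in the reference list
    drawn:   number of genes implicated by the analysis
    overlap: number of implicated genes that are also in the reference list
    """
    upper = min(known, drawn)
    total = comb(pool, drawn)
    tail = sum(comb(known, i) * comb(pool - known, drawn - i)
               for i in range(overlap, upper + 1))
    return tail / total

# drawn=300 and overlap=20 are hypothetical values, for illustration only
p = hypergeom_enrichment_p(pool=19011, known=621, drawn=300, overlap=20)
```

Exact integer arithmetic is used for the binomial coefficients, so the only rounding occurs in the final division.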

GWAS summary statistics from the METAL meta-analysis and MTAG analysis were separately entered into the FUMA pipeline (). The FUMA pipeline enables fast prioritization of genomic variants and genes and permits interactive visualization of genomic results with respect to state-of-the-art bioinformatics resources. Manhattan and QQ plots are produced, and MAGMA gene-based analysis is performed, accounting for gene size and LD structure. FUMA was also utilized to perform competitive gene-set analyses for GO cell compartment and biological process categories using the Molecular Signature Database (MsigDB 5.2). A separate competitive gene-set analysis was also conducted for the drug-based pathways previously described by (). The pipeline also generates aggregated statistics for independent loci, lead SNPs, tagged genes, and supplementary plots, including SNP and locus annotations. The default clumping parameters were: GWAS p value < 5E−08; r² threshold to define LD structure of independent SNPs > 0.1; maximum p value cutoff < 0.05; population for clumping = EUR; minor allele frequency filter > 0.01; maximum distance between LD blocks to merge into a single locus = 250 kb. Follow-up queries were then made for independent loci of the cognitive performance meta-analysis as well as the MTAG results and compared against summary statistics for the prior cognitive and education GWAS. For purposes of comparison, loci in which the lead SNPs were within 500 kb of each other were considered overlapping.
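The 500 kb overlap rule used for comparing loci across studies can be expressed as a small helper. This is an illustrative sketch: the function names and coordinates are hypothetical, and treating a distance of exactly 500 kb as overlapping is an assumption.

```python
def loci_overlap(lead_a, lead_b, window_bp=500_000):
    """True if two lead SNPs share a chromosome and lie within window_bp."""
    chr_a, bp_a = lead_a
    chr_b, bp_b = lead_b
    return chr_a == chr_b and abs(bp_a - bp_b) <= window_bp

def novel_loci(new_leads, prior_leads, window_bp=500_000):
    """Lead SNPs in new_leads that overlap no lead SNP in prior_leads."""
    return [a for a in new_leads
            if not any(loci_overlap(a, b, window_bp) for b in prior_leads)]

# Hypothetical (chromosome, base-pair position) pairs, for illustration only
mtag_leads = [(1, 1_000_000), (2, 5_000_000), (3, 7_500_000)]
prior_leads = [(1, 1_400_000), (3, 9_000_000)]
novel = novel_loci(mtag_leads, prior_leads)
```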

To further enrich genetic signals, we employed a newly developed methodology that integrates LD score regression and meta-analysis techniques across related traits: MTAG (). MTAG (v0.9.0) was applied to the METAL results described immediately above and combined with summary statistics from the recent, large-scale education GWAS (). MTAG boosts genetic signals across related traits, has been found to be effective in handling unknown sample overlap, and generates trait-specific effect estimates weighted by bivariate genetic correlation. The MTAG QC pipeline aligned all alleles across both sets of summary statistics and ensured that SNPs were present across all datasets. SNPs missing from either dataset were removed. The final SNP count for MTAG was 7,333,576. The MTAG methodology proceeds by: (1) estimating the variance-covariance matrix of the GWAS estimation error using a series of LD score regressions, which, under the known properties of LD score regression, capture relevant sources of estimation error, including population stratification, unknown sample overlap, and cryptic relatedness; (2) estimating the variance-covariance matrix of SNP effects using the maximum likelihood procedure reported in (); and (3) computing the MTAG estimator for each SNP and each trait. Summary statistics consisting of SNP, CHR, BP, per-SNP sample size, BETA, and SE for each trait were supplied to the MTAG Python command-line tool. The resulting effect estimates and p values are interpreted in the same way as those of a single-trait GWAS, which allows standard downstream follow-up analysis of the summary statistics. The Python code for MTAG is available at https://github.com/omeed-maghzian/mtag
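Step (3) can be illustrated for the two-trait case with a short sketch. It assumes the estimator takes a generalized inverse-variance-weighted form, with Ω the variance-covariance matrix of SNP effects and Σ the variance-covariance matrix of estimation error; the function names and matrix values are illustrative, and the official implementation at the URL above should be treated as authoritative.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mtag_estimate(beta_hat, omega, sigma, t=0):
    """MTAG-style estimate for trait t at one SNP, two-trait case.

    Assumed form (a sketch, not the official code):
      w     = omega_t / omega_tt
      inner = Omega - omega_t omega_t' / omega_tt + Sigma
      beta  = (w' inner^-1 beta_hat) / (w' inner^-1 w)
    """
    w = [omega[0][t] / omega[t][t], omega[1][t] / omega[t][t]]
    inner = [[omega[i][j] - omega[i][t] * omega[j][t] / omega[t][t] + sigma[i][j]
              for j in range(2)] for i in range(2)]
    inv = inv2(inner)
    row = [w[0] * inv[0][j] + w[1] * inv[1][j] for j in range(2)]  # w' inner^-1
    num = row[0] * beta_hat[0] + row[1] * beta_hat[1]
    den = row[0] * w[0] + row[1] * w[1]
    return num / den

# Hypothetical inputs: with zero genetic covariance and uncorrelated errors,
# the second trait carries no information about the first
omega_uncorrelated = [[1e-5, 0.0], [0.0, 2e-5]]
sigma_diag = [[1e-4, 0.0], [0.0, 4e-5]]
estimate = mtag_estimate([0.02, 0.05], omega_uncorrelated, sigma_diag, t=0)
```

Under this form, the estimate collapses to the single-trait GWAS estimate when the traits are genetically uncorrelated and their estimation errors are independent, which is the behavior one would expect from a method that borrows strength only through genetic correlation.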

Fixed-effect meta-analysis was conducted between the recently reported cognitive performance GWAS () and the independent cohorts reported in the prior COGENT study (), using the METAL package (). To ensure that the results of the meta-analysis were contributed by both studies, markers present in only one of the two studies were excluded from further analysis. The number of available markers after QC filtering was 7,357,080. Because the recently reported cognitive performance GWAS () utilized the sample-size-weighted method to perform meta-analysis across its own cohorts and did not report variance terms, our meta-analysis was also conducted using the sample-size-weighted method.
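The sample-size-weighted scheme combines per-study z-scores with weights proportional to the square root of each study's sample size. A minimal sketch is given below; the function names and z-scores are hypothetical, the sample sizes are those stated in this section, and the z-scores are assumed to be signed with respect to the same effect allele.

```python
import math

def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-study z-scores using weights w_i = sqrt(N_i)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    num = sum(w * z for w, z in zip(weights, z_scores))
    den = math.sqrt(sum(w * w for w in weights))
    return num / den

def two_sided_p(z):
    """Two-sided p value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical z-scores for one SNP in the two studies, for illustration only
z_meta = sample_size_weighted_z([4.1, 2.3], [78308, 28899])
p_meta = two_sided_p(z_meta)
```

Because this scheme needs only z-scores (or p values with effect directions) and sample sizes, it can be applied when standard errors are unavailable, which is the situation described above.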

Markers reported in the prior COGENT study () had been updated to build 37 coordinates but were originally imputed against the HRC (Haplotype Reference Consortium) reference panel () via the Sanger imputation server. To ensure that markers, allele frequencies, and alleles were aligned to the 1000 Genomes phase 3 reference panel (), the COGENT summary statistics () were checked using the EasyQC pipeline (), which allows summary statistics to be aligned and checked against a reference panel of choice. We used the default 1000 Genomes phase 3 reference panel (), provided along with the EasyQC package. Markers were inspected for allele frequency outliers, presence of duplicated markers, and allele mismatches with the 1000 Genomes reference panel. Quality control filters for INFO score < 0.6 and n < 10,000 were additionally implemented. After EasyQC quality control, 8,040,131 SNPs were available for analysis. Only 87 SNPs were excluded due to allele mismatches, 13,276 SNPs were excluded due to allele frequency mismatches with the 1000 Genomes phase 3 reference panel, 283,163 were found to be duplicates and excluded, 104 SNPs were found on the HRC reference panel but not on the 1000 Genomes phase 3 reference panel, and 2,723,493 SNPs had sample sizes < 10,000 individuals. No SNPs were excluded by the INFO score < 0.6 filter. The same set of SNPs was utilized for the subsequent reduced-sample meta-analysis, which excluded the overlapping LBC1936 and MCTFR cohorts of the COGENT study (). As the other prior studies of cognitive performance () and education () were imputed to the 1000 Genomes phase 3 reference panel, their summary statistics were used as provided ( https://ctg.cncr.nl/software/summary_statistics and https://www.thessgac.org/data ).

The cohorts included in the current study were described in detail in two prior reports on cognitive performance () and one prior report on educational attainment (). Sample sizes for these three studies were n = 78,308, n = 35,298, and n = 328,917, respectively. For the present study, two cohorts reported in the prior COGENT study () were excluded so that the included cohorts would be independent of those reported in the other cognitive performance GWAS (): (1) Minnesota Center for Twin and Family Research (MCTFR) and (2) Lothian Birth Cohort 1936 Study (LBC1936). As a result, the COGENT sample size decreased from the originally reported n = 35,298 to n = 28,899. All phenotypes included were as reported originally in the respective publications. All subjects provided written, informed consent to procedures that were approved by local review boards for the institutions at which each cohort was collected. Further details are available in the supplementary materials to those three publications.

T.L. designed the study and supervised the data analysis. M.L. performed the primary data analysis, and J.W.T., J.Y., and E.K. provided additional statistical input. A.K.M., D.C.G., I.J.D., K.E.B., and G.D. provided the initial conceptual framework for the COGENT consortium. M.L. and T.L. drafted the manuscript. All other authors were involved in ascertainment, assessment, and analysis of individual cohorts, provided conceptual input to study design, and critically reviewed the manuscript.

Acknowledgments

This work has been supported by grants from the NIH (R01 MH079800 and P50 MH080173 to A.K.M., R01 MH080912 to D.C.G., K23 MH077807 to K.E.B., and K01 MH085812 to M.C.K.). Data collection for the TOP cohort was supported by the Research Council of Norway, South-East Norway Health Authority, and KG Jebsen Foundation. The NCNG study was supported by Research Council of Norway Grants 154313/V50 and 177458/V50. The NCNG GWAS was financed by grants from the Bergen Research Foundation, the University of Bergen, the Research Council of Norway (FUGE, Psykisk Helse), Helse Vest RHF, and the Dr. Einar Martens Fund. The Helsinki Birth Cohort Study has been supported by grants from the Academy of Finland, the Finnish Diabetes Research Society, the Folkhälsan Research Foundation, the Novo Nordisk Foundation, Finska Läkaresällskapet, the Signe and Ane Gyllenberg Foundation, University of Helsinki, Ministry of Education, the Ahokas Foundation, and the Emil Aaltonen Foundation. For the LBC1936 cohort, phenotype collection was supported by The Disconnected Mind project. Genotyping was funded by the UK Biotechnology and Biological Sciences Research Council (BBSRC grant No. BB/F019394/1). The work was undertaken by The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross council Lifelong Health and Wellbeing Initiative, which is funded by the Medical Research Council and the Biotechnology and Biological Sciences Research Council (MR/K026992/1). The CAMH work was supported by the CAMH Foundation and the Canadian Institutes of Health Research. The Duke Cognition Cohort (DCC) acknowledges K. Linney, J.M. McEvoy, P. Hunt, V. Dixon, T. Pennuto, K. Cornett, D. Swilling, L. Phillips, M. Silver, J. Covington, N. Walley, J. Dawson, H. Onabanjo, P. Nicoletti, A. Wagoner, J. Elmore, L. Bevan, J. Hunkin, and R. Wilson for recruitment and testing of subjects.
DCC also acknowledges the Ellison Medical Foundation New Scholar award AG-NS-0441-08 for partial funding of this study as well as the National Institute of Mental Health of the NIH under award number K01MH098126. The UCLA Consortium for Neuropsychiatric Phenomics (CNP) study acknowledges the following sources of funding from the NIH: Grants UL1DE019580 and PL1MH083271 (R.M.B.), RL1MH083269 (T.D.C.), RL1DA024853 (E.L.), and PL1NS062410. The ASPIS study was supported by National Institute of Mental Health research grants R01MH085018 and R01MH092515 to D.A. Support for the Duke Neurogenetics Study was provided by the NIH (R01 DA033369 and R01 AG049789 to A.R.H.) and by a National Science Foundation Graduate Research Fellowship to M.A.S. Recruitment, genotyping and analysis of the TCD (Trinity College, Dublin) healthy control samples were supported by Science Foundation Ireland (grants 12/IP/1670, 12/IP/1359 and 08/IN.1/B1916).

Data access for several cohorts used in this study was provided by the National Center for Biotechnology Information (NCBI) database of Genotypes and Phenotypes (dbGaP). dbGaP accession numbers for these cohorts were:

Cardiovascular Health Study (CHS): phs000287.v4.p1, phs000377.v5.p1, and phs000226.v3.p1

Framingham Heart Study (FHS): phs000007.v23.p8 and phs000342.v11.p8

Multi-Site Collaborative Study for Genotype-Phenotype Associations in Alzheimer’s Disease (GENADA): phs000219.v1.p1

Long Life Family Study (LLFS): phs000397.v1.p1

Genetics of Late Onset Alzheimer’s Disease Study (LOAD): phs000168.v1.p1

Minnesota Center for Twin and Family Research (MCTFR): phs000620.v1.p1

Philadelphia Neurodevelopmental Cohort (PNC): phs000607.v1.p1

The acknowledgment statements for these cohorts are found below:

Framingham Heart Study: The Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (Contract No. N01-HC-25195 and HHSN268201500001I). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or NHLBI. Funding for SHARe Affymetrix genotyping was provided by NHLBI Contract N02-HL-64278. SHARe Illumina genotyping was provided under an agreement between Illumina and Boston University.

Cardiovascular Health Study: This research was supported by contracts HHSN268201200036C, HHSN268200800007C, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133, and N01-HC-85239 and grant numbers U01 HL080295 and U01 HL130014 from the National Heart, Lung, and Blood Institute and R01 AG-023629 from the National Institute on Aging, with additional contribution from the National Institute of Neurological Disorders and Stroke. A full list of principal CHS investigators and institutions can be found at https://chs-nhlbi.org/pi . This manuscript was not prepared in collaboration with CHS investigators and does not necessarily reflect the opinions or views of CHS or the NHLBI. Support for the genotyping through the CARE Study was provided by NHLBI Contract N01-HC-65226. Support for the Cardiovascular Health Study Whole Genome Study was provided by NHLBI grant HL087652. Additional support for infrastructure was provided by HL105756, and additional genotyping among the African-American cohort was supported in part by HL085251. DNA handling and genotyping at Cedars-Sinai Medical Center was supported in part by the National Center for Research Resources grant UL1RR033176, now at the National Center for Advancing Translational Technologies CTSI grant UL1TR000124, in addition to the National Institute of Diabetes and Digestive and Kidney Diseases grant DK063491 to the Southern California Diabetes Endocrinology Research Center.

Multi-Site Collaborative Study for Genotype-Phenotype Associations in Alzheimer’s Disease: The genotypic and associated phenotypic data used in the study were provided by GlaxoSmithKline, R&D Limited. Details on data acquisition have been published previously in Li et al. (2008) and Filippini et al. (2009).

Li H., Wetten S., Li L., St. Jean P.L., Upmanyu R., Surh L., Hosford D., Barnes M.R., Briley J.D., Borrie M., et al. (2008). Candidate single-nucleotide polymorphisms from a genomewide association study of Alzheimer disease.

Filippini N., Rao A., Wetten S., Gibson R.A., Borrie M., Guzman D., Kertesz A., Loy-English I., Williams J., Nichols T., et al. (2009). Anatomically-distinct genetic associations of APOE epsilon4 allele load with regional cortical atrophy in Alzheimer’s disease.

Genetics of Late Onset Alzheimer’s Disease Study: Funding support for the “Genetic Consortium for Late Onset Alzheimer’s Disease” was provided through the Division of Neuroscience, NIA. The Genetic Consortium for Late Onset Alzheimer’s Disease includes a genome-wide association study funded as part of the Division of Neuroscience, NIA. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by Genetic Consortium for Late Onset Alzheimer’s Disease. A list of contributing investigators is available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000168.v1.p1

Long Life Family Study: Funding support for the Long Life Family Study was provided by the Division of Geriatrics and Clinical Gerontology, National Institute on Aging. The Long Life Family Study includes GWAS analyses for factors that contribute to long and healthy life. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the Division of Geriatrics and Clinical Gerontology, National Institute on Aging. Support for the collection of datasets and samples was provided by Multicenter Cooperative Agreement support by the Division of Geriatrics and Clinical Gerontology, National Institute on Aging (UO1AG023746, UO1023755, UO1023749, UO1023744, and UO1023712). Funding support for the genotyping that was performed at the Johns Hopkins University Center for Inherited Disease Research was provided by the National Institute on Aging, NIH.

Minnesota Center for Twin and Family Research: This project was led by William G. Iacono, PhD, and Matthew K. McGue, PhD (co-principal investigators) at the University of Minnesota, Minneapolis, MN, USA. Co-investigators from the same institution included: Irene J. Elkins, Margaret A. Keyes, Lisa N. Legrand, Stephen M. Malone, William S. Oetting, Michael B. Miller, and Saonli Basu. Funding support for this project was provided through NIDA (U01 DA 024417). Other support for sample ascertainment and data collection came from several grants: R37 DA 05147, R01 AA 09367, R01 AA 11886, R01 DA 13240, and R01 MH 66140.

Philadelphia Neurodevelopmental Cohort: Support for the collection of the data sets was provided by grant RC2MH089983 awarded to Raquel Gur, MD, and RC2MH089924 awarded to Hakon Hakonarson, MD, PhD. All subjects were recruited through the Center for Applied Genomics at The Children’s Hospital in Philadelphia.