In this study, we hypothesized that the ARC complex may contribute to individual differences in intellectual function. We curated the complex and tested whether its SNPs are associated with IQ scores in children of the Avon Longitudinal Study of Parents and Children (ALSPAC). Since polygenic risk for AD is highly correlated with cognitive function [ 23 ], we next determined whether SNPs revealing significant association with intelligence in children were also associated with AD in adults in the International Genomics of Alzheimer’s Project (IGAP). Lastly, we examined whether the most strongly associated SNPs may have functional relevance in gene expression.

At the neurobiological level, intellectual and cognitive functions rely on effective neuroplasticity, the brain’s ability to alter neural networks and synapses. Activity-regulated cytoskeleton associated protein (Arc) and its regulators and interactors have been extensively studied due to their essential role in multiple forms of synaptic plasticity [ 15–17 ]. Arc appears to play a hub-like role in regulating synaptic plasticity [ 18 ], at least partially by modifying actin cytoskeleton dynamics, receptor endocytosis, and glutamate receptor transcription [ 15–17 ]. Thus, ARC (here we refer to the gene as “ARC” and its RNA/protein as “Arc”) and its partners provide a promising set of candidate genes for uncovering genetic components of intellectual function. Here, we collectively refer to these functionally linked genes as the “ARC complex” ( Table 1 ).

The lack of robust association is common in the search for genes behind polygenic traits, such as intelligence [ 3, 11, 12 ]. Little progress has been made in finding loci that explain even a small fraction of the phenotypic variance in complex traits [ 13 ], posing one of the biggest challenges in present-day behavioral genetics. One promising approach to address this limitation is the examination of functionally related genes as a complex, thereby combining their effects [ 14 ].

The heritability of intelligence has been estimated to increase with age, from ∼30% in childhood up to 80% in adulthood [ 9 ]. However, despite such high heritability, intelligence-related phenotypes have not been robustly associated with any single gene or variant [ 3, 10 ]. Only recently did a large-scale genome-wide association study (GWAS) on these traits identify 13 variants associated with general cognitive function [ 11 ]. This confirms that the genetic architecture of intelligence involves many genes of small effect, suggesting that the impact of a single nucleotide polymorphism (SNP) may not be detected without large sample sizes.

Intelligence captures a broad scope of cognitive abilities and can be enumerated by measures of verbal IQ (VIQ) and performance IQ (PIQ), often differentiated into crystallized and fluid types. Crystallized and fluid IQ represent different aspects of intelligence, both behaviorally and biologically [ 5 ]. Crystallized intelligence includes knowledge accumulated throughout life, determined by education and experience, while fluid intelligence consists of problem solving and reasoning abilities that have little reliance on stored knowledge [ 6 ]. Fluid intelligence is strongly correlated with working memory and functional activity observed by magnetic resonance imaging (MRI) during cognitive tasks [ 7, 8 ]. Although these two IQ measures can be combined into a full scale IQ (FSIQ), research on the neural basis of intelligence suggests that it is best represented by at least two dimensions rather than one [ 5 ].

Cognitive function varies between individuals and is a key predictor of important life outcomes such as mental and physical health as well as longevity [ 1 ]. General intellectual function, commonly measured by an aggregated score (intelligence quotient, IQ) across a wide range of cognitive tasks, can be considered a main trait behind this variation [ 2 ]. It is well established that intelligence is highly heritable and polygenic [ 3 ]. Nonetheless, the extent and nature of this genetic influence are still unknown [ 4 ] and warrant further investigation.

HEK293FT cells (Thermo Fisher Scientific) were cultured in DMEM (Sigma-Aldrich) supplemented with 10% FBS (Sigma-Aldrich), sodium pyruvate (1 mM), glutamine (2 mM), penicillin (100 U/ml) and streptomycin (100 μg/ml). Cells were plated into 96-well plates (1×10⁴ cells/well) and co-transfected after 24 h using Lipofectamine 2000 (Invitrogen) with 80 ng pGL3 vector constructs and 20 ng pRL-TK control plasmid (Promega) to allow normalization of transfection efficiency. Transfections were carried out at least in triplicate and repeated six times in independent experiments. Cells were assayed for firefly and Renilla luciferase activities 48 h after transfection using the Dual-Luciferase® Reporter Assay System (Promega). Measurements were performed on a VICTOR3 Multilabel Plate Reader (Perkin Elmer). The ratio of firefly to Renilla luciferase activity was calculated.
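
The normalization described above can be illustrated with a minimal sketch. It assumes each well yields one firefly and one Renilla reading, and that allele constructs are compared by the percent difference of their mean ratios; the function names and the readings are hypothetical and are not data from this study.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Return the firefly/Renilla ratio for each well."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same length")
    return [f / r for f, r in zip(firefly, renilla)]


def percent_change(test_ratios, reference_ratios):
    """Percent difference of the mean test ratio relative to the mean reference ratio."""
    ref = mean(reference_ratios)
    return 100.0 * (mean(test_ratios) - ref) / ref


# Hypothetical triplicate readings, for illustration only.
c_allele = normalized_activity([1000.0, 1100.0, 900.0], [100.0, 100.0, 100.0])
a_allele = normalized_activity([750.0, 800.0, 700.0], [100.0, 100.0, 100.0])
change = percent_change(a_allele, c_allele)
print(f"A allele vs C allele: {change:.1f}%")
```

Dividing by the Renilla signal before averaging keeps differences in transfection efficiency between wells from being mistaken for allele effects.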

The 300 base pair region of APP intron 1, bearing rs2830077 (NC_000021.9 : 26130583-26130284, GRCh38.p2 reference assembly) was amplified using Phusion® High-Fidelity DNA Polymerase (NEB) and specific primers tagged with XhoI restriction sites (forward: 5′-ATCCTCGAGTAGTTTCTTAAAACATGG-3′ and reverse: 5′-ATCCTCGAGTTATTTAGCTACAAGTTTTAAGA-3′). The amplicon was then cloned into the pGEM-T Easy vector (Promega), and the A allele of rs2830077 was created by site-directed mutagenesis using the QuikChange Lightning Multi Site-Directed Mutagenesis Kit (Agilent). The fragment was then excised from the pGEM-T Easy vector using XhoI restriction enzyme (NEB) and ligated into the same site within the pGL3-Basic vector (Promega) to ensure that the vector backbone used for all the promoter constructs remained the same. The sequences and orientation of inserts of all plasmids were verified by direct sequencing.

Synthetic 3′ biotin-labeled double-stranded oligonucleotides corresponding to the rs2830077[A] and rs2830077[C] sequences (20 fmol) and recombinant pure TFCP2 (100 ng, Active Motif) were incubated for 20 min using the Gelshift™ Chemiluminescent EMSA kit (Active Motif) in a 1× binding buffer supplemented with 2.5% glycerol, 10 ng/μl Poly d(I-C), 0.1% NP-40, 50 mM KCl, 0.5 mM MgCl₂, and 0.12 mM EDTA. Reaction mixtures were separated on a 6% polyacrylamide gel, and products were detected by streptavidin-HRP conjugate. For competition assays, unlabeled oligonucleotides at 100-fold molar excess were added to the reaction mixture 5 min before adding the biotin-labeled probe. Sequences of the double-stranded probes for rs2830077[A] and rs2830077[C] were 5′-GACACGCTGACTTCCAGGCAaAAGCCAGGCACAAGAGAAGC-3′ and 5′-GACACGCTGACTTCCAGGCAcAAGCCAGGCACAAGAGAAGC-3′, respectively.

SNPs surviving correction for multiple testing in association analyses were examined in silico. We used RegulomeDB [ 32 ] and HaploReg (version 4.1; CEU population code; http://www.broadinstitute.org/mammals/haploreg/ ) [ 33 ] to explore whether any of the SNPs may affect gene expression. Given the central role of hippocampus in ARC complex processes and memory formation, we focused on expression in hippocampal tissue. To investigate whether any of the SNPs are located within transcription factor binding sites, we utilized MatInspector (Matrix Library 9.3; http://www.genomatix.de/solutions/genomatix-genome-analyzer.html ) and MATCH (version 1.0; http://www.gene-regulation.com/pub/programs.html ) engines. Variants with predicted functional significance were then assessed further, using electrophoretic mobility shift assays (EMSA) and luciferase reporter assays.

Genes revealing significant association in the gene-set tests were further examined on a single SNP level. Linear regressions were performed to test for association between SNPs and ALSPAC IQ scores in PLINK, version 1.9 [ 31 ]. Sex was set as a covariate, and only SNPs on autosomal chromosomes were examined. Correction for multiple testing was achieved by one million permutations, producing a family-wise empirical p -value. SNPs reaching family-wise p -value below 0.05 were considered significant.
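
The single-SNP step can be sketched as follows. This is an illustrative approximation, not PLINK's implementation: it assumes a max-statistic permutation scheme, removes the sex covariate by residualizing both phenotype and genotype, and uses the absolute partial correlation as the test statistic (which is monotone in the regression t-statistic). All names and the toy data are hypothetical, and far fewer permutations are run than the one million used in the study.

```python
import random
from statistics import mean


def residualize(y, covariate):
    """Residuals of y after least-squares regression on an intercept and one covariate."""
    my, mc = mean(y), mean(covariate)
    sxx = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (v - my) for c, v in zip(covariate, y)) / sxx if sxx else 0.0
    return [v - my - slope * (c - mc) for c, v in zip(covariate, y)]


def abs_corr(x, y):
    """Absolute Pearson correlation of two mean-zero vectors."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def familywise_pvalues(genotypes, phenotype, sex, n_perm=1000, seed=0):
    """Family-wise empirical p-value per SNP from the permutation maximum statistic."""
    rng = random.Random(seed)
    y_res = residualize(phenotype, sex)
    g_res = [residualize(g, sex) for g in genotypes]
    observed = [abs_corr(g, y_res) for g in g_res]
    exceed = [0] * len(genotypes)
    for _ in range(n_perm):
        shuffled = y_res[:]
        rng.shuffle(shuffled)
        # Largest statistic over all SNPs in this permutation.
        max_stat = max(abs_corr(g, shuffled) for g in g_res)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy data for illustration only (not from ALSPAC).
n = 60
sex = [i % 2 for i in range(n)]
snp_a = [float((i // 2) % 3) for i in range(n)]  # allele dosage 0/1/2
snp_b = [float((i // 7) % 3) for i in range(n)]
noise = random.Random(1)
iq = [100.0 + 5.0 * a + noise.gauss(0.0, 1.0) for a in snp_a]
pvals = familywise_pvalues([snp_a, snp_b], iq, sex, n_perm=200)
```

Comparing each observed statistic with the permutation maximum over all SNPs is what makes the resulting p-value family-wise rather than point-wise.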

Here, we tested the entire ARC complex, followed by “ ARC expression” and “Arc function” subgroups individually ( Table 1 ) in ALSPAC. The “ ARC expression” subgroup included proteins implicated in the regulation of ARC transcription, mRNA processing, transport, Arc protein translation and degradation, while the Arc function subgroup included proteins that bind Arc or are closely associated with Arc function. ARC itself was not considered as a member of “ ARC expression” or “Arc function” subgroups as by examining those subgroups we aimed at elucidating the role of ARC regulators and interactors only. SNPs within each gene in the complex (including those within 5,000 base pairs upstream and 1,500 base pairs downstream of each gene) were analyzed in MAGMA, version 1.05 [ 30 ], where first the linkage disequilibrium (LD) adjusted gene-based p -values were calculated and then converted to Z-values to regress on gene-set membership, as a predictor. Gene size and gene-sets’ gene density were added as covariates to account for possible confounding effects. Only genes on autosomal chromosomes were examined. Bonferroni correction was applied to correct for multiple testing, with the significance threshold set to 4.17E-03 (12 tests).
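
Two pieces of arithmetic in this paragraph can be checked with a short sketch: the Bonferroni threshold for 12 tests, and the conversion of gene-based p-values to Z-values. The one-sided probit form used here is an assumption about the transformation, and the function names are hypothetical.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def p_to_z(p):
    """One-sided probit transform: smaller p-values map to larger Z-values."""
    return NormalDist().inv_cdf(1.0 - p)


threshold = bonferroni_threshold(0.05, 12)  # 12 tests, as stated in the text
print(f"{threshold:.2E}")
print(round(p_to_z(0.025), 3))
```

The Z-values, rather than the raw p-values, are then the outcome in the regression on gene-set membership, with gene size and gene density as covariates.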

To examine the contribution of ARC complex to cognitive function, we carried out the following analyses: (1) tests for association between ARC complex genes and IQ scores in ALSPAC (gene-set tests); (2) genes found to be significant in the gene-set test were further examined for single SNP-based association in ALSPAC; and (3) given reported shared genetics between cognitive function and AD, significant SNPs in ALSPAC were examined in the IGAP to check for association with AD. These analyses are described in more detail below.

ARC complex SNPs were extracted from the summary statistics of the IGAP GWAS ( http://www.pasteur-lille.fr/en/recherche/u744/igap/igap_download.php ) on individuals of European ancestry. IGAP used genotyped and imputed data on 7,055,881 SNPs to meta-analyze four previously-published GWAS datasets [namely, the European Alzheimer’s Disease Initiative (EADI), the Alzheimer Disease Genetics Consortium (ADGC), the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium (CHARGE), and the Genetic and Environmental Risk in AD consortium (GERAD)], consisting of 17,008 AD cases and 37,154 controls [ 29 ].

Additional quality control was performed in the subset of individuals with available IQ measures to ensure that no SNPs or individuals had poor genotyping rates (<95%) and that no SNPs were rare (minor allele frequency, MAF <1%) or out of Hardy-Weinberg Equilibrium ( p < 1.00E-05).
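
The SNP-level part of these thresholds can be expressed as a simple filter. This sketch assumes per-SNP call rate, MAF, and Hardy-Weinberg p-value have already been computed by the genotyping software; the `SnpStats` record and the SNP identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    call_rate: float  # fraction of individuals with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value


def passes_qc(s, min_call_rate=0.95, min_maf=0.01, hwe_cutoff=1.00e-05):
    """Keep SNPs that are well genotyped, not rare, and not out of HWE."""
    return s.call_rate >= min_call_rate and s.maf >= min_maf and s.hwe_p >= hwe_cutoff


# Hypothetical SNPs for illustration only.
snps = [
    SnpStats("snp_ok", 0.99, 0.20, 0.40),
    SnpStats("snp_rare", 0.99, 0.005, 0.40),
    SnpStats("snp_low_call", 0.90, 0.20, 0.40),
    SnpStats("snp_hwe_fail", 0.99, 0.20, 1.0e-07),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
```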

Assessment of intellectual function was performed at 8.5 years of age using the third edition of the Wechsler Intelligence Scale for Children (WISC-III) [ 27 ]. Intellectual function was measured by the full-scale IQ score (FSIQ), comprising both a verbal score (VIQ) generated from the vocabulary subtest and a performance score (PIQ) generated from the matrix reasoning subtest. For more details, refer to the ALSPAC data dictionary ( http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/ ). Normal distribution of these variables was assured and all outliers were removed (beyond mean ± 3 standard deviations (SD)). ALSPAC participants’ DNA was extracted from whole blood or buccal swab samples and prepared for genotyping using standard protocols. A total of 9,912 participants were genotyped using the Illumina HumanHap550 platform. SNPs with a minor allele frequency of <1% or a call rate of <95% were removed. Furthermore, only SNPs passing the exact test of Hardy–Weinberg equilibrium ( p > 5.00×10⁻⁷ ) were considered for analyses. Detailed quality control measures can be found at http://www.bristol.ac.uk/media-library/sites/alspac/migrated/documents/gwas-data-generation.pdf . Known autosomal variants were imputed with MACH 1.0.16 Markov Chain Haplotyping software [ 28 ], using CEPH individuals from phase 2 of the HapMap project (HG18) as a reference (release 22). Only SNPs with imputation quality estimates above 0.3 were examined.
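
The outlier rule for the IQ scores (values beyond the mean ± 3 SD are removed) can be sketched as below. The use of the sample standard deviation and a single pass over the data are assumptions, and the scores are invented for illustration.

```python
from statistics import mean, stdev


def remove_outliers(scores, n_sd=3.0):
    """Drop values lying more than n_sd sample standard deviations from the mean."""
    m, sd = mean(scores), stdev(scores)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    return [s for s in scores if lower <= s <= upper]


# Hypothetical scores: 30 typical values plus one extreme value.
scores = [95.0, 100.0, 105.0] * 10 + [200.0]
cleaned = remove_outliers(scores)
```

Because the extreme value itself inflates the mean and SD, a single pass is conservative; whether the rule was applied once or iteratively is not stated in the text.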

The ALSPAC sample was derived from a well-characterized population-based study carried out in southwest England [ 26 ]. Pregnant women residing in the Bristol area of the United Kingdom, who had an estimated date of delivery between April 1, 1991 and December 31, 1992, were recruited to take part. Of the original 14,541 pregnancies, 13,988 children were alive at one year. An additional 713 children were enrolled after age seven, resulting in a total sample of 14,701 children. These mother-child pairs have been followed for over 20 years, generating an immense amount of data through biological samples, measurements, and questionnaires. The study website contains details of all the data that are available through a fully searchable data dictionary: http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/ . Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Phenotype-matched genotype data were available for up to 6,832 children, depending on the variables.

The ARC gene set was constructed based on Arc interaction partners and regulators, as evidenced in the literature ( Table 1 ). We followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Statement [ 24 ]. PubMed entries linked to human, mouse, and rat ARC genes (NCBI Gene IDs: 23237, 11838, and 54323, respectively) were obtained on December 14, 2014 using an NCBI cross-reference query. The search yielded 185 papers, all of which were screened to identify those papers showing experimental evidence for Arc interaction partners and regulators ( n = 13). The remaining papers’ references were then examined further, which led to the identification of six additional relevant papers. From these 19 sources, a total of 37 experimentally-verified interaction partners and regulators were identified ( Supplementary Figure 1 ). The gene complex was evaluated in STRING database, version 10 [ 25 ], to confirm its network connectivity and assess its enrichment for the biological and cellular processes based on gene ontology (GO). False discovery rate (FDR) was applied to correct for multiple testing.

To validate the ability of rs2830077 to modulate transcription, we performed a luciferase assay with the first APP intron harboring this SNP. The luciferase activities for both orientations of the A allele were lower than corresponding activities of the C allele (28% and 23% lower luciferase activity for antisense and sense orientation, respectively; Fig. 2 ), confirming that rs2830077 may alter enhancer-like activity of this region.

To validate the in silico predictions, we performed EMSA using pure recombinant TFCP2 and synthetic double-stranded oligonucleotides of the two alleles of rs2830077. EMSA demonstrated a shifted band of a DNA-protein complex, with a stronger intensity for the C allele (46% greater intensity of the retarded band; Supplementary Figure 6 ), consistent with a higher binding affinity of TFCP2 for this allele.

H3K27Ac modification and DNAseI hypersensitivity data at rs2830077 also suggested that this region may be an enhancer ( Supplementary Figure 5 ). The potential functional importance of this SNP was supported in HaploReg, which implicated rs2830077 in APP expression [ 34 ]. Further, RegulomeDB, MatInspector, and MATCH all indicated the presence of a putative transcription factor CP2 (TFCP2; also known as LBP-1c/CP2/LSF) binding site at rs2830077. We also found that the matrix score of rs2830077[A] is lower than that of the reference allele C, suggesting that the TFCP2 may have higher binding affinity to the C allele than to the A.

Among the top APP SNPs significantly associated with PIQ, several were predicted to be located within active chromatin compartments, with transcription enhancer properties in hippocampus as well as expression quantitative trait loci (eQTL) activity in whole blood ( Table 3 ).

The 10 APP SNPs surviving the correction for multiple testing in association analyses with the PIQ score were then examined for association with AD in IGAP. All 10 SNPs also revealed association signals with AD. Seven of these survived the Bonferroni correction ( p < 0.005; Table 3 and Supplementary Figure 4 ).

Since tentative association was observed for the “Arc function” subgroup only, we tested individual SNPs in those genes for association with IQ measures in ALSPAC. The strongest signals were observed between 13 SNPs in the APP gene and PIQ, with 10 variants surviving the correction for multiple testing ( Table 3 , Supplementary Figure 4 ). Another strong signal was noted between rs1491579 in the SH3-domain GRB2-like 3 ( SH3GL3 ) gene ( Table 3 ) and FSIQ, though it did not survive multiple test correction. Supplementary Table 3 shows results of association for all the variants reaching point-wise p -values below 1.00E-03.

Tentative association was observed between the functional subgroup ( Table 1 ) of ARC complex and VIQ as well as FSIQ in the ALSPAC ( p = 0.027 and 0.041, respectively). No association signal survived correction for multiple testing. Supplementary Table 2 shows the association results of all gene sets, and individual genes comprising those, for all IQ measures.

The features of ALSPAC and IGAP samples are summarized in Table 2 . The characteristics of IGAP sample were derived from its publication [ 29 ], combining 18 datasets. The distribution of the three IQ scores in ALSPAC is presented in Supplementary Figure 3 .

Similar to the ARC complex, STRING examination of its subgroups (“ ARC expression” and “Arc function,” Table 1 ) revealed that both show evidence of biological protein-protein interactions, with the “Arc function” subgroup displaying stronger evidence ( p = 1.52E-09) than the “ ARC expression” subgroup ( p = 3.19E-04). Correspondingly, the latter also showed less implication in GO pathways ( Supplementary Figure 2 ) compared to the “Arc function” subgroup, where the strongest enrichment was observed for neuron projection (GO:0043005, FDR = 1.47E-07), followed by protein binding (GO:0005515, FDR = 1.52E-07). Supplementary Figure 2 depicts the results of the STRING examination of the ARC complex subgroups.

After assembling our ARC complex ( Table 1 ), we evaluated its functional network in STRING database, which uses experimental and predicted protein-protein interaction information to assess connectivity ( Fig. 1 ). Protein-protein interactions were significantly enriched ( p = 4.8E-14) in the full ARC complex, indicating that the proteins in the curated gene set are biologically connected as a group. The most significant enrichment of the ARC complex as a whole was noted for cellular components (neuron projection, GO:0043005, FDR = 2.59E-07 and neuron part, GO:0097458, FDR = 2.59E-07). Supplementary Table 1 reflects the main findings from these assessments.

DISCUSSION

In this study, we utilized a pathway approach to study the genetics of cognition in children (ALSPAC) and followed up the observed association signals in a sample of adults with and without AD (IGAP). Further, we performed a functional examination of associated variants in cell culture. Our experimental approach was prompted by evidence that: (1) cognitive ability may be influenced by genetic variation in synaptic plasticity-related genes [35], (2) AD shares polygenic etiology with cognitive functioning [23], and (3) pathway analyses of functionally related genes can be advantageous as they combine the effect of multiple genes that may be biologically meaningful [14]. Our analyses uncovered an association of a common APP variant with PIQ in children, which was also associated with AD in adults. We further found that this SNP could influence APP expression by affecting TFCP2 binding affinity.

APP encodes the amyloid-β protein precursor (AβPP) that forms Aβ-containing neuritic plaques, the accumulation of which is one of the key histopathological hallmarks in AD. While the role of APP in AD has long been known, its involvement in childhood intelligence has not been previously reported. The association signal observed in this study is between APP and PIQ (Table 3), which is believed to reflect fluid intelligence. Examining the data from a recent large-scale GWAS on cognitive functions (N = 112,151) [36], we noted that the APP gene showed signs of association with reaction time (efficiency of information processing) and memory (p = 0.048 and 0.095, respectively; Supplementary Figure 7), further suggesting possible involvement of APP in cognition.

General fluid cognitive functioning in childhood has been proposed to be linked to late-onset dementia [37, 38], with genetic influences possibly being the driving force behind stability of cognitive functioning [39]. In a recent study, carrying the APOE ɛ4 variant (the best replicated known genetic factor for AD) was correlated with working memory and attention in children [40]. Thus, we hypothesize that the genetics of PIQ may be an important determinant of cognitive abilities throughout the lifespan and of age-related dementia.

Genetic overlap between IQ and dementia has previously been reported in other studies, including a large genome-wide analysis in over 100,000 individuals [23]. Another GWAS on general fluid cognitive ability in adults (N = 53,949) identified four genes (namely TOMM40, APOE, ABCG1, and MEF2C) previously associated with AD or neuropathological features of AD and related dementias [11]. Such overlap has also been noted for the APP gene, where a coding variant was shown to be protective against both normal age-related cognitive decline and AD [41]. While APOE has shown robust association with AD and normal cognitive aging, more in-depth functional studies are needed to understand the functional significance of all genetic contributions to cognitive ability.

In this study, the most prominent significant signal was noted in APP, between the intronic rs2830077 and PIQ in children, also showing association with AD in adults (Table 3). In silico analyses of this SNP prompted its functional evaluation after noting that rs2830077: (1) is located within a region of active chromatin; (2) may have transcriptional enhancer activity in hippocampal tissue; (3) possesses eQTL activity; and (4) is located at a putative TFCP2 binding site. We confirmed via EMSA that TFCP2 binds to this putative enhancer region, with rs2830077 allele-specific binding affinity (Supplementary Figure 6). Further, luciferase assays indicate that the C allele of this SNP confers enhanced expression of APP (Fig. 2). The TFCP2 transcription factor has been implicated in erythroid gene expression [42], repression of HIV transcription [43], and in different cancer types [44, 45]. No direct connections between TFCP2 and AβPP expression have been reported, though a number of studies show association between the TFCP2 gene and AD [46, 47]. The activity of TFCP2 itself is regulated by AβPP-interacting protein Fe65 [48] and the intracellular domain of APP-like protein 2 [49]. Another APP SNP (rs467021), in linkage disequilibrium with rs2830077 (r² = 0.99952), has been reported to be associated with cognitive decline in AD [50].

The dosage of AβPP has been implicated in altered neuronal endocytosis associated with increased Aβ production and age-related brain atrophy and degeneration, observed in patients with AD and Down syndrome (DS) [51]. Furthermore, duplication of the APP locus is thought to lead to early-onset AD [52] and the trisomy of this locus is likely to contribute to dementia in DS [53]. Indeed, the triplication of the Hsa21 segment including APP in people without DS has been associated with AD [53]. As both AD and DS display dementia, the observed association between PIQ and a SNP potentially altering the expression of APP, and thus its dosage, may help to elucidate the link between variation in intelligence and development of dementia.

We should note that to better estimate the contribution of genetics to IQ through life, examination of longitudinal samples where IQ is measured at early and late ages is desirable. However, since such samples were not available, we took the alternative approach of looking up significant child IQ association signals (ALSPAC) in late adulthood (IGAP). While our findings are intriguing, a direct replication in an independent childhood sample is warranted.

In summary, this study implicates APP in general cognitive abilities. We also show that evidence-based pathway analyses can be useful in identifying genetic factors underlying cognitive function. Follow-up studies are needed to more precisely determine how variants in APP may exert their effects on cognitive function over a lifespan. Such studies may have valuable implications for our understanding of etiology and, eventually, treatment of disorders associated with cognitive dysfunction, such as AD.