Abstract 'Epigenetic age acceleration' is a valuable biomarker of ageing, predictive of morbidity and mortality, but for which the underlying biological mechanisms are not well established. Two commonly used measures, derived from DNA methylation, are Horvath-based (Horvath-EAA) and Hannum-based (Hannum-EAA) epigenetic age acceleration. We conducted genome-wide association studies of Horvath-EAA and Hannum-EAA in 13,493 unrelated individuals of European ancestry, to elucidate genetic determinants of differential epigenetic ageing. We identified ten independent SNPs associated with Horvath-EAA, five of which are novel. We also report 21 Horvath-EAA-associated genes including several involved in metabolism (NHLRC, TPMT) and immune system pathways (TRIM59, EDARADD). GWAS of Hannum-EAA identified one associated variant (rs1005277), and implicated 12 genes including several involved in innate immune system pathways (UBE2D3, MANBA, TRIM46), with metabolic functions (UBE2D3, MANBA), or linked to lifespan regulation (CISD2). Both measures had nominal inverse genetic correlations with father’s age at death, a rough proxy for lifespan. Nominally significant genetic correlations between Hannum-EAA and lifestyle factors including smoking behaviours and education support the hypothesis that Hannum-based epigenetic ageing is sensitive to variations in environment, whereas Horvath-EAA is a more stable cellular ageing process. We identified novel SNPs and genes associated with epigenetic age acceleration, and highlighted differences in the genetic architecture of Horvath-based and Hannum-based epigenetic ageing measures. Understanding the biological mechanisms underlying individual differences in the rate of epigenetic ageing could help explain different trajectories of age-related decline.

Author summary DNA methylation, an epigenetic process, is known to vary with age. Methylation levels at specific sites across the genome can be combined to form estimates of age known as ‘epigenetic age’. The difference between epigenetic age and chronological age is referred to as ‘epigenetic age acceleration’, with positive values indicating that a person is biologically older than their years. Understanding why some people seem to age faster than others could shed light on the biological processes behind age-related decline; however, the mechanisms underlying differential rates of epigenetic ageing are largely unknown. Here, we investigate genetic determinants of two commonly used epigenetic age acceleration measures, based on the Horvath and Hannum epigenetic clocks. We report novel genetic variants and genes associated with epigenetic age acceleration, and highlight differences in the genetic factors influencing these two measures. We identify ten genetic variants and 21 genes associated with Horvath-based epigenetic age acceleration, and one variant and 12 genes associated with the Hannum-based measure. There were no genome-wide significant variants or genes in common between the Horvath-based and Hannum-based measures, supporting the hypothesis that they represent different aspects of ageing. Our results suggest a partial genetic basis underlying some previously reported phenotypic associations.

Citation: Gibson J, Russ TC, Clarke T-K, Howard DM, Hillary RF, Evans KL, et al. (2019) A meta-analysis of genome-wide association studies of epigenetic age acceleration. PLoS Genet 15(11): e1008104. https://doi.org/10.1371/journal.pgen.1008104 Editor: John M. Greally, Albert Einstein College of Medicine, UNITED STATES Received: March 19, 2019; Accepted: August 16, 2019; Published: November 18, 2019 Copyright: © 2019 Gibson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: Summary statistics from the research reported in the manuscript will be made available immediately following publication on the Edinburgh Data Share portal with a permanent digital object identifier (DOI). According to the terms of consent for Generation Scotland participants, requests for access to the individual-level data must be reviewed by the GS Access Committee (resources@generationscotland.org). Individual-level data are not immediately available, due to confidentiality considerations and our legal obligation to protect personal information. These data will, however, be made available upon request and after review by the GS access committee, once ethical and data governance concerns regarding personal data have been addressed by the receiving institution through a Data Transfer Agreement. Funding: Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates (CZD/16/6) and the Scottish Funding Council (HR03006). Genotyping and DNA methylation profiling of the GS samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, Edinburgh, Scotland and was funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award “STratifying Resilience and Depression Longitudinally” ((STRADL) Reference 104036/Z/14/Z)). Funding details for the cohorts included in the study by Lu et al. (2018) can be found in their publication. HCW is supported by a JMAS SIM fellowship from the Royal College of Physicians of Edinburgh and by an ESAT College Fellowship from the University of Edinburgh. AMM & HCW acknowledge the support of the Dr. Mortimer and Theresa Sackler Foundation. SH acknowledges support from grant 1U01AG060908-01. REM is supported by Alzheimer’s Research UK major project grant ARUK-PG2017B-10. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction Ageing is associated with a decline in physical and cognitive health, and is the main risk factor for many debilitating and life-threatening conditions including cardiovascular disease, cancer, and neurodegeneration [1]. Ageing is a multi-dimensional construct, incorporating physical, psychosocial, and biological changes. Everyone experiences the same rate of chronological ageing, but the rate of ‘biological ageing’, age-related decline in physiological functions and tissues, differs between individuals. Various phenotypic and molecular biomarkers have been used to study biological ageing, including a number of 'biological clocks', the best known of which is telomere length. Telomeres shorten with increasing age, and telomere length has been found to predict morbidity and mortality [2]. More recently, research into epigenetics–chemical modifications to DNA without altering the genetic sequence–has yielded another method for measuring biological age. DNA methylation is an epigenetic modification, typically characterised by the addition of a methyl group to a cytosine-guanine dinucleotide (CpG) [3], that can influence gene expression and is associated with variation in complex phenotypes. This process is essential for normal development and is associated with a number of key processes including ageing. DNA methylation levels are dynamic, varying with age across the life course [4,5] and are influenced by both genetic and environmental factors [6]. Weighted averages of methylation at multiple CpG sites can be integrated into estimates of chronological age referred to as ‘epigenetic age’. Two influential studies have used this method to create ‘epigenetic clocks’, which accurately predict chronological age in humans. Hannum et al. used DNA methylation profiles from whole blood from two cohorts to identify 71 CpG sites that could be used to generate an estimate of age [7], while Horvath used data from 51 different tissue types from multiple studies to identify 353 CpG sites whose methylation levels can be combined to form an age predictor [8]. Hannum et al.’s clock is specific to blood samples, although it can be adjusted for different tissue types using linear models. The Horvath clock is widely applicable, with the same CpG set and the same algorithm being used irrespective of the DNA source. Although similar penalised regression models were used to select the CpG sites to be included in each of these epigenetic clocks, there is limited overlap in the CpGs included. The two measures are clearly related, but are thought to capture slightly different aspects of the biology of ageing [9]. The Hannum age estimator correlates with proportions of certain blood cells, reflecting its construction based on blood methylation data [9,10], and it is considered to track aspects of immunosenescence. The pan-tissue Horvath clock, constructed across a broad spectrum of tissue and cell types, is relatively uncorrelated with blood cell proportions [11], and is thought to capture cell-intrinsic changes in DNA methylation which might reflect an innate ageing process. Both the Hannum and Horvath epigenetic clocks are strongly correlated (r>0.95) with chronological age [7,8]. However, despite these high overall correlations, there can be substantial differences between epigenetic and chronological age at the individual level, and it is unclear what drives these differences. A greater epigenetic age relative to chronological age is commonly described as ‘epigenetic age acceleration’ (EAA), and implies that a person is biologically older than their years. EAA has been shown to be informative for both current and future health trajectories [9]. Recently, a growing number of studies have used EAA to investigate age-related disorders, and the epigenetic clock is increasingly being recognised as a valuable marker of biological ageing [10,12]. The simplest definition of epigenetic age acceleration is the residual that results from regressing epigenetic age on chronological age. However, it is well known that the abundance of different cell types in the blood changes with age [13,14], and hence two broad categories of EAA measures have been distinguished: those that are independent of age-related changes in blood cell composition, and those that incorporate and are enhanced by blood cell count information [10]. The former group, considered to reflect ‘pure’ epigenetic ageing effects that are not influenced by differences in blood cell counts, are often referred to as ‘intrinsic’ epigenetic age measures. The latter group up-weights the contributions of blood cell counts, thus leveraging known age-related changes to blood cell proportions to capture aspects of immunosenescence; these measures are referred to as ‘extrinsic’ epigenetic age measures. In keeping with previous work, this study focuses on two different epigenetic age measures, based on the Horvath and Hannum epigenetic clocks [7,8], and uses these to derive variations of EAA that are either independent of blood cell counts, or enhanced by changes in blood cell composition. Horvath-based epigenetic age follows the approach by Horvath (2013), and is defined as the predicted value of age based on the DNA methylation levels of the 353 CpG sites identified in his study [8]. Horvath-based epigenetic age acceleration (Horvath-EAA) is the residual term of a multivariate model regressing the Horvath-based epigenetic age estimate on chronological age and estimates of blood cell counts. It is by definition independent of both chronological age and age-related changes in the cellular composition of blood. Hannum-based epigenetic age is based on DNA methylation levels at the 71 CpGs identified by Hannum et al. (2013) [7]. Hannum-based epigenetic age acceleration (Hannum-EAA) is an enhanced version of the Hannum estimate which up-weights the contributions of age-associated blood cells. A weighted average of Hannum-based epigenetic age with blood cells whose abundance is known to change with age is calculated, and Hannum-EAA is then defined to be the residual variation from a univariate model regressing the weighted DNA methylation age estimate on chronological age. Hannum-EAA is independent of chronological age but in addition to cell-intrinsic epigenetic changes it also tracks age-related changes in blood cells. Full details of the calculation of Horvath-EAA and Hannum-EAA are given in S1 Text. Horvath-EAA, described in previous publications as ‘intrinsic’ epigenetic age acceleration (IEAA), can be interpreted as a measure of cell-intrinsic ageing that exhibits preservation across multiple tissues, appears unrelated to lifestyle factors, and probably indicates a fundamental cell ageing process that is largely conserved across cell types [8,10]. In contrast, Hannum-EAA, referred to in previous studies as ‘extrinsic’ epigenetic age acceleration (EEAA), can be considered a biomarker of immune system ageing, explicitly incorporating aspects of immune system decline such as age-related changes in blood cell counts, correlating with lifestyle and health-span related characteristics, and thus yielding a stronger predictor of all-cause mortality [10,15]. It should be noted that as both the Horvath and Hannum epigenetic clocks correlate well with age, in a population with a wide age range they are guaranteed to correlate with each other. However, Horvath-based and Hannum-based epigenetic age acceleration estimates, i.e. the degree of divergence of epigenetic age from chronological age, are not guaranteed to be correlated. Previous studies have identified relationships between epigenetic ageing and numerous traits, including several age-related health outcomes, for example Alzheimer’s disease pathology [16], cognitive impairment [16], and age at menopause [17]. Higher EAA has been associated with poorer measures of physical and cognitive fitness [9] and higher risk of all-cause mortality [12]. Many associations are specific to either Horvath-EAA or Hannum-EAA, a discordance that may reflect the differences in the two estimates and supports the theory that they represent different aspects of ageing [15,18,19]. While EAA has been associated with various markers of physical and mental fitness, the mechanisms underlying epigenetic ageing remain largely unknown. There has been little research conducted thus far on genetic contributions to epigenetic age acceleration. However, Lu et al. (2018) recently published results of the first genome-wide association analysis of blood EAA in a sample of 9,907 individuals, identifying five genetic loci associated with Horvath-EAA and three Hannum-EAA-associated loci [20]. This current study, with a sample size of 13,493 individuals, constitutes the largest study of the genetic determinants of DNA methylation-based ageing to date. Single nucleotide polymorphism (SNP)-based and gene-based approaches were used to identify genes and loci associated with Hannum-based and Horvath-based estimates of EAA. Functional mapping and annotation of genetic associations were performed, alongside gene-based and gene-set analyses, in an attempt to elucidate the genes and pathways implicated in differential rates of epigenetic ageing between individuals and shed light on the underlying biological mechanisms. We report novel SNPs and genes associated with epigenetic age acceleration, and highlight differences in the genetic architectures of the Horvath-based and Hannum-based EAA measures.

Discussion This study investigated genetic markers of epigenetic ageing in a sample of 13,493 individuals of European ancestry. We examined genetic determinants of both Horvath-based (adjusted for the composition of age-related blood cells) and Hannum-based (immune system-associated) epigenetic age acceleration, sometimes referred to as ‘intrinsic’ and ‘extrinsic’ epigenetic age acceleration, to gain insight into the regulation of epigenetic ageing. We report several novel findings in addition to replicating a sub-set of previous results. The meta-analysis of Horvath-EAA identified ten independent associated SNPs, doubling the number reported to date, and highlighted 21 genes involved in Horvath-based epigenetic ageing. A single genome-wide significant variant was identified for Hannum-EAA, along with 12 implicated genes. We uncovered limited evidence of functionality within some associated genomic loci, with many SNPs located in regions of open chromatin and a smaller number in regulatory regions. Some loci also contained regions where genetic variation is predicted to be deleterious. It has been hypothesised that in some cases DNA methylation could be a candidate mechanism for mediating genetic effects on ageing-related phenotypes [54]. Intriguingly, four of the ten Horvath-EAA-associated SNPs are mQTL for CpGs used in the Horvath/Hannum epigenetic clocks. A possible interpretation of this is that the functional mechanism by which these SNPs influence the rate of biological ageing is via altering methylation levels. A number of the genes significantly associated with Horvath-EAA are related to metabolism (NHLRC1, TPMT, KDM1B, and ESYT3), consistent with several studies reporting phenotypic associations between Horvath-based EAA and metabolic syndrome characteristics and supporting the suggestion of a role in tracking metabolic ageing [15,19]. Others are involved in immune system pathways (TRIM59, KPNA4, EDARADD), while several have roles in cellular processes linked to ageing: apoptosis and autophagy (FAIM), ageing and autophagy (TERT), and coordinating vital cell functions (PIK3CB). PIK3CB plays a role in the signal transduction of insulin and insulin-like pathways [55], and genetic variants at this locus have been related to insulin-like growth factor levels in plasma, and human longevity [56]. Genes associated with Hannum-based EAA, often referred to as immune system ageing, include several involved in innate immune system pathways (e.g. TRIM46 and MUC1) or with metabolic and immune system functions (MANBA, UBE2D3). Other associated genes of interest include those with roles relating to ageing and longevity: MTRNR2L7 is a neuroprotective and anti-apoptotic factor, and CISD2 regulates autophagy and is a fundamentally important regulator of lifespan. Mouse studies indicate that CISD2 ameliorates age-associated degeneration of skin, skeletal muscle, and neurons, protects mitochondria from age-related damage and functional decline, and attenuates age-associated reduction in energy metabolism [57], while CISD2 deficiency leads to a number of phenotypic features suggestive of premature ageing [58]. Our LD score regression analysis replicated the positive genetic correlations with central adiposity reported by Lu et al. (2018) at nominal significance levels, supporting the suggestion that observed phenotypic associations [15,19] may result in part from a shared genetic aetiology. We did not, however, replicate previously reported correlations between Horvath-EAA and metabolic disease-related traits or diabetes, and found these traits to be correlated with Hannum-EAA at only nominal significance levels in our larger sample [20]. We also found no correlation between epigenetic age acceleration and age at menopause. Nominally significant genetic correlations between Hannum-based, but not Horvath-based, epigenetic age acceleration, and lifestyle factors such as smoking behaviour and education level, provide some evidence for a genetic basis underlying the phenotypic results we reported previously [19], and provide tentative support for the hypothesis that Hannum-based epigenetic ageing is relatively sensitive to changes in environment and lifestyle. Father’s age at death, a rough proxy for lifespan [59], was nominally significantly correlated with both EAA measures, and parents’ age at death was additionally correlated with Hannum-EAA, consistent with a body of work demonstrating robustly that EAA predicts life span [10,12]. Aside from these, genetic correlations with age-related traits were surprisingly few: it is possible that this could reflect an overly conservative correction for the multiple tests carried out, or low statistical power, rather than a genuine lack of correlations (S4 Table). While the mean χ2 values (1.059 and 1.054 for Horvath-EAA and Hannum-EAA respectively) indicate a sufficient level of polygenicity within the dataset for use with LD score regression, the heritability Z-scores for Horvath-EAA and Hannum-EAA are 3.69 and 4.91 respectively. The recommendation is that genetic correlation analysis should be restricted to GWAS with a heritability Z-score of 4 or more, on the grounds of interpretability and power [53], so the Horvath-based results particularly should be interpreted with caution. This study of epigenetic age acceleration benefits from having a large sample size. Increasing GWAS sample size increases the power to detect associated loci, and is often achieved, as in this case, by combining smaller studies in a meta-analysis. Meta-analytic GWAS are, however, sometimes hampered by differences in how a trait is measured between individual studies. In this instance, use of the online calculator to calculate the EAA measures and using the same algorithm and output columns for each study, mitigates this. The current study comprises only individuals of European ancestry, which confers a further advantage as epigenetic ageing rates have been shown to differ between ethnicities [60]. Despite the large sample overlap, some results of this study differ from those reported by Lu et al. (2018). One reason for this could be that only European-ancestry individuals were included in this analysis whereas the Lu study reports results from a mixed ancestry sample. Another likely contributing factor is the age ranges involved: the GS cohort, not included in Lu’s analysis but which makes up 38% of the total sample in the current study, has a mean age of 48.5 years, 14.4 years younger than the mean age of the remaining cohorts. Given that epigenetic age changes over the life course, although not necessarily in parallel with chronological age, this could help explain the discrepancies between the studies. There are a number of limitations which should be considered when interpreting the results of this study. This is the largest meta-analysis of genetic determinants of epigenetic age acceleration to date, however, while large for these phenotypes, the size of the sample studies here is still small in terms of genome-wide analysis of polygenic traits. As only European-ancestry individuals were included, the results are not generalisable to other ethnicities. The MAGMA gene-based analysis identified a number of biologically plausible associated genes for both EAA measures; however, while many of these genes are located in the same genomic regions as the significantly associated SNPs, this should not be taken as evidence that the SNP association is effected through the gene. Identifying effector transcripts for GWAS variants is a difficult and as yet unresolved problem, and our knowledge of how these genes may affect the activity of the SNPs is limited. In addition, MAGMA does not take into account information from methylation QTL to help identify relevant genes; future work should place more emphasis on the role of mQTL. The lack of significant genetic correlations between EAA and age-related traits may reflect low statistical power (the heritability Z-score of 3.69 for Horvath-EAA falls below the recommended lower threshold of 4 for genetic correlation analysis) or overly stringent correction for multiple comparisons (FDR correction was applied over the 218 tested traits, however not all of these were independent) rather than a true absence of shared genetic aetiology. Finally, while we have identified a number of SNPs and genes significantly associated with EAA, including genes already known to be related to ageing, the analyses presented here fall short of providing a mechanistic explanation for how these variants and genes act to influence biological age. This study should be considered as 'discovery' research, with a comprehensive investigation of the functional and biological mechanisms behind the SNP and gene associations being a direction for future work. Horvath-based and Hannum-based epigenetic age acceleration are thought to represent different aspects of ageing. Hannum-EAA has been described as a biomarker of immune system ageing, and has been found to be associated with a wide range of traits [15,19], indicating a sensitivity to variations in environment and lifestyle. By contrast, Horvath-EAA is considered to be a fundamental, intrinsic cellular ageing process, largely unrelated to lifestyle factors, although associations with a range of metabolic syndrome characteristics suggest a role in tracking metabolic ageing processes. Our results reflect this to a large degree, with more nominally significant genetic correlations found with Hannum-EAA than Horvath-EAA, including items relating to education, smoking, intelligence, and various cholesterol measures. Meanwhile the greater number of significant variants, genomic loci, and genes associated with Horvath-EAA are consistent with the hypothesis that this measure of 'cell-intrinsic' ageing is less related to lifestyle and more under genetic control, and thus more likely to remain relatively stable. Despite these differences, however, our results indicate some common features. The significant genetic correlation of 0.57 between the two measures suggests a moderate overlap in the genetic factors influencing the two phenotypes despite the biomarkers being based on almost entirely distinct CpG sets. Both also appear to be influenced by genes associated with metabolic and immune system pathways, although the specific genes involved are different. Conclusions This study provided insight into the genetic determinants of differential biological ageing through the identification of genes and genetic variants associated with epigenetic age acceleration. We doubled the number of SNPs associated with Horvath-EAA reported to date, and report 21 genes significantly associated with this phenotype, including PIK3CB, linked to human longevity. We identified 12 Hannum-EAA-associated genes, one of which, CISD2, has a fundamental role in lifespan control. Our results also highlighted differences in the genetic architecture of the Horvath-based and Hannum-based EAA measures, with no genome-wide significant SNPs or genes common to the two, providing substantial support for the hypothesis that they represent different aspects of ageing. While the genetic information coded by our DNA sequence remains largely fixed throughout the lifetime, the expression of our genes is primarily regulated by epigenetic factors, which change over time. Epigenetic age increases with, but not in parallel with, chronological age; individual differences in the rate of epigenetic ageing potentially explain why trajectories of ageing differ between individuals. Understanding what causes these differences could potentially inform therapeutic interventions to delay the onset of age-related decline and improve ageing outcomes.

Methods Ethics statement Generation Scotland received ethical approval from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89). GS has also been granted Research Tissue Bank status by the Tayside Committee on Medical Research Ethics (REC Reference Number: 10/S1402/20), providing generic ethical approval for a wide range of uses within medical research. All participants provided written informed consent. Details of ethics approval and consent to participate for the cohorts included in the Lu et al. (2018) study can be found in their publication. Generation Scotland cohort We carried out genome-wide association analyses of Horvath-EAA and Hannum-EAA in a subset of individuals (n = 5,100) from the Generation Scotland: Scottish Family Health Study (GS) for whom both genetic and DNA methylation data were available. GS is a family- and population-based cohort recruited via general medical practices across Scotland; the recruitment protocol and sample characteristics are described in detail elsewhere [61,62]. In brief, the full cohort comprises 23,960 individuals aged between 18 and 98 years. Pedigree information was available for all participants, detailed socio-demographic and clinical data were collected, and biological samples were taken for genotyping. DNA methylation and derivation of epigenetic age acceleration variables in GS DNA methylation data were obtained from peripheral blood (n = 5,091) or saliva (n = 10) samples for 5,101 individuals from GS, with quality control checks carried out using standard methods outlined in S1 Text, and described in full elsewhere [19]. After quality control (QC), the dataset comprised beta-values for 860,928 methylation loci. Methylation-based age estimates (DNAm age) and epigenetic age acceleration variables (Horvath-EAA and Hannum-EAA, described in S1 Text) were obtained from the online DNA Methylation Age Calculator (https://dnamage.genetics.ucla.edu/) developed by Horvath [8]. Normalised DNA methylation beta-values were submitted to the calculator, using the 'Advanced Analysis for Blood Data' option, and undergoing further normalisation within the calculator algorithm to make the data comparable to the training data of the epigenetic clock. One individual was flagged by the calculator as having a gender mismatch, and was therefore omitted from downstream analysis, leaving a total of 5,100 individuals for the GWAS of Horvath-EAA and Hannum-EAA in GS. Blood cell abundance measures were also estimated by the online calculator, based on DNA methylation levels, as described previously [63]. Genotyping, imputation, and quality control in GS An overview of biological sample collection, DNA extraction, genotyping, imputation using the Haplotype Research Consortium reference panel (v1.1), and quality control for GS is included in S1 Text; full details have been described previously [64]. A total of 20,032 individuals passed all quality control thresholds. Following the removal of monomorphic or multiallelic variants and SNPs with a low imputation quality or a minor allele frequency below 1%, an imputed dataset with 8,633,288 hard called variants remained to be used in the genome-wide association analysis. GWAS of Horvath-EAA and Hannum-EAA in GS GWAS of Horvath-EAA and Hannum-EAA in GS were conducted using mixed linear model based association (MLMA) analysis [65], implemented in GCTA (Genome-wide Complex Trait Analysis) (v1.25) [66], and adjusting for sex to account for the higher epigenetic age acceleration in men than in women [7,12,60]. In order to account for population stratification, it is common to conduct ancestry-informative principal components analysis on the population in question, and use a number of the top-ranking principal components (PCs) from this analysis as covariates in the GWAS. However, as GS is a family-based sample, we employed a different approach to capture population structure. In place of PCs, two genomic relationship matrices (GRMs) were included in the GWAS, as this method has been shown to account for potential upward biases due to excessive relationships, and thus allows the inclusion of closely and distantly related individuals in genetic analyses [67]. The first GRM included pairwise relationship coefficients for all individuals, while the second had off-diagonal elements <0.05 set to 0; full details of the methods involved and construction of the GRMs is given elsewhere [68]. The results of univariate LD score regression analysis [26] (S4 Table) indicate that the two GRMs adequately accounted for population stratification, so it was not necessary to include ancestry-informative PCs in the GWAS. GWAS meta-analysis of Horvath-EAA and Hannum-EAA We obtained summary statistics from the largest European-ancestry analysis of epigenetic age acceleration to date (n = 8,393, Lu et al., 2018, summary information in S19 Table), and meta-analysed these with GS (details above). We chose not to include available data from non-European samples, despite the advantages of increased sample size, as different ethnicities have been shown to have different epigenetic ageing rates [60]. Association summary statistics from the GWAS of the two EAA phenotypes in GS and the Lu et al. study were meta-analysed using the inverse variance-weighted approach, which weights effect sizes by sampling distribution. This analysis was implemented in METAL [69], conditional on each variant being available in both samples. As SNPs which co-located with CpGs from the Hannum- or Horvath-based DNAm age predictors had already been excluded from Lu et al.'s analysis, it was not necessary to repeat this step. This resulted in 5,932,107 genetic variants for Horvath-EAA and 5,931,171 variants for Hannum-EAA, in a meta-analysis dataset containing 13,493 participants. The meta-analytic summary statistics produced by METAL were uploaded to FUMA (fuma.ctglab.nl) [27], which identified index SNPs and genomic risk loci related to epigenetic age acceleration. FUMA selects independent significant SNPs based on their having a genome-wide significant P-value (P<5x10-8) and being independent from each other (r2<0.6 by default) within a 250kb window. The European subset of the 1000 Genomes phase 3 reference panel [70] was used to map LD. SNPs in LD with these independent significant SNPs (r2≥0.6) within a 250kb window, and which have a minor allele frequency (MAF)>1% within the 1000 Genomes reference panel, were included for further annotation and used for gene prioritization. A subset of the independent significant SNPs, those in LD with each other at r2<0.1 within a 250kb window, were identified as lead SNPs. Genomic risk loci, including all independent signals that were physically close or overlapping in a single locus, were identified by merging any lead SNPs that were closer than 250kb apart (meaning that a genomic risk locus could contain multiple lead SNPs, with each locus represented by the lead SNP with the lowest P-value in that locus). Conditional analysis was implemented using GCTA software [66] to ascertain whether associated genetic loci harboured more than one independent causal variant, conditioning on the lead SNP at the locus and using GS as the reference panel for inferring the LD pattern. SNPs which remained significantly associated (P<5x10-8) with the phenotype after conditioning on the lead SNP were considered to be further independent associated variants. Manhattan plots and quantile-quantile plots were generated in R version 3.2.3 using the 'qqman' package, and regional SNP association results were visualised with LocusZoom [21]. SNPs which surpassed the threshold for genome-wide significance in our meta-analyses were checked against the NHGRI-EBI catalog of published GWAS [71,72] (www.ebi.ac.uk/gwas/) to determine whether they had previously been observed in association analysis. Methylation quantitative trait loci To ascertain whether the genome-wide significant associations from the Horvath-EAA and Hannum-EAA GWAS are confounded by methylation quantitative trait loci, we checked for SNP-CpG pairings in the mQTL database, a catalogue of the genetic influences on DNA methylation (mQTLdb, [25]). The independent significant SNPs from both GWAS were input to the database, using the MatrixEQTL database setting, which contains all associations below 1x10-7, and assessing all five time points (birth, adolescence, childhood, middle age, and pregnancy). A distance greater than or equal to 1 Mb was considered to be trans. Heritability analysis To estimate the SNP-based heritability for Horvath-EAA and Hannum-EAA, univariate Linkage Disequilibrium score regression [26] was applied to the GWAS summary statistics for both measures. This method also provides metrics to evaluate the proportion of inflation in the test statistics caused by confounding biases such as residual population stratification, relative to genuine polygenicity. We used pre-computed LD scores, estimated from the European-ancestry samples in the 1000 Genomes Project [73]. SNP functional annotation Functional annotation, using all SNPs located within the genomic risk loci which were nominally significant (P<0.05), had a MAF≥1%, and were in LD of r2≥0.6, was carried out in FUMA v1.3.0 [27]. In order to investigate the functional consequences of variation at these SNPs, they were first matched (based on chromosome, base pair position, reference and non-reference alleles) to a database containing functional annotations from a number of repositories: ANNOVAR (Annotate Variation) categories [74], used to identify a SNP's function and determine its position within the genome.

Combined Annotation Dependent Depletion (CADD) scores [28], a measure of the deleteriousness of genetic variation at a SNP to protein structure and function, with higher scores indicating more deleterious variants.

RegulomeDB (RDB) scores [29], based on data from eQTL as well as chromatin marks, with lower scores given to variants with the greatest evidence for having regulatory function.

Chromatin states [75–77], indicating the level of accessibility of genomic regions, described on a 15 point scale, where lower chromatin scores indicate a greater level of accessibility to the genome at that site; generally, between 1 and 7 is considered an open chromatin state. Gene-based analysis Gene-based analysis was performed for each phenotype using the results of our association analysis, using default settings in MAGMA v1.6 [39], integrated within the FUMA web application. Summary statistics of SNPs located within protein-coding genes were aggregated to assess the simultaneous effect of all SNPs in the gene on the phenotype. The European panel of the 1000 Genomes phase 3 data was used as a reference panel to account for LD [70]. Genetic variants were assigned to protein-coding genes obtained from Ensembl build 85, resulting in 17,798 genes being analysed. After Bonferroni correction (α = 0.05/17,798), a threshold for genome-wide significant genes was defined at P<2.809×10−6. eQTL and colocalisation analysis The independent genome-wide significant SNPs identified in the meta-analyses of Horvath-EAA and Hannum-EAA were assessed to determine whether they were potential eQTL, by mapping SNPs to genes if allelic variation at the SNP is associated with expression levels of the gene. This analysis was carried out using data from the Genotype Tissue Expression portal (GTEx) v7 [30], integrated within the FUMA web application. GTEx uses gene expression data from 48 different types of human tissue, linked to genotype data to provide information on eQTL. Since Horvath-EAA is derived from the pan-tissue Horvath epigenetic clock, eQTL analysis of the ten Horvath-EAA-associated SNPs used all the available tissue types in GTEx. Analysis for the Hannum-EAA SNP, however, was restricted to only the blood tissue types, as the Hannum epigenetic clock is specific to blood samples. eQTL mapping carried out within FUMA maps SNPs to genes which likely affect expression of those genes within 1Mb, i.e. cis-eQTL. Although FUMA contains all SNP-gene pairs of cis-eQTL, including non-significant associations, we limited our analysis to significant SNP-gene pairs, with a false discovery rate (FDR) ≤ 0.05 used as the cut-off to define significant eQTL associations. To further investigate the potential regulatory functions of the identified SNPs, we carried out colocalisation analysis to determine whether the SNPs are mediated through gene expression. We integrated our GWAS results with cis-eQTL data from the eQTLGen Consortium (https://www.eqtlgen.org/) [38], using a Bayesian method, 'coloc' [37], which evaluates whether the GWAS and eQTL associations best fit a model in which the same SNP is associated with both EAA and cis gene expression. This method, implemented in the 'coloc' package in R, tests pairwise colocalisation of SNPs in significant genomic regions in the GWAS with eQTLs, and generates posterior probabilities for each locus by weighing the evidence for competing hypotheses of no causal variants for either trait, causal variants for one trait only, independent causal variants influencing the two traits, or a shared causal SNP. We extracted summary statistics from the Horvath-EAA/Hannum-EAA meta-analytic GWAS results for all SNPs in a +/- 200 kb region around each genome-wide significant SNP, and extracted equivalent summary data for the same region in the eQTL analysis. Using the default prior probabilities in ‘coloc’, pairwise colocalisation was then tested between each GWAS-eQTL pair, with a posterior probability of ≥0.95 considered to be strong evidence in favour of a given hypothesis. Gene-set analysis To assess whether the Horvath-EAA and Hannum-EAA GWAS meta-analysis results are enriched for various gene-sets and provide insight into the involvement of specific biological pathways in the genetic aetiology of the phenotype, the gene-based analysis results were used to perform competitive gene-set and pathway analysis using default parameters in MAGMA v1.6, integrated within FUMA. The reference genome was 1000 genomes phase 3. This analysis used gene annotation files from the Molecular Signatures Database v5.2 for "Curated gene sets", covering chemical and genetic perturbations, and Canonical pathways, and "GO terms", covering three ontologies: biological process, cellular components, and molecular function. A total of 10,894 gene-sets were examined for enrichment in Horvath-EAA and Hannum-EAA, with a Bonferroni correction applied to control for multiple testing. Thus genome-wide significance was defined at P = 0.05/10,894 = 4.59x10-6. Genetic correlations Cross trait LD score regression [52] was used to calculate genetic correlations between Horvath-based and Hannum-based EAA in our meta-analysis, and then between Horvath-EAA/Hannum-EAA and 218 other behavioural and disease-related traits for which GWAS summary data were available through LD Hub [53]; traits derived from non-Caucasian or mixed ethnicity samples were removed prior to analysis. This method exploits the correlational structure of SNPs across the genome and uses test statistics provided from GWAS summary estimates to calculate the genetic correlations between traits [52]. We checked whether our meta-analysis datasets had sufficient evidence of a polygenic signal, indicated by a heritability Z-score of >4 and a mean χ2 statistic of >1.02 [52]. By default, a MAF filter of >1% was applied, and indels and strand ambiguous SNPs were removed. We filtered to HapMap3 SNPs, and SNPs whose alleles did not match those in the 1000 Genomes European reference sample were removed. LD scores and weights for use with European populations were downloaded from (https://github.com/bulik/ldsc). We did not constrain the intercepts in our analysis, as we could not quantify the exact amount of sample overlap between cohorts. False discovery rate correction was applied across the 218 traits to correct for multiple testing [78].

Acknowledgments We are grateful to the families and individuals who took part in all the cohort studies included in this meta-analysis: the Framington Heart Study, TwinsUK, Women's Health Initiate, European Prospective Investigation into Cancer–Norfolk, Baltimore Longitudinal Study of Aging, Invecchiare in Chianti, aging in the Chianti Area Study, Brisbane Systems Genetics Study, Lothian Birth Cohorts of 1921 and 1936, and Generation Scotland. We further acknowledge all those involved in participant recruitment, data collection, sample processing, and quality control procedures, including project managers, interviewers, clinical staff, laboratory technicians, clerical workers, research scientists, and statisticians.