Many analytical strategies are available for developing lifespan predictors from DNAm data. The previously reported single-stage approach involves the direct regression of time-to-death (due to all-cause mortality) on DNAm levels. By contrast, the current study employed a novel two-stage procedure. In stage 1, we defined DNAm-based surrogate biomarkers of smoking pack-years and a selection of plasma proteins that have previously been associated with mortality or morbidity. In stage 2, we regressed time-to-death on these DNAm-based surrogate biomarkers. The resulting mortality risk estimate of the regression model was then linearly transformed into an age estimate (in units of years). We coin this DNAm-based biomarker of mortality "DNAm GrimAge" because high values are grim news with regard to mortality/morbidity risk. Our comprehensive studies demonstrate that DNAm GrimAge stands out when it comes to associations with age-related conditions, clinical biomarkers, and computed tomography data.

DNAm levels have been used to build accurate composite biomarkers of chronological age [1–4]. DNAm-based age (epigenetic age) estimators include the pan-tissue epigenetic clock by Horvath 2013 [1], based on 353 CpGs, and an estimator developed by Hannum 2013 [2], based on 71 CpGs in leukocytes. These estimators predict lifespan after adjusting for chronological age and other risk factors [5–9]. Moreover, they are also associated with a large host of age-related conditions [10–20]. Recently, DNAm-based biomarkers for lifespan (time-to-death due to all-cause mortality) have been developed [21,22]. For example, Zhang et al (2017) combined mortality-associated CpGs [21] into an overall mortality risk score, while Levine et al (2018) developed a lifespan predictor, DNAm PhenoAge, by regressing a phenotypic measure of mortality risk on CpGs [22].

Results

Overview of the two-stage approach for defining DNAm GrimAge

We constructed DNAm GrimAge in two stages. First, we defined surrogate DNAm biomarkers of physiological risk factors and stress factors. These include the following plasma proteins: adrenomedullin, C-reactive protein, plasminogen activation inhibitor 1 (PAI-1), and growth differentiation factor 15 (GDF15) [23,24]. In addition, given that smoking is a significant risk factor for mortality and morbidity, we also used a DNAm-based estimator of smoking pack-years. Second, we combined these biomarkers into a single composite biomarker of lifespan, DNAm GrimAge, which is expressed in units of years. We then performed a large-scale meta-analysis (involving more than 7000 Illumina array measurements), showing that DNAm GrimAge is a better predictor of lifespan than currently available DNAm-based predictors. Our studies reveal a surprising finding: in some instances, a DNAm-based surrogate biomarker (e.g. for smoking pack-years) is a better predictor of mortality than the actual observed (self-reported) biomarker. We also correlated DNAm GrimAge with lifestyle factors and a host of age-related conditions. For example, we demonstrate that these DNAm-based biomarkers predict time to cardiovascular disease. Finally, we show that DNAm GrimAge is also associated with age-related changes in blood cell composition and leukocyte telomere length.

Training and test data from the Framingham Heart Study

We began by correlating the levels of 88 plasma protein variables (measured using immunoassays) with DNAm array data generated from the same blood samples of N=2,356 individuals from the Framingham Heart Study (FHS) Offspring Cohort [25] (Supplementary Note 1). We divided the FHS data randomly, at the level of pedigrees, into a training set (70% of the FHS pedigrees, N=1,731 individuals from 622 pedigrees) and a test set (30% of the pedigrees, N=625 individuals from 266 pedigrees, Supplementary Table 1). The mean age of individuals donating DNA for the training set was 66 years, while that of individuals in the test set was 67 years. Participants in the test set had demographic profiles, smoking history, and number of years of follow-up similar to those in the training set (Supplementary Table 1).
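The pedigree-level split described above can be illustrated with a minimal sketch. It assumes only that each individual carries a pedigree identifier; the function name `split_by_pedigree`, the seed, and the toy identifiers are hypothetical and are not taken from the study. Splitting whole pedigrees rather than individuals keeps relatives from appearing in both the training and the test set.

```python
import random

def split_by_pedigree(pedigree_of, train_fraction=0.7, seed=0):
    """Randomly assign whole pedigrees to training or test data.

    pedigree_of: dict mapping individual id -> pedigree id (hypothetical input).
    Returns (train_ids, test_ids) as sorted lists of individual ids.
    """
    pedigrees = sorted(set(pedigree_of.values()))
    rng = random.Random(seed)
    rng.shuffle(pedigrees)
    n_train = round(train_fraction * len(pedigrees))
    train_pedigrees = set(pedigrees[:n_train])
    train_ids = sorted(i for i, p in pedigree_of.items() if p in train_pedigrees)
    test_ids = sorted(i for i, p in pedigree_of.items() if p not in train_pedigrees)
    return train_ids, test_ids

# Toy data: 30 individuals in 10 pedigrees of 3 relatives each.
pedigree_of = {f"ind{i}": f"ped{i // 3}" for i in range(30)}
train_ids, test_ids = split_by_pedigree(pedigree_of)
```

Because assignment happens per pedigree, the individual-level proportions (here 21 versus 9) only approximate the 70/30 target when pedigree sizes vary.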

Stage 1: DNAm-based surrogate biomarkers of plasma proteins and smoking pack-years

We used the training data to define DNAm-based surrogate markers of 88 plasma protein variables and smoking pack-years. We restricted the analysis to CpGs that are present on both the Illumina Infinium 450K array and the new Illumina EPIC methylation array in order to ensure future compatibility. Each of the 88 plasma protein variables (dependent variable) was regressed on chronological age, sex, and the CpG levels in the training data using an elastic net regression model [26], which automatically selected a subset of CpGs (typically fewer than 200 CpGs) whose linear combination best predicted the corresponding plasma level in the training data (Methods). For example, the DNAm levels of 137 CpGs and 211 CpGs allowed us to estimate the plasma levels of GDF15 and PAI-1, respectively. The predicted DNAm values of GDF15 and PAI-1 can then be used as surrogate markers for the measured plasma levels. In general, we denote DNAm-based surrogate markers of plasma proteins and smoking pack-years by adding the prefix "DNAm" to the respective variable name, e.g. DNAm pack-years (Fig. 1 and Supplementary Table 2).

Figure 1. Flowchart for developing DNAm GrimAge. Surrogate DNAm-based biomarkers for smoking pack-years and plasma protein levels were defined and validated using training and test data from the Framingham Heart Study (stage 1). Only 12 out of 88 plasma proteins exhibited a correlation r >0.35 with their respective DNAm-based surrogate marker in the test data. In stage 2, time-to-death (due to all-cause mortality) was regressed on chronological age, sex, and DNAm-based biomarkers of smoking pack-years and the 12 above-mentioned plasma protein levels. The elastic net regression model automatically selected the following covariates: chronological age (Age), sex (Female), and DNAm-based surrogates for smoking pack-years (DNAm PACKYRS), adrenomedullin levels (DNAm ADM), beta-2 microglobulin (DNAm B2M), cystatin C (DNAm Cystatin C), growth differentiation factor 15 (DNAm GDF-15), leptin (DNAm Leptin), plasminogen activation inhibitor 1 (DNAm PAI-1), and tissue inhibitor metalloproteinase 1 (DNAm TIMP-1). The linear combination of the covariate values, $X^T\beta$, was linearly transformed to be in units of years. Technically speaking, DNAm GrimAge is a mortality risk estimator. Metaphorically speaking, it estimates biological age.

Not all of the available 88 plasma protein levels were successfully imputed based on DNAm data. Instead, only 12 of the 88 plasma proteins exhibited a moderately high correlation coefficient (r>0.35) between their measured levels and their respective DNAm-based surrogate marker in the test data set (Table 1). We focused on these 12 DNAm surrogate biomarkers in stage 2. Additionally, we constructed a DNAm-based surrogate of self-reported smoking pack-years, DNAm pack-years, based on a linear combination of 172 CpGs.

Table 1. Reproducibility and age correlations of DNAm-based surrogate biomarkers. Entries are correlation coefficients (r).

| DNAm-based surrogate | Training (N=1731): observed biomarker | Training (N=1731): age | Test (N=625): observed biomarker | Test (N=625): age |
|---|---|---|---|---|
| adrenomedullin | 0.65 | 0.63 | 0.38 | 0.64 |
| beta-2-microglobulin | 0.62 | 0.83 | 0.43 | 0.85 |
| CD56 | 0.86 | 0.17 | 0.36 | 0.17 |
| ceruloplasmin | 0.56 | 0.04 | 0.49 | -0.02 |
| cystatin-C | 0.58 | 0.81 | 0.39 | 0.83 |
| EGF fibulin-like ECM protein1 | 0.59 | 0.72 | 0.41 | 0.87 |
| growth differentiation factor 15 | 0.74 | 0.71 | 0.53 | 0.81 |
| leptin | 0.68 | 0.06 | 0.35 | 0.05 |
| myoglobin | 0.50 | -0.04 | 0.38 | 0.03 |
| plasminogen activator inhibitor 1 | 0.69 | 0.19 | 0.36 | 0.16 |
| serum paraoxonase/arylesterase 1 | 0.57 | -0.22 | 0.51 | -0.22 |
| tissue inhibitor metalloproteinases 1 | 0.43 | 0.92 | 0.35 | 0.90 |
| smoking pack-years | 0.79 | 0.17 | 0.66 | 0.13 |

The table reports the correlation coefficients between the observed marker (i.e. observed plasma protein level or self-reported smoking pack-years) and its respective DNAm-based surrogate marker in 1) the FHS training data and 2) the FHS test data. Each of the DNAm-based surrogate biomarkers (rows) leads to a correlation r > 0.35 in both training and test datasets (columns 2 and 4). DNAm-based pack-years is highly correlated with self-reported pack-years in both training and test datasets (r ≥ 0.66). The table also reports the correlation coefficients between the DNAm-based surrogate biomarkers (rows) and chronological age in the FHS training and test data (columns 3 and 5).
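The screening step described above (keep a surrogate only if it correlates with its measured counterpart in held-out data) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names, the dictionary layout, and the toy values for `protein_a` and `protein_b` are hypothetical, and only the threshold of 0.35 comes from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def select_surrogates(observed, predicted, threshold=0.35):
    """Keep surrogates whose test-set correlation with the observed marker exceeds threshold.

    observed / predicted: dicts mapping marker name -> list of test-set values
    (hypothetical layout). Returns a dict marker name -> correlation.
    """
    kept = {}
    for name in observed:
        r = pearson_r(observed[name], predicted[name])
        if r > threshold:
            kept[name] = r
    return kept

# Toy test-set values: protein_a is well predicted, protein_b is not.
observed = {
    "protein_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "protein_b": [1.0, 2.0, 3.0, 4.0, 5.0],
}
predicted = {
    "protein_a": [1.2, 1.9, 3.4, 3.8, 5.1],
    "protein_b": [3.0, 1.0, 4.0, 2.0, 3.0],
}
kept = select_surrogates(observed, predicted)
```

Evaluating the correlation in test data rather than training data guards against surrogates that merely overfit the training samples.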

Stage 2: Constructing a composite biomarker of lifespan based on surrogate biomarkers

In stage 2, we developed a predictor of mortality by regressing time-to-death due to all-cause mortality (dependent variable) on the following covariates: the DNAm-based estimator of smoking pack-years, chronological age at the time of the blood draw, sex, and the 12 DNAm-based surrogate biomarkers of plasma protein levels. The elastic net Cox regression model automatically selected the following covariates: DNAm pack-years, age, sex, and the following 7 DNAm-based surrogate markers of plasma proteins: adrenomedullin (ADM), beta-2-microglobulin (B2M), cystatin C (Cystatin C), GDF-15, leptin (Leptin), PAI-1, and tissue inhibitor metalloproteinases 1 (TIMP-1) (Supplementary Table 2). DNAm-based biomarkers for smoking pack-years and the 7 plasma proteins are based on fewer than 200 CpGs each, totaling 1,030 unique CpGs (Supplementary Table 2). Details on the plasma proteins can be found in Supplementary Note 2. The linear combination of covariates resulting from the elastic net Cox regression model can be interpreted as an estimate of the logarithm of the hazard ratio of mortality. We converted this quantity into an age estimate, i.e., DNAm GrimAge, by applying a linear transformation whose slope and intercept terms were chosen by forcing the mean and variance of DNAm GrimAge to match those of chronological age in the training data (Methods, Fig. 1). In independent test data, DNAm GrimAge is calculated without estimating any parameter because the numeric values of all parameters were chosen in the training data. Following the terminology from previous articles on DNAm-based biomarkers of aging, we defined a novel measure of epigenetic age acceleration, AgeAccelGrim, which, by definition, is not correlated (r=0) with chronological age. Toward this end, we regressed DNAm GrimAge on chronological age using a linear regression model and defined AgeAccelGrim as the corresponding raw residual (i.e. the observed value of DNAm GrimAge minus its expected value). Thus, a positive (or negative) value of AgeAccelGrim indicates that DNAm GrimAge is higher (or lower) than expected based on chronological age. Unless indicated otherwise, we used AgeAccelGrim (rather than DNAm GrimAge) in association tests of age-related conditions because age was a confounder in these analyses. For the same reason, we also used age-adjusted versions of our DNAm-based surrogate markers (for smoking pack-years and the seven plasma protein levels). In general, all association tests were adjusted for chronological age and, when required, other confounders as well (such as sex, Methods).
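The two transformations in this section (rescaling the Cox linear predictor to units of years, then taking the residual from a regression on age) can be sketched in a few lines. The sketch assumes a positive slope, uses population standard deviations, and runs on made-up toy values; the function names and the numbers are hypothetical, and the actual coefficients are those fixed in the training data by the authors.

```python
from statistics import mean, pstdev

def fit_year_transform(train_risk, train_age):
    """Slope and intercept so that slope*risk + intercept has the same mean
    and variance as chronological age in the training data."""
    slope = pstdev(train_age) / pstdev(train_risk)
    intercept = mean(train_age) - slope * mean(train_risk)
    return slope, intercept

def age_accel(age, grim_age):
    """Raw residuals from an ordinary least squares regression of grim_age on age."""
    mean_age, mean_grim = mean(age), mean(grim_age)
    sxx = sum((a - mean_age) ** 2 for a in age)
    sxy = sum((a - mean_age) * (g - mean_grim) for a, g in zip(age, grim_age))
    slope = sxy / sxx
    intercept = mean_grim - slope * mean_age
    return [g - (intercept + slope * a) for a, g in zip(age, grim_age)]

# Toy training data: Cox linear predictor (log hazard ratio scale) and age in years.
train_age = [50.0, 60.0, 70.0, 80.0, 65.0]
train_risk = [-0.4, 0.1, 0.9, 1.2, 0.3]

slope, intercept = fit_year_transform(train_risk, train_age)
grim_age = [slope * r + intercept for r in train_risk]
accel = age_accel(train_age, grim_age)
```

In new data, `slope` and `intercept` would be reused as fixed constants, which mirrors the statement that no parameter is estimated in independent test data. The residuals sum to zero and are uncorrelated with age in the data used to fit the regression line.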

Pairwise correlations between DNAm GrimAge and surrogate biomarkers

Using the test data from the FHS, we calculated pairwise correlations between DNAm GrimAge and its underlying variables (Fig. 2 and Supplementary Table 2). DNAm GrimAge is highly correlated with DNAm TIMP-1 (r=0.90) and chronological age (r=0.82). An estimate of excess mortality risk (the mortality residual, mortality.res) exhibits higher positive correlations with both DNAm GrimAge and DNAm TIMP-1 (r ~ 0.40) than with chronological age (r ~ 0.35, Fig. 2), in keeping with our later finding that these DNAm biomarkers are better predictors of lifespan than chronological age. With the exception of DNAm Leptin, all of the DNAm-based biomarkers exhibited positive correlations with the measure of excess mortality risk (0.41 ≥ r ≥ 0.16, Fig. 2). With the exception of DNAm Leptin, all DNAm-based surrogate biomarkers exhibited moderate to strong pairwise correlations with each other. DNAm Leptin is elevated in females (Supplementary Fig. 1A, B), consistent with what has been reported in the literature [27,28]. After stratifying by sex, we find that plasma leptin levels increase weakly with age (r=0.18 and P=2.1E-3 in males; r=0.19 and P=4.8E-4 in females, Supplementary Fig. 1E, F).

Figure 2. Heat map of pairwise correlations of DNAm-based biomarkers. The heat map color-codes the pairwise Pearson correlations of select variables (surrounding the definition of DNAm GrimAge) in the test data from the Framingham Heart Study (N=625). DNAm GrimAge is defined as a linear combination of chronological age (Age), sex (Female takes on the value 1 for females and 0 otherwise), and eight DNAm-based surrogate markers for smoking pack-years (DNAm PACKYRS), adrenomedullin levels (DNAm ADM), beta-2 microglobulin (DNAm B2M), cystatin C (DNAm Cystatin C), growth differentiation factor 15 (DNAm GDF-15), leptin (DNAm Leptin), plasminogen activation inhibitor 1 (DNAm PAI-1), and tissue inhibitor metalloproteinase 1 (DNAm TIMP-1).
The figure also includes an estimator of mortality risk, mortality.res, which can be interpreted as a measure of "excess" mortality risk compared to the baseline risk in the test data. Formally, mortality.res is defined as the deviance residual from a Cox regression model for time-to-death due to all-cause mortality. The rows and columns of the Figure are sorted according to a hierarchical clustering tree. The shades of color (blue, white, and red) visualize correlation values from -1 to 1. Each square reports a Pearson correlation coefficient.



Predicting time-to-death in validation data

To evaluate whether our novel DNAm-based biomarkers are better predictors of lifespan than chronological age, we analyzed N=7,375 Illumina methylation arrays generated from blood samples of 6,935 individuals comprising 3 ethnic/racial groups: 50% European ancestry (Caucasians), 40% African Americans, and 10% Hispanic ancestry (Table 2, Methods, Supplementary Note 1). The data came from different cohort studies: test data from the FHS, the BA23 and EMPC studies from the Women’s Health Initiative (WHI), the InCHIANTI cohort study, and African Americans from the Jackson Heart Study (JHS). We stratified each cohort by race/ethnicity (resulting in 9 strata) to avoid confounding and to ascertain whether the mortality predictors apply to each group separately.

Table 2. Overview of the cohorts used in the validation analysis.

| Study | N | Female | Age | Never smokers | Former smokers | Current smokers | Pack-years | Years of follow-up |
|---|---|---|---|---|---|---|---|---|
| FHS* test | 625 | 53% | 66.9±8.64 [61,73] | 37% | 52% | 10% | 14.7±19.91 [0,23] | 7.7±1.78 [7.3,8.8] |
| WHI BA23 | 2107 | 100% | 65.3±7.1 [60,70.9] | 52% | 36% | 10% | 9.5±18.55 [0,12.5] | 16.9±4.63 [15.8,19.9] |
| WHI EMPC | 1972 | 100% | 63.3±7.03 [57.9,68.7] | 52% | 38% | 9% | 9±17.27 [0,12.5] | 18±4.02 [17.9,20.1] |
| JHS | 1747 | 63% | 56.2±12.31 [46.5,65.4] | 65% | 21% | 14% | NA | 11.7±2.55 [11.2,13.1] |
| InChianti** | 924 (484) | 54% | 67±16.64 [60,78] | 57% | 29% | 14% | 10.3±17.33 [0,16.8] | 5.4±4.84 [0.1,9.3] |

NA=not available. Quantitative variables are presented in the format mean ±SD [25th, 75th percentile]. *The distribution of age is based on exam 8. **The statistics are based on 924 observations across 484 individuals. The table summarizes the characteristics of 6,935 individuals (corresponding to 7,375 Illumina arrays) from five independent cohorts that were used in our validation analysis. For example, up to two longitudinal measurements were available for each of 484 individuals in the InChianti cohort.

The mean chronological age at the time of the blood draw was 63.0 years. The mean follow-up time (used for assessing time-to-death due to all-cause mortality) was 13.7 years. Since chronological age is one of the component variables underlying DNAm GrimAge, it is not surprising that the latter is highly correlated with age in each of the study cohorts (r ≥ 0.79, Supplementary Fig. 2). While each (age-adjusted) component variable underlying DNAm GrimAge is a significant predictor of lifespan (Fig. 3), DNAm pack-years (meta-analysis P=1.7E-47) and DNAm PAI-1 (P=5.4E-28) exhibit the most significant meta-analysis P-values. The fixed effects meta-analysis P-values reveal that AgeAccelGrim stands out when it comes to lifespan prediction (meta-analysis P=2.0E-75, Fig. 3A). The same applies when the analysis is restricted to never-smokers (Supplementary Fig. 3) or to former/current smokers (Supplementary Fig. 4). AgeAccelGrim remains a highly significant predictor of lifespan after restricting the analysis to never-smokers (N=3,988, meta-analysis P=1.1E-16, Supplementary Fig. 3A) or to former/current smokers (P=3.5E-33, Supplementary Fig. 4A).

Figure 3. Meta-analysis forest plots for predicting time-to-death due to all-cause mortality.
Each panel reports a meta-analysis forest plot for combining hazard ratios predicting time-to-death based on a DNAm-based biomarker (reported in the figure heading) across different strata formed by racial group within cohort. (A) Results for AgeAccelGrim. Each row reports a hazard ratio (for time-to-death) and a 95% confidence interval resulting from a Cox regression model in each of 9 strata (defined by cohort and racial groups). Results for (age-adjusted) DNAm-based surrogate markers of (B) adrenomedullin (ADM), (C) beta-2 microglobulin (B2M), (D) cystatin C (Cystatin C), (E) growth differentiation factor 15 (GDF-15), (F) leptin, (G) plasminogen activation inhibitor 1 (PAI-1), (H) tissue inhibitor metalloproteinase 1 (TIMP-1) and (I) smoking pack-years (PACKYRS). The sub-title of each panel reports the meta-analysis P-value and a P-value for a test of heterogeneity, the Cochran Q test (Het.). (A) Each hazard ratio (HR) corresponds to a one-year increase in AgeAccelGrim. (B-H) Each hazard ratio corresponds to an increase of one standard deviation. (I) Hazard ratios correspond to a one-unit increase in DNAm pack-years. The most significant meta-analysis P-value (here AgeAccelGrim) is marked in red. A non-significant Cochran Q test P-value is desirable because it indicates that the hazard ratios do not differ significantly across the strata. For example, the hazard ratios associated with AgeAccelGrim exhibit insignificant heterogeneity across the strata (Cochran Q test $P_{I^2}=0.16$).
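To make the forest-plot summary concrete, the following is a minimal sketch of a fixed-effect (inverse-variance) meta-analysis of stratum-level log hazard ratios together with Cochran's Q statistic. It is a textbook formulation and is only assumed to match the software used in the study; the function name and the three toy strata are hypothetical. The heterogeneity P-value would compare Q with a chi-square distribution on `df` degrees of freedom, which is omitted here to keep the sketch within the standard library.

```python
from math import erfc, exp, log, sqrt

def fixed_effect_meta(log_hrs, std_errs):
    """Inverse-variance fixed-effect pooling of per-stratum log hazard ratios.

    Returns the pooled hazard ratio, its z statistic, the two-sided normal
    P-value, Cochran's Q, and the degrees of freedom of Q.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided P-value under the normal approximation
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    return {"hr": exp(pooled), "z": z, "p": p_value, "q": q, "df": len(log_hrs) - 1}

# Toy strata: hazard ratios per one-year increase and standard errors of log(HR).
toy_hrs = [1.08, 1.10, 1.12]
toy_ses = [0.02, 0.01, 0.03]
result = fixed_effect_meta([log(h) for h in toy_hrs], toy_ses)
```

Strata with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precisely estimated hazard ratio.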



Instances in which DNAm-based surrogates outperform observed biomarkers

The DNAm-based surrogate biomarker for smoking pack-years has two surprising properties. First, it predicts lifespan in never-smokers (P=1.6E-6, Supplementary Fig. 3I). Second, the surrogate marker is a more significant predictor of lifespan than self-reported pack-years: P=8.5E-5 for the DNAm marker versus P=2.1E-3 for observed pack-years in the FHS test data; similarly, P=5.3E-4 versus P=0.18 in the InChianti Study (Supplementary Table 3). The superior predictive performance of DNAm-based surrogate biomarkers vis-à-vis their observed counterparts also applies to PAI-1 plasma levels (P=8.7E-4 for the DNAm marker versus P=0.074 for the observed levels), TIMP-1 (P=3.8E-4 for the DNAm marker versus P=0.017 for the observed levels), and to a lesser extent to cystatin C (P=0.019 for the DNAm estimator versus P=0.054 for the observed level, Supplementary Table 4).

Mortality prediction based on observed plasma protein levels

AgeAccelGrim is a composite biomarker derived from DNAm-based surrogate biomarkers of plasma protein levels and smoking pack-years. This raises the question of whether a predictor of lifespan based directly on observed plasma protein levels and self-reported smoking pack-years would outperform its DNAm-based analog. Analogous to our construction of DNAm GrimAge, we used a Cox regression model to regress time-to-death on the observed plasma protein levels and self-reported pack-years in the training data (Methods). The resulting mortality risk estimator (defined as a weighted average of the observed biomarkers) was linearly transformed into units of years. The resulting predictor, i.e., observed GrimAge, and its age-adjusted version, i.e., observed AgeAccelGrim, were compared with DNAm-based AgeAccelGrim in the FHS, showing similar HRs (observed AgeAccelGrim HR=1.10, P=3.2E-7; DNAm-based AgeAccelGrim HR=1.12, P=8.6E-5, Supplementary Table 5). Overall, this comparison shows that DNAm levels in general and our DNAm-based surrogate biomarkers in particular capture a substantial proportion of the information that is captured by the 7 selected plasma proteins and self-reported smoking pack-years. Since our study focuses on DNAm-based biomarkers, we will only consider DNAm-based biomarkers in the following.

Age-related conditions

Our Cox regression analysis of time-to-coronary heart disease (CHD) reveals that AgeAccelGrim is highly predictive of incident CHD (HR=1.07, P=6.2E-24, and $P_{I^2}=0.4$, Fig. 4A). As expected, several underlying DNAm-based surrogate biomarkers also individually predict incident CHD, notably the age-adjusted versions of DNAm smoking pack-years (HR=1.02, P=6.4E-14) and DNAm PAI-1 (HR=1.31 per SD, P=3.6E-12).

Figure 4. Meta-analysis forest plots for predicting time-to-coronary heart disease. Each panel reports a meta-analysis forest plot for combining hazard ratios predicting time-to-CHD based on the DNAm-based biomarker (reported in the figure heading) across different strata formed by racial groups within cohorts. (A) Results for AgeAccelGrim. Each row reports a hazard ratio (for time-to-CHD) and a 95% confidence interval resulting from a Cox regression model in each of 9 strata (defined by cohort and racial groups). Results for (age-adjusted) DNAm-based surrogate markers of (B) adrenomedullin (ADM), (C) beta-2 microglobulin (B2M), (D) cystatin C (Cystatin C), (E) growth differentiation factor 15 (GDF-15), (F) leptin, (G) plasminogen activation inhibitor 1 (PAI-1), (H) tissue inhibitor metalloproteinase 1 (TIMP-1) and (I) smoking pack-years (PACKYRS). The sub-title of each panel reports the meta-analysis P-value and a P-value for a test of heterogeneity, the Cochran Q test (Het.). (A) Each hazard ratio (HR) corresponds to a one-year increase in AgeAccelGrim. (B-H) Each hazard ratio corresponds to an increase of one standard deviation. (I) Hazard ratios correspond to a one-unit increase in DNAm pack-years. The most significant meta-analysis P-value (here AgeAccelGrim) is marked in red.

Similarly, time-to-congestive heart failure (CHF) is also associated with AgeAccelGrim (HR=1.10 and P=4.9E-9), age-adjusted DNAm cystatin C (HR=2.02 and P=2.0E-10) and DNAm PAI-1 (HR=1.58 and P=8.9E-10, Supplementary Fig. 5). Cross-sectional studies reveal that AgeAccelGrim is associated with hypertension (odds ratio [OR]=1.04 and P=5.1E-13, Supplementary Fig. 6), type 2 diabetes (OR=1.02 and P=0.01, Supplementary Fig. 7), and physical functioning (Stouffer P=1.7E-8, Supplementary Fig. 8). All of the reported associations are in the expected directions, e.g. higher values of AgeAccelGrim are associated with lower physical functioning levels. In women, early age at menopause is associated with significantly higher values of AgeAccelGrim (P=1.6E-12, Supplementary Fig. 9A) and to a lesser extent with all of the age-adjusted versions of the DNAm-based surrogate markers, notably DNAm cystatin C (P=2.2E-6) and DNAm GDF-15 (P=1.3E-5, Supplementary Fig. 9).
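Several of the P-values above are labeled as Stouffer meta-analysis P-values. As a hedged illustration, the sketch below implements the standard weighted Stouffer combination of signed stratum-level Z statistics. The weighting by the square root of the stratum sample size is an assumption made for this example (the text does not state the weights), and the function name and toy inputs are hypothetical.

```python
from math import erfc, sqrt

def stouffer(z_scores, weights):
    """Weighted Stouffer combination of signed Z statistics.

    Returns the combined Z and its two-sided normal P-value.
    """
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator
    p_value = erfc(abs(z_combined) / sqrt(2.0))
    return z_combined, p_value

# Toy example: three strata with signed Z statistics and sample sizes.
toy_z = [2.1, 1.4, 2.8]
toy_n = [625, 2107, 1972]
z_meta, p_meta = stouffer(toy_z, [sqrt(n) for n in toy_n])
```

Because the Z statistics keep their sign, associations pointing in opposite directions across strata partially cancel instead of reinforcing each other.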

DNAm plasminogen activation inhibitor 1

AgeAccelGrim outperforms (age-adjusted versions of) DNAm smoking pack-years and the 7 DNAm-based surrogate markers of plasma protein levels individually with regard to prediction of time-to-death or time-to-coronary heart disease (Figs. 3 and 4). However, age-adjusted DNAm PAI-1 outperforms AgeAccelGrim for several age-related traits (Supplementary Figs. 5-9), notably the comorbidity index (defined as the total number of age-related conditions), where Stouffer's meta-analysis P-value for DNAm PAI-1 (P=7.3E-56) is more significant than that for AgeAccelGrim (P=2.0E-16, Fig. 5). As with AgeAccelGrim, higher levels of age-adjusted DNAm PAI-1 are associated with hypertension status, type 2 diabetes status, time-to-CHD (Fig. 4), time-to-CHF, and early age at menopause (Supplementary Figs. 5-7 and 9), while lower levels are associated with disease-free status (Stouffer P=2.9E-11, Supplementary Fig. 10) and better physical functioning (Stouffer P=1.4E-8, Supplementary Fig. 8).

Figure 5. Meta-analysis of associations with total number of age-related conditions. Each panel reports a meta-analysis forest plot for combining regression coefficients between the comorbidity index and the DNAm-based biomarker (reported in the figure heading) across different strata, which are formed by racial group within cohort. (A) Meta-analysis of the regression slope between AgeAccelGrim and the comorbidity index. Analogous results for (age-adjusted) DNAm-based surrogate markers of (B) adrenomedullin (ADM), (C) beta-2 microglobulin (B2M), (D) cystatin C (Cystatin C), (E) growth differentiation factor 15 (GDF-15), (F) leptin, (G) plasminogen activation inhibitor 1 (PAI-1), (H) tissue inhibitor metalloproteinase 1 (TIMP-1) and (I) smoking pack-years (PACKYRS). The individual study results were combined using fixed effect meta-analysis (reported in the panel heading). Het. denotes the Cochran Q test for heterogeneity across studies.
The effect sizes correspond to one year of age acceleration in panel A, one pack-year in panel I and one standard deviation in other panels for DNAm proteins. The estimate with the most significant meta P value is marked in red.



Heritability analysis

We used pedigree-based polygenic models (Methods) to estimate the heritability of AgeAccelGrim and the individual biomarkers. There is significant heritability for AgeAccelGrim ($h^2$=0.30, P=0.022) and observed AgeAccelGrim ($h^2$=0.37, P=0.006, Supplementary Table 6). Similarly, several of our DNAm-based surrogate biomarkers (PAI-1, B2M, ADM, and GDF15) and their observed counterparts are also highly heritable (Supplementary Table 6), e.g. DNAm PAI-1 ($h^2$=0.34 and P=7.1E-3), observed PAI-1 levels ($h^2$=0.51 and P=6.2E-4), DNAm beta-2 microglobulin levels ($h^2$=0.45 and P=2.4E-3), and observed B2M ($h^2$=0.34 and P=3.3E-3). Overall, these results suggest that many observed and DNAm-based biomarkers are heritable.

AgeAccelGrim versus other epigenetic measures of age acceleration

Using the same validation datasets (N=7,375 arrays), we compared DNAm GrimAge with three widely used DNAm-based biomarkers of aging: the DNAm age estimator based on different somatic tissues by Horvath (2013) [1], the DNAm age estimator based on leukocytes by Hannum (2013) [2], and the DNAm PhenoAge estimator by Levine (2018) [22]. The respective age-adjusted measures of epigenetic age acceleration will be denoted as AgeAccel (or AgeAccelerationResidual), AgeAccelHannum, and AgeAccelPheno, following the notation of previous publications. The four epigenetic measures of age acceleration (including AgeAccelGrim) are in units of years. AgeAccelGrim exhibits positive correlations with each of the three alternative measures of epigenetic age acceleration (0.17 ≤ r ≤ 0.45, Supplementary Fig. 11), with the strongest correlation with AgeAccelPheno. The relatively weak correlation with Horvath’s pan-tissue clock (r=0.17) probably reflects the fact that DNAm GrimAge was developed exclusively with blood methylation data. AgeAccelGrim is superior with respect to meta-analysis P-values for prediction of time-to-death: AgeAccelGrim (P=2.0E-75, HR=1.10), AgeAccel (Meta P=8.9E-5, HR=1.02, Supplementary Fig. 12), AgeAccelHannum (Meta P=6.8E-16, HR=1.04), AgeAccelPheno (Meta P=3.5E-36, HR=1.05). The results remain qualitatively the same after restricting the analysis to never-smokers or former/current smokers (Supplementary Figs. 13 and 14). Similarly, AgeAccelGrim stands out when comparing individuals in the top 20% of epigenetic age acceleration to those in the bottom 20%: AgeAccelGrim (Stouffer meta-analysis P=6.4E-38, Supplementary Fig. 15), AgeAccelPheno (P=5.7E-21), AgeAccelHannum (P=1.3E-5), and AgeAccel (P=0.17). When it comes to significant associations with the comorbidity index, age-adjusted DNAm PAI-1 ($P_{\text{DNAm PAI-1}}$=7.3E-56, Fig. 5) outperforms all other DNAm-based biomarkers, including AgeAccelGrim ($P_{\text{AgeAccelGrim}}$=2.0E-16) and AgeAccelPheno ($P_{\text{AgeAccelPheno}}$=7.8E-21, Supplementary Fig. 16). AgeAccelGrim is more informative than AgeAccelPheno in predicting time-to-CHD ($P_{\text{AgeAccelGrim}}$=6.2E-24 and $HR_{\text{AgeAccelGrim}}$=1.07 versus $P_{\text{AgeAccelPheno}}$=1.7E-8 and $HR_{\text{AgeAccelPheno}}$=1.03, Supplementary Fig. 17), even after stratifying the analysis by smoking status (Supplementary Figs. 18 and 19). AgeAccelGrim also greatly outperforms the other 3 measures of epigenetic age acceleration in predicting time to (any) cancer (AgeAccelGrim P=1.3E-12 versus AgeAccelPheno P=2.7E-3, Supplementary Fig. 20) and in its inverse association with age at menopause in women (AgeAccelGrim P=1.6E-12 versus AgeAccel P=2.2E-3, Supplementary Fig. 21). A sensitivity analysis reveals that the latter finding remains qualitatively the same even after removing the InChianti cohort, which exhibited the strongest negative association between epigenetic age acceleration and age at menopause (Supplementary Fig. 22).

Multivariate Cox models adjusting for traditional risk factors

The above-mentioned Cox regression models were adjusted for age at blood draw (baseline), batch, pedigree, and intra-subject correlation as needed. We also fit multivariate Cox regression models that included additional covariates assessed at baseline: body mass index, educational level, alcohol intake, smoking pack-years, prior history of diabetes, prior history of cancer, and hypertension status (Methods). Even after adjusting for these known risk factors for morbidity, AgeAccelGrim remained a highly significant predictor of lifespan (P=5.7E-29, Supplementary Fig. 23) and time-to-CHD (P=3.7E-11, Supplementary Fig. 24) and outperformed previously published measures of epigenetic age acceleration.

Stratified analyses

We evaluated AgeAccelGrim and the underlying DNAm biomarkers in different strata characterized by age (younger/older than 65 years), body mass index (obese versus non-obese), educational attainment, and prevalent conditions at baseline (prior history of cancer, type 2 diabetes, or hypertension). In all of these strata, AgeAccelGrim remains a significant predictor of time-to-death (Supplementary Table 7) and time-to-CHD (Supplementary Table 8). Furthermore, AgeAccelGrim outperforms existing DNAm-based biomarkers of aging in all strata except for one (comprising n=281 individuals with a prior history of cancer). These subgroup analysis results also confirm that epigenetic age acceleration is an independent predictor of earlier mortality even after adjusting for possible confounders and within major subgroups of the population. Additional results for age-adjusted DNAm proteins and DNAm pack-years are listed in Supplementary Data 1. With few exceptions, we found that DNAm-based PAI-1, TIMP-1 and pack-years remained highly significant in each stratum.

Exceptionally fast/slow agers

The DNAm GrimAge estimate allows an intuitive interpretation as physiological age since it is in units of years. However, if someone is 8 years older than expected, this does not mean that this person has on average an 8-year shorter life expectancy. Rather, one should use the hazard ratio when it comes to assessing mortality risks. It is a statistical coincidence that the hazard ratio associated with a one-year increase in AgeAccelGrim is the same in strata comprising never-smokers (HR=1.10, Supplementary Fig. 3A), former/current smokers (HR=1.10, Supplementary Fig. 4A), and among all individuals combined (HR=1.10, Fig. 3A). This allows us to evaluate the mortality risks in exceptionally fast and slow agers (according to AgeAccelGrim) irrespective of their smoking status. The 5th and 95th percentiles of AgeAccelGrim correspond to -7.5 years and +8.3 years, respectively (Supplementary Table 9). A person at the 95th percentile of AgeAccelGrim (=8.3 years) faces a hazard of death that is roughly twice that of the average person in their stratum (whose AgeAccelGrim equals 0). Specifically, fast aging status is associated with a hazard ratio of $HR = 1.10^{8.3} = 2.2$. Conversely, a slow ager at the 5th percentile (-7.5 years) faces a hazard of death that is roughly half that of the average person in their stratum, $HR = 1.10^{-7.5} = 0.49$.
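The arithmetic behind these two hazard ratios follows from the proportional hazards form: a per-year hazard ratio raised to the power of the age acceleration in years. The short sketch below reproduces it; the function name is hypothetical, while the inputs (1.10, 8.3, and -7.5) are the values quoted in the text.

```python
def hazard_ratio_for_accel(hr_per_year, age_accel_years):
    """Hazard ratio relative to a person with age acceleration 0, assuming the
    log hazard is linear in age acceleration (Cox proportional hazards form)."""
    return hr_per_year ** age_accel_years

fast_ager_hr = hazard_ratio_for_accel(1.10, 8.3)   # 95th percentile of AgeAccelGrim
slow_ager_hr = hazard_ratio_for_accel(1.10, -7.5)  # 5th percentile of AgeAccelGrim

print(round(fast_ager_hr, 1))  # 2.2
print(round(slow_ager_hr, 2))  # 0.49
```

The same function makes clear why an 8-year acceleration cannot be read as 8 years of lost life expectancy: it rescales the hazard multiplicatively rather than shifting the survival time.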

DNAm GrimAge versus single stage estimators of mortality risk DNAm GrimAge was built using a novel two-stage approach that critically depended on the development of DNAm-based surrogate biomarkers. To justify the utility of this indirect approach, we compared DNAm GrimAge with several DNAm-based mortality risk predictors that were developed by directly regressing lifespan on DNAm data (referred to as single stage mortality predictors). To this end, we developed a new mortality predictor, DNAm Mortality (in units of years), by directly regressing time-to-death (due to all-cause mortality) on CpGs in the FHS training data. DNAm Mortality was calculated as a linear combination of 59 CpGs. The direct approach entailed fitting an elastic net Cox regression model and linearly transforming the resulting mortality risk so that the values of DNAm Mortality are in units of years (Methods). In addition, we also evaluated the published mortality predictor by Zhang [21] which, remarkably, is based on only 10 CpGs (Methods). The latter two (single-stage) lifespan predictors were found to correlate highly with each other (r=0.77 in the FHS test data). The novel age-adjusted DNAm Mortality estimator (HR=1.07, P=3.0E-44) and both versions of Zhang's mortality risk estimator (P=4.2E-39, Supplementary Fig. 25) yield a less significant meta-analysis P-value for lifespan prediction than AgeAccelGrim (P=2.0E-75). It is not meaningful to compare HR estimates (here HR=1.02 and HR=1.10, respectively) because these HR estimates critically depend on the scale/distribution of the respective mortality predictors. To provide a meaningful and scale-independent comparison, we focused on the meta-analysis P-values. AgeAccelGrim also stands out in terms of its meta-analysis P-value for predicting time-to-CHD (AgeAccelGrim P=6.2E-24, AgeAccelMortality P=4.6E-11, AgeAccelZhang P=9.5E-12, Supplementary Fig. 26).
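The scale dependence of the hazard ratio, and the scale independence of the P-value, can be made explicit with a short derivation. This is a sketch under stated assumptions: a standard Cox proportional hazards model with a single predictor x, a Wald test for its coefficient, and a hypothetical rescaling constant c > 0 that is not part of the study.

```latex
\begin{align*}
  h(t \mid x) &= h_0(t)\,\exp(\beta x),
    & \mathrm{HR} &= e^{\beta}
    && \text{(per one-unit increase in } x\text{)} \\
  x' &= x / c
    \;\Longrightarrow\; \beta' = c\,\beta,
    & \mathrm{HR}' &= e^{c\beta} = \mathrm{HR}^{\,c}
    && \text{(HR changes with the scale of } x\text{)} \\
  \mathrm{SE}(\hat\beta') &= c\,\mathrm{SE}(\hat\beta),
    & Z' &= \frac{\hat\beta'}{\mathrm{SE}(\hat\beta')}
          = \frac{\hat\beta}{\mathrm{SE}(\hat\beta)} = Z
    && \text{(Wald statistic and P-value unchanged)}
\end{align*}
```

Two predictors measured on different scales can therefore carry the same statistical evidence while reporting very different per-unit hazard ratios, which is why the comparison above rests on P-values.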
It is useful to characterize the different lifespan predictors in terms of their correlation with DNAm pack-years because smoking is a major risk factor. Age-adjusted DNAm pack-years exhibits positive correlations with both DNAm Mortality and Zhang's mortality predictor (r ≥ 0.55). The connection of single stage mortality predictors to smoking can also be observed at the CpG level. DNAm Mortality, Zhang’s mortality predictor, and DNAm pack-years explicitly use CpG cg05575921 (in the AHRR gene on chromosome 5p15.33), which has previously been identified by epigenome-wide association studies of cumulative smoking exposure [21,29]. Overall, these results suggest that the two single-stage lifespan predictors relate more strongly to cumulative smoking exposure than does AgeAccelGrim.

Association with blood cell composition DNAm data allow one to estimate several quantitative measures of blood cell types as described in Methods [30,31]. We previously showed that DNAm biomarkers of aging that capture age-related changes in blood cell composition are better predictors of lifespan than those that are independent of blood cell counts [7]. Therefore, we hypothesized that several of our novel DNAm biomarkers would exhibit significant correlations with these imputed measures of blood cell composition. This is indeed the case, as can be seen from our large scale meta-analysis across the validation data (Supplementary Fig. 27, Supplementary Data 2). AgeAccelGrim is significantly associated with a decrease in naïve CD8+ T cells (r=-0.22, P=9.2E-62, Supplementary Fig. 27A and Supplementary Data 2), CD4+ T cells (r=-0.21, P=1.8E-57), and B cells (r=-0.18, P=9.7E-43) and with an increase in granulocytes/neutrophils (r=0.24, P=1.5E-74) and plasma blasts (r=0.22, P=7.3E-63). While these results demonstrate that AgeAccelGrim is associated with an age-related decline in immune system functioning, our cross sectional analysis does not allow us to dissect cause-and-effect relationships. Age-adjusted DNAm TIMP-1 exhibits the most significant correlations with the measures of blood cell composition (e.g. proportion of granulocytes r=0.36, P=2.7E-172, Supplementary Fig. 27H and Supplementary Data 2) followed by age-adjusted DNAm Cystatin C (proportion of CD4+ T cells r=-0.33, P=3.4E-142). Although many of our DNAm biomarkers are correlated with blood cell counts, this does not mean that these measures only capture changes in blood cell composition, as can be seen from the following. First, measures of blood cell composition correlate weakly with our age-adjusted DNAm surrogate markers of smoking pack-years (strongest correlation r=-0.14, Supplementary Fig. 27G) and PAI-1 levels (strongest correlation r=0.17, Supplementary Fig. 27I) even though both biomarkers are strongly associated with mortality risk and age-related conditions as shown above. Second, the DNAm surrogate markers remain significant predictors of mortality in multivariate Cox regression models that include blood cell counts as additional covariates as detailed in the following.

Association with leukocyte telomere length Leukocyte telomere length (LTL) has been found to be weakly predictive of mortality and cardiovascular disease. Our meta-analysis reveals a statistically significant but weak negative correlation between LTL and AgeAccelGrim (r=-0.12, meta P=3.3E-10, Supplementary Table 10) across data from the FHS, WHI (BA23 sub-study) and JHS (total N=2,702, 27% White and 73% African American). Similarly, LTL exhibits (weak) negative correlations with DNAm-based surrogate biomarkers for GDF-15 (r=-0.10, meta P=3.4E-7), PAI-1 (r=-0.10, meta P=5.1E-8) and smoking pack-years (r=-0.09, meta P=2.9E-6).

Functional annotation of sets of CpGs The genomic locations of the 1030 CpGs underlying the DNAm GrimAge estimator were analyzed using the GREAT software tool [32], which assigns biological meaning to a set of genomic locations (here CpGs) by analyzing the annotations of nearby genes. At a false discovery rate of FDR < 0.05 we found 361 gene sets from GO, KEGG, and PANTHER. Among those, 28 surpassed the more stringent Bonferroni correction, including MHC class II receptor activity (nominal P=1.2E-6), cytokine-mediated signaling pathway (P=6.9E-5), response to interferon-gamma (P=1.5E-4), regulation of protein sumoylation (P=4.4E-5), endoderm formation (P=5.9E-5), epigenetic regulation of gene expression (P=6.7E-5), and fatty acid transmembrane transport (P=9.5E-5). Similarly, we evaluated sets of CpGs underlying DNAm-based surrogate biomarkers. At FDR < 0.05, we found n=388, 307, and 153 significant gene sets for DNAm B2M, PAI-1, and Cystatin-C, respectively. Of those, the top gene sets are involved in immune function (nominal P=1.1E-9 for DNAm B2M CpGs), adipocytokine signaling pathway (P=3.6E-7 for DNAm PAI-1 CpGs) or lipid function (P=3.8E-7 for DNAm PAI-1 CpGs). The significant gene sets for all DNAm surrogate biomarkers can be found in Supplementary Data 3.

Diet, education, and lifestyle factors Several previous measures of epigenetic age acceleration in blood have been shown to exhibit statistically significant but weak correlations with lifestyle factors and biomarkers of metabolic syndrome [22,33]. Here we revisited these cross-sectional studies in the WHI (comprising approximately 4000 postmenopausal women, Methods) with our novel measures of AgeAccelGrim and its underlying DNAm-based surrogate biomarkers (Fig. 6).

Figure 6. Cross sectional correlations between DNAm biomarkers and lifestyle factors. Robust correlation coefficients (biweight midcorrelation [62]) between 1) AgeAccelGrim and its eight age-adjusted underlying DNAm-based surrogate biomarkers and 2) 38 variables including self-reported diet, 9 dietary biomarkers, 12 variables related to metabolic traits and central adiposity, and 5 lifestyle factors. The 2-color scale (blue to red) color-codes bicor correlation coefficients in the range [-1, 1]. The green color scale (light to dark) applies to unadjusted P values. The analysis was performed on the WHI cohort in up to 4200 postmenopausal women. An analogous analysis stratified by race/ethnicity can be found in Supplementary Fig. 30.

All (age-adjusted) DNAm-based biomarkers correlate with plasma biomarkers measuring vegetable consumption, but AgeAccelGrim (robust correlation coefficient r=-0.26, P=9E-39, Fig. 6) and DNAm PAI-1 (r=-0.25, P=7E-36) stand out in terms of their strong relationship with mean carotenoid levels (Fig. 6, Supplementary Fig. 30). Far less significant associations could be observed for self-reported measures of fruit, vegetable, and dairy intake, which highlights the limitations of self-reported measures of dietary intake. The following novel results could not be observed with previous DNAm-based biomarkers of aging: a higher (self-reported) proportion of carbohydrate consumption was associated with lower AgeAccelGrim (robust correlation r=-0.12, P=4E-13) and lower DNAm PAI-1 (r=-0.15, P=3E-20). Conversely, an increased proportion of fat intake (but not protein intake) was associated with increased AgeAccelGrim (r=0.09, P=2E-8) and DNAm PAI-1 (r=0.13, P=1E-14). Measures of lipid metabolism, namely triglyceride levels and HDL cholesterol levels, were significantly correlated with AgeAccelGrim (r=0.11 and r=-0.10, respectively) and even more so with (age-adjusted) DNAm PAI-1 levels (r=0.34 and r=-0.11). Similarly, measures of glucose metabolism, namely insulin and glucose levels, exhibited positive correlations with AgeAccelGrim (r=0.16 and r=0.12, respectively) and with (age-adjusted) DNAm PAI-1 levels (r=0.30 and r=0.22). Similar to what we observed with previous DNAm-based biomarkers of aging, plasma C-reactive protein levels exhibited comparatively strong positive correlations with DNAm-based biomarkers, particularly AgeAccelGrim (r=0.28, P=2E-52), DNAm TIMP-1 (r=0.27, P=2E-49), and DNAm PAI-1 (r=0.26, P=1E-46). Measures of adiposity, namely BMI and waist-to-hip ratio, are associated with increased AgeAccelGrim, age-adjusted DNAm PAI-1, and other DNAm-based surrogate biomarkers. Higher education and income are associated with lower AgeAccelGrim (P=2E-9 and P=2E-6, respectively).
AgeAccelGrim stands out when it comes to detecting a beneficial effect of physical exercise (r=-0.10, P=3E-10). Several of our results in the WHI could be replicated in a smaller dataset (N < 625 individuals from the FHS test data) that included lipid and metabolic biomarker data (Supplementary Fig. 31). In the FHS, hemoglobin A1C and albumin levels (in urine) exhibited significant positive correlations with AgeAccelGrim and age-adjusted DNAm PAI-1 (0.10 ≤ r ≤ 0.12 and 1.4E-7 ≤ P ≤ 2.3E-3), and to a lesser extent with our other DNAm-based surrogate biomarkers (Supplementary Fig. 31).
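The robust correlation coefficients reported in this section are biweight midcorrelations (bicor). As an illustration of how such a coefficient downweights outliers, the following is a minimal pure-Python sketch of the standard bicor definition; it is not the code used in the study, and the function names are illustrative. Each observation is centered at the median, scaled by 9 times the median absolute deviation (MAD), and assigned a Tukey biweight that drops to zero for extreme values.

```python
import math
from statistics import median


def _biweight_deviations(values):
    """Median-centered deviations multiplied by Tukey biweight weights."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; bicor is undefined for this variable")
    weighted = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Observations with |u| >= 1 are treated as outliers (weight 0).
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted.append((v - med) * w)
    return weighted


def bicor(x, y):
    """Biweight midcorrelation between two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = _biweight_deviations(x)
    b = _biweight_deviations(y)
    norm = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
    if norm == 0:
        raise ValueError("all observations received zero weight")
    return sum(ai * bi for ai, bi in zip(a, b)) / norm


if __name__ == "__main__":
    # Toy data (not from the study): one gross outlier in y.
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    y = [1, 2, 3, 4, 5, 6, 7, 8, 1000]
    print(f"bicor = {bicor(x, y):.3f}")
```

In the toy data, the outlying value receives zero weight, so the coefficient is driven by the remaining, perfectly concordant observations rather than by the single extreme point.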

Omega-3 polyunsaturated fatty acid supplementation Omega-3 polyunsaturated fatty acid (PUFA) supplementation is increasingly used for protection against cardiovascular disease. However, omega-3 PUFA supplementation was not found to be associated with a lower risk of cardiac death, sudden death, myocardial infarction, stroke, or all-cause mortality [34–36]. We studied the association between self-reported omega-3 intake and AgeAccelGrim in n=2,174 participants of the FHS and found that omega-3 intake was negatively correlated with AgeAccelGrim (robust correlation r=-0.10, P=4.6E-7, linear mixed effects P=1.3E-5, Supplementary Table 11). The association with omega-3 supplementation is more pronounced in males (r=-0.08, P=0.012) than in females (r=-0.05, P=0.07). A multivariate linear mixed model analysis revealed an association between AgeAccelGrim and omega-3 acid levels (linear mixed effects P=0.017) after adjusting for gender, educational level, data status (an indicator of training data), and smoking pack-years.