Integration of epigenetics into large and diverse longitudinal population studies

Current knowledge

Longitudinal studies that follow individuals over the course of their lifetime have considerable advantages for evaluating causal risk factors in disease development. They are also extremely valuable for DNA methylation clocks, because cross-sectional data cannot capture the dynamics of clock-related changes, or of repeated measurements, within an individual over time. Longitudinal analyses can therefore evaluate the relative contributions of different sources of epigenetic clock variation, including consistent differences present from the start of life, altered trajectories at particular life junctures such as puberty, and gradual divergence over the entire life-course [9]. Furthermore, the predictive power of clocks for age-related disease can be assessed directly.

The vast majority of epigenetic clock studies to date have been conducted in adults and are cross-sectional in design. The few initial longitudinal analyses found little variation in epigenetic age acceleration measured within the same decade [49], and within middle age, multiple clocks track closely [33]. A substantial meta-analysis of longitudinal data by Marioni et al., comprising 4075 adult participants from five cohorts, identified a slower rate of increase in epigenetic age than in chronological age over time with both the Horvath and Hannum et al. clocks [101]. In addition, the clock follows a non-linear (logarithmic) pattern during the teenage years [24, 79]; for this reason, Horvath's clock calculation included a log-linear transformation of age for data points from younger individuals. When applied to longitudinal datasets, both the Horvath and Hannum et al. clocks show signs of an asymptote in later life, where chronological age increases faster than epigenetically estimated age (see Fig. 1c). Cross-sectional studies have also consistently shown strong biological sex differences, with men showing greater positive age acceleration than women [102].
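Horvath's clock is trained on a transformed age rather than on chronological age itself. A minimal sketch of that log-linear transformation and its inverse follows, assuming the adult-age threshold of 20 years used in Horvath's original description; the function and constant names are our own, not taken from the published code.

```python
import math

# Assumed threshold: below this age the transform is logarithmic, above it linear.
ADULT_AGE = 20.0


def transform_age(age: float, adult_age: float = ADULT_AGE) -> float:
    """Map chronological age (years, > -1) onto the scale the clock regresses on."""
    if age <= adult_age:
        # Logarithmic branch: compresses the rapid changes of early life.
        return math.log(age + 1) - math.log(adult_age + 1)
    # Linear branch: one adult year is a constant 1 / (adult_age + 1) units.
    return (age - adult_age) / (adult_age + 1)


def inverse_transform_age(value: float, adult_age: float = ADULT_AGE) -> float:
    """Convert a clock prediction on the transformed scale back to years."""
    if value <= 0:
        return (adult_age + 1) * math.exp(value) - 1
    return (adult_age + 1) * value + adult_age


for years in (0, 1, 10, 20, 40, 80):
    print(years, round(transform_age(years), 3))
```

On this scale the first year of life spans log 2 ≈ 0.69 units, whereas each adult year spans 1/21 ≈ 0.048 units, which is how the clock accommodates faster methylation change early in life. Both branches meet at age 20 with the same slope (1/21), so the mapping is smooth.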

Current uncertainty

The non-linear rate at which the clock ticks, and what may influence it, are not precisely defined. The Horvath clock runs fastest during development, whereas in adulthood the association is linear, with clock years increasing, on average, at the same rate as chronological years. Epigenetic age acceleration, a marker of biological aging, shows minimal variation from birth to adolescence and then increases with age [103]; this pattern is hypothesized to be influenced by developmental changes during childhood and adolescence [104].
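Epigenetic age acceleration is commonly operationalized as the residual from regressing clock-estimated age on chronological age within a cohort (a simple difference, clock age minus chronological age, is also used). A dependency-free sketch of the residual definition is shown below; the ages are hypothetical, and real analyses may include further covariates.

```python
from statistics import fmean


def age_acceleration(dnam_age, chron_age):
    """Residuals of an ordinary least-squares fit of DNAm age on chronological age.

    Positive values mean a sample is epigenetically 'older' than expected for its
    chronological age within this dataset; residuals sum to zero by construction.
    """
    if len(dnam_age) != len(chron_age) or len(dnam_age) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mx, my = fmean(chron_age), fmean(dnam_age)
    sxx = sum((x - mx) ** 2 for x in chron_age)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]


# Hypothetical cross-section of five individuals (years)
chron = [30, 40, 50, 60, 70]
dnam = [32, 39, 53, 58, 71]
print([round(r, 2) for r in age_acceleration(dnam, chron)])
```

Because the residual is defined relative to the fitted line of a particular dataset, acceleration values are not directly comparable across cohorts with different age ranges, which is one reason repeated within-person measures are informative.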

The full extent of genetic influence on DNA methylation, both at CpGs on the arrays and elsewhere in the genome, is still underappreciated [86, 105,106,107,108]. How strongly, and through which pathways, genetic influences act on longitudinal clock dynamics is uncertain, but this has begun to be explored [109], and further major meta-analyses are in progress. Twin studies estimate that the heritability of epigenetic age acceleration is relatively high (h2 ~ 40%) [9]. Heritability is even higher at younger ages, implying that the environmental contribution to age acceleration increases as we age [24]. Of note, a genome-wide association study of Horvath clock age acceleration identified five loci, including an intronic variant of unknown functional significance within the telomerase reverse transcriptase (TERT) gene [110]. It is still unclear how much of the deviation of epigenetic age from chronological age is driven by differing rates of biological aging and how much by genetically determined differences between individuals. Moreover, several lines of evidence indicate that some epigenetic loci display increased variability with age, which may be an important and distinct measure for capturing biological age [111]. This is also observed in longitudinal analyses, with a fraction of these age-variable CpGs identified as being under genetic influence [109, 112].
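Twin heritability estimates such as the ~40% quoted above are typically obtained by fitting structural equation (ACE) twin models. As a back-of-the-envelope illustration only, Falconer's formula approximates the same decomposition from monozygotic (MZ) and dizygotic (DZ) twin-pair correlations. The correlations below are hypothetical and are not values reported in [9].

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer-style variance decomposition from twin-pair correlations.

    A (additive genetic, i.e., h2) = 2 * (r_MZ - r_DZ)
    C (shared environment)         = 2 * r_DZ - r_MZ
    E (unique environment + error) = 1 - r_MZ
    """
    if not (-1.0 <= r_dz <= 1.0 and -1.0 <= r_mz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    return {"A": 2 * (r_mz - r_dz), "C": 2 * r_dz - r_mz, "E": 1 - r_mz}


# Hypothetical twin-pair correlations of epigenetic age acceleration
components = falconer_ace(r_mz=0.55, r_dz=0.35)
print({k: round(v, 2) for k, v in components.items()})
```

With these inputs the sketch gives A ≈ 0.40, C ≈ 0.15 and E ≈ 0.45. The formula can return negative or out-of-range components when its assumptions fail, which is one reason model-based ACE fitting is preferred in practice.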

Further areas of uncertainty arose from the longitudinal meta-analysis of Marioni et al. [101]. First, significant differences were seen between the Horvath and Hannum et al. clocks, as would be expected given their differing training sets (multiple tissues for Horvath, blood for Hannum et al.). Second, they proposed that, while some of the slowing of the clock rate in the elderly may be due to survivor bias, there may also be a plateau in epigenetic clock estimates. Intriguingly, a possible decline at late ages has even been postulated [113].
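The slowing and possible plateau described above can be quantified from repeated measures as each individual's slope of clock age on chronological age: a slope below 1 means the clock advances more slowly than the calendar. The per-person least-squares slope below is a simplified stand-in for the cohort-level meta-analytic models used by Marioni et al., and the repeated measures are hypothetical. Note that this calculation cannot by itself separate a true plateau from survivor bias.

```python
from statistics import fmean, median


def individual_rate(times, clock_ages):
    """Least-squares slope of clock age (years) per chronological year for one person."""
    if len(times) != len(clock_ages) or len(times) < 2:
        raise ValueError("need at least two paired timepoints")
    mt, mc = fmean(times), fmean(clock_ages)
    stt = sum((t - mt) ** 2 for t in times)
    if stt == 0:
        raise ValueError("timepoints must differ")
    return sum((t - mt) * (c - mc) for t, c in zip(times, clock_ages)) / stt


# Hypothetical repeated measures: participant -> (chronological ages, clock ages)
cohort = {
    "p1": ([70, 73, 76], [68.0, 70.1, 72.2]),
    "p2": ([72, 75, 78], [75.5, 77.6, 79.4]),
    "p3": ([79, 82, 85], [77.0, 78.5, 80.3]),
}
rates = {pid: individual_rate(t, c) for pid, (t, c) in cohort.items()}
print({pid: round(r, 2) for pid, r in rates.items()})
print("median rate:", round(median(rates.values()), 2))
```

In a real analysis, a mixed-effects model with random slopes would pool information across individuals and handle unequal numbers of visits more gracefully than separate per-person fits.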

Recently, Zhang et al. showed that correcting for blood cell type proportions attenuated the associations of both the Horvath and Hannum et al. clocks with all-cause mortality [46]. This attenuation was greater for clocks built from smaller training sets. Furthermore, even without cell type correction, the association with mortality weakened as training set size increased. The biomarker power of specific clocks may therefore be increased or decreased depending on how much major cell type proportions contribute to the specific disease or trait being examined (see Table 1). Changes associated with immunological aging [114, 115] clearly contribute to aspects of biological aging. However, greater precision is needed regarding how these changes manifest within individuals over the life-course, and which specific cell types drive distinct associations.
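One common way to remove the influence of blood composition is to regress clock age on chronological age together with estimated cell type proportions and keep the residual, in the spirit of "intrinsic" age acceleration measures. The NumPy sketch below simulates hypothetical data in which one cell type inflates clock age. It illustrates the adjustment only and is not the procedure used by Zhang et al. [46].

```python
import numpy as np


def adjusted_age_acceleration(dnam_age, chron_age, cell_props):
    """Residuals of DNAm age regressed on an intercept, chronological age, and cell proportions.

    cell_props: (n_samples, n_cell_types) array; pass shape (n, 0) for no adjustment.
    Omit one cell type so proportions that sum to 1 are not collinear with the intercept.
    """
    y = np.asarray(dnam_age, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(chron_age, float), np.asarray(cell_props, float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


rng = np.random.default_rng(0)
n = 200
chron = rng.uniform(40, 80, n)
# Hypothetical proportions for three cell types; the third is dropped as the reference.
cells = rng.dirichlet([2, 5, 3], n)[:, :2]
# Simulated clock age in which the first cell type adds up to ~15 "years".
dnam = chron + 15 * cells[:, 0] + rng.normal(0, 3, n)

raw = adjusted_age_acceleration(dnam, chron, np.empty((n, 0)))
adj = adjusted_age_acceleration(dnam, chron, cells)
print("corr(raw, cell 1):", round(float(np.corrcoef(raw, cells[:, 0])[0, 1]), 3))
print("corr(adjusted, cell 1):", round(float(np.corrcoef(adj, cells[:, 0])[0, 1]), 3))
```

Whether such an adjustment is desirable depends on the question: if cell composition lies on the pathway to the outcome, removing it will also remove genuine signal, consistent with the attenuation described above.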

Future experiments and recommendations

Longitudinal studies enable the description of the phenotypic manifestations of aging within individuals [9]. They are therefore powerful for determining the predictive ability of DNA methylation biomarkers for disease and other outcomes in individuals. Because these studies are generally designed with multiple, often frequent, biospecimen collections, those that include early life and young adulthood will be able to examine the observed departures of clock predictions from chronological age during this developmental period. Similarly, samples obtained at multiple timepoints from elderly participants could address questions about the apparent slowing of epigenetically predicted age. The availability of DNA from various tissues over time would also facilitate robust multi-tissue age evaluations [9].

By identifying the best-designed studies, with appropriate tissues, physiological, functional, and molecular biomarkers, and disease monitoring, the relative power of DNA methylation to predict disease can be robustly assessed. Given the expense, consensus on this investment will help bring it about. Genetic studies using the rigorously phenotyped UK Biobank have been highly successful. This success stems not only from extremely powerful GWAS but also from collating this information into calculated risks for specific common diseases using genome-wide polygenic risk scores (PRSs) [116], which have potential clinical utility [117]. Many well-known cohorts have generated DNA methylation data [107, 118, 119], but it would undoubtedly be highly desirable to assay further powerful longitudinal studies with extremely large numbers of deeply phenotyped individuals. Understanding of the dynamics of clock-estimated age will improve as more studies obtain repeated measures of DNA methylation. This could include applying latent class analysis to identify trajectory classes such as early, late, or constant epigenetic age acceleration.
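At its simplest, a polygenic risk score is a weighted sum of an individual's effect-allele dosages, with weights taken from GWAS effect sizes. Production PRS methods add steps such as clumping and thresholding or Bayesian shrinkage of the weights, which are omitted here. The variant identifiers and weights in this sketch are hypothetical.

```python
from collections.abc import Mapping


def polygenic_risk_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Sum over variants of effect-allele dosage (0 to 2) times per-allele effect size.

    Variants absent from the individual's genotype data are skipped in this sketch;
    real pipelines usually impute them or substitute the cohort mean dosage.
    """
    score = 0.0
    for variant, beta in weights.items():
        dosage = dosages.get(variant)
        if dosage is None:
            continue
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage for {variant} must lie in [0, 2]")
        score += beta * dosage
    return score


# Hypothetical per-allele effect sizes (e.g., log odds ratios) and one person's dosages
weights = {"rs_hypothetical_1": 0.12, "rs_hypothetical_2": -0.05, "rs_hypothetical_3": 0.30}
person = {"rs_hypothetical_1": 2, "rs_hypothetical_2": 1, "rs_hypothetical_3": 0.4}
print(round(polygenic_risk_score(person, weights), 3))
```

Raw scores are usually standardized within a reference population before being interpreted as relative risk, since their absolute scale depends on the number and weighting of included variants.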

It would be beneficial to generate DNA methylation data at scale on one or more cohort studies that have (a) prospectively collected data and DNA samples, (b) deep phenotyping of age-related traits, (c) standard biochemical markers of aging-related decline, (d) repeated measures, and (e) genetic data. Because human DNA methylation clocks were derived from array-based data, the latest-generation DNA methylation array (EPIC 850k) would currently be the pragmatic choice. However, the field is in transition from reliance on array-based platforms, which capture data on a small subset of CpG sites, to sequence-based approaches. As noted later (in “Challenge 4”), the interrogation of a wider range of DNA methylation sites using sequence data will ultimately bring added insights into underlying mechanisms, but the cost of such an approach at scale and at appropriate depth is currently prohibitive.

The interrelationship between genotype and DNA methylome clock changes could be robustly evaluated in any large genotyped epidemiological cohort (in some, such as UK Biobank, a substantial proportion of participants is soon to be fully sequenced). Chronological age estimation could therefore potentially be improved by correcting for identified genetic effects on this measure. More nuanced haplotypic integration of epigenetic and genetic variation will ultimately be required. It will also be possible to study how genetic variation influences clocks through relevant causative factors, such as inflammation and immunological aging. The relationship between genotype and DNA methylation clock calculations can be exploited to gain insights into causal or mechanistic pathways. For example, in cohorts where both genotype and DNA methylation data are available, it would be feasible to apply a Mendelian randomization approach to appraise the causal impact of a potential determinant of clock-derived age [120, 121]. A hypothesis-free approach might include the application of LD score regression [122], which would use genome-wide association results for clock age and compare these against all available GWAS data to search for traits that share genetic architecture with DNA methylation clock age. This may shed light on potential pathways that influence aging.
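In summary-statistic Mendelian randomization, each genetic instrument yields a Wald ratio (SNP-outcome effect divided by SNP-exposure effect), and the fixed-effect inverse-variance-weighted (IVW) estimate pools these ratios. The sketch assumes first-order weights and uses hypothetical summary statistics for an exposure (for example, an inflammatory marker) and clock age acceleration as the outcome. A real analysis would add sensitivity analyses, such as MR-Egger or weighted-median estimators, to probe pleiotropy.

```python
import math


def wald_ratios(beta_x, beta_y):
    """Per-instrument causal estimates: SNP-outcome effect / SNP-exposure effect."""
    return [by / bx for bx, by in zip(beta_x, beta_y)]


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate and standard error.

    Equivalent to a weighted mean of Wald ratios with weights bx^2 / se_y^2
    (first-order weights, which ignore uncertainty in the exposure effects).
    """
    if not beta_x or not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("need equal-length, non-empty summary statistics")
    weight_sum = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    estimate = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y)) / weight_sum
    return estimate, math.sqrt(1 / weight_sum)


# Hypothetical independent instruments: effects on the exposure (beta_x)
# and on clock age acceleration (beta_y, with standard errors se_y)
beta_x = [0.10, 0.08, 0.15]
beta_y = [0.052, 0.037, 0.079]
se_y = [0.020, 0.030, 0.025]
print([round(r, 3) for r in wald_ratios(beta_x, beta_y)])
est, se = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW estimate: {est:.3f} (SE {se:.3f})")
```

Because the IVW estimate is a weighted mean of the per-instrument ratios, it always lies between the smallest and largest Wald ratio; marked heterogeneity among those ratios is itself a warning sign of invalid instruments.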

There is considerable potential clinical utility in incorporating epigenetic data into disease prediction. Given the precision with which DNA methylation clock age can be estimated, and the evolving measures of biological, phenotype-, and disease-related age (e.g., PhenoAge [43], GrimAge [45]), clock age may be a useful tool for enhancing clinical prediction models of age-related disease incidence. Studies to date have assessed the combined contribution of genetic and epigenetic data to specific traits [123, 124] and have demonstrated the utility of using DNA methylation as an index of specific health-related exposures, notably smoking [125], to predict future disease risk [126]. This ability to use blood-derived DNA methylation as a systemic exposure measure will continue to be refined. Adding clock-derived measures of biological aging to such prediction models could bring enhanced sensitivity and specificity over and above that possible from self-reported measures of known risk factors. For example, cardiovascular risk prediction could combine a PRS for this trait with GrimAge, which incorporates DNA methylation-based estimates of cardiovascular disease-related risk factors, such as smoking pack-years, plasma beta-2 microglobulin, and other plasma proteins, and predicts time to coronary heart disease [45].
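Whether adding a clock measure improves discrimination is often summarized by the C-statistic, the area under the ROC curve (AUC). The minimal rank-based (Mann-Whitney) AUC below compares hypothetical risk scores from a PRS-only model and a PRS-plus-clock model. The scores and outcomes are invented for illustration; a real comparison would use held-out data and also assess calibration and reclassification.

```python
def auc(scores, outcomes):
    """Probability that a randomly chosen case scores higher than a randomly chosen control.

    outcomes: 1 for cases, 0 for controls. Ties contribute 0.5.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks for eight individuals (1 = incident disease)
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
prs_only = [0.60, 0.40, 0.30, 0.50, 0.20, 0.35, 0.10, 0.45]
prs_plus_clock = [0.70, 0.55, 0.50, 0.45, 0.20, 0.30, 0.10, 0.40]
print(round(auc(prs_only, outcomes), 3), round(auc(prs_plus_clock, outcomes), 3))
```

The pairwise loop is O(cases x controls), which is fine for illustration; for large cohorts a sort-based rank computation gives the same value more efficiently.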

The need for cell type deconvolution in clock association analyses will be specific to the disease or trait being examined. Single-cell analysis, as detailed in “Challenge 5,” will also help pinpoint which cell type(s) are most important and guide the use of cell type corrections in heterogeneous DNA samples in larger longitudinal and epidemiological studies.
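Reference-based deconvolution estimates a sample's cell composition by expressing its methylation profile at cell-type-discriminating CpGs as a mixture of purified-cell reference profiles. Established methods solve this with constraints such as non-negative proportions; the NumPy sketch below instead uses an unconstrained least-squares fit followed by clipping and renormalization, a simplified stand-in. The reference matrix is hypothetical.

```python
import numpy as np


def estimate_proportions(reference: np.ndarray, sample_beta: np.ndarray) -> np.ndarray:
    """Approximate cell type proportions for one bulk sample.

    reference:   (n_cpgs, n_cell_types) mean methylation (beta values) of purified cell types.
    sample_beta: (n_cpgs,) methylation of the bulk sample at the same CpGs.
    """
    props, *_ = np.linalg.lstsq(reference, sample_beta, rcond=None)
    props = np.clip(props, 0.0, None)  # crude substitute for a non-negativity constraint
    total = props.sum()
    return props / total if total > 0 else props


# Hypothetical reference: 6 CpGs x 3 cell types (e.g., T cells, B cells, granulocytes)
reference = np.array([
    [0.90, 0.10, 0.20],
    [0.15, 0.85, 0.10],
    [0.10, 0.20, 0.95],
    [0.80, 0.70, 0.05],
    [0.05, 0.60, 0.90],
    [0.50, 0.10, 0.40],
])
true_props = np.array([0.25, 0.15, 0.60])
bulk = reference @ true_props  # noiseless synthetic mixture
print(np.round(estimate_proportions(reference, bulk), 3))
```

With noiseless input the sketch recovers the mixing proportions exactly; with real data, the quality of the estimates depends heavily on how well the reference CpGs discriminate the cell types present.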

Another very important issue is that these genetic and epigenetic data and analyses are strongly biased toward populations of European ancestry, while other populations are grossly under-represented. Further large-scale, diverse longitudinal studies are imperative [127]. As mentioned, the extent of genetic influence is currently underestimated and will therefore need detailed analysis across multiple populations (see Table 1). The unique advantages of monozygotic twin studies should also be borne in mind [105, 128], and comparing these non-genetically confounded studies with larger population findings may be illuminating. Another fascinating avenue that may reveal novel insights is the study of contemporary populations worldwide in which extreme longevity is common, termed “blue zones” [129]. These regions include Nicoya in Costa Rica, Ikaria in Greece, a region of Sardinia in Italy, Okinawa in Japan, and Loma Linda in the USA [2].