DNA methylation changes in mice correlating with age

In order to study age-associated DNA-methylation changes in the mouse over a wide range of ages and tissues, we collected liver, lung, heart and brain (cortex) samples from newborn to 41-week-old mice (Fig. 1a). To reduce genetic variability and hormonal variations, we restricted our cohort to male C57BL/6-BABR mice and sampled three to five animals per time point. In total, we collected 62 samples (Additional file 1) and extracted genomic DNA for methylation analysis from them.

Fig. 1 DNA methylation changes correlate with age. a Overview of the Babraham dataset. Ages of the mice are emphasised by colour. Tissues (liver, lung, heart and cortex) were isolated from mice at four distinct time points (newborn, 14 weeks, 27 weeks and 41 weeks). DNA was isolated from these tissues and reduced-representation bisulfite (RRBS) libraries were made. b Heatmap of the top 500 tissue independent age-associated correlations. Highlighted are ages and tissues, CG sites were clustered by Euclidean distance. c Single CG sites within the genome are correlated with age. Shown is an example site (chr8:120397660), with a Spearman correlation with age of 0.65. Tissues are highlighted by colour. Jitter is for aesthetic purposes only. d Overview of Spearman correlations calculated over all tissues. The distributions of correlation estimates are shown in the histogram and proportionate numbers of correlations are shown in the bar plot. Nominal correlations are highlighted (p value < 0.05) in light orange (positive) and light blue (negative). Significant correlations are highlighted (q-value < 0.05) in orange (positive) and blue (negative). Numbers provided above the bar plot represent the number of single CG sites within the given category. e Surrounding CpG content (± 500 bp) of the significantly correlated sites. Background was calculated from all CG sites with five-fold coverage in 90% of samples; CpG scarcity is defined as the average CpG distance within this 1 kb region. *Bonferroni corrected p value < 0.05. f Enrichment of a correlated CG site falling within a given genomic element. Background was all CG sites with five-fold coverage in 90% of samples. Tested using binomial test; *p value < 0.05. g Gene Ontology (GO) analysis of the significantly correlated CG sites. GO terms are plotted against –log(corrected p value). Positive correlations are shown in orange and negative in blue. The six most significant GO terms are shown. h Tissue-specific Spearman correlations with age. An example is provided of a tissue-specific correlation with age in cortex, liver, lung and heart. Correlations for these CG sites are provided for all tissues combined and for the tissue in question. Jitter is for aesthetic purposes only Full size image

We generated reduced-representation bisulfite sequencing (RRBS) libraries of all samples to be able to assess DNA methylation changes at a wide range of CG sites and sequenced these to 15× genomic coverage on average. RRBS represents a good compromise between sequencing costs, CG sites measured and fold genomic coverage obtained. To improve the quantification results, we optimised the standard RRBS library preparation protocol [26] and were able to achieve very low duplication rates and high genomic coverage. On average, more than 1.23 million CG sites in each sample were covered at least five-fold and of these 0.73 million CG sites had more than five-fold coverage in all samples analysed. Global CpG methylation levels in newborns were around 43% and did not differ significantly between tissues. In the older samples, average methylation levels were slightly higher (~45%) but did not show major differences between ages or tissues (Additional file 2A). Global methylation levels measured by RRBS are generally lower than whole-genome bisulfite sequencing (WGBS) estimates, as the method enriches for hypomethylated CpG islands (CGIs) [26]. We also observed low levels of non-CpG methylation in all non-brain tissues (Additional file 2B). In agreement with the notion that de novo methylation activity in non-dividing cells results in accumulation of CHH methylation, we found that adult cortex samples had higher CHH methylation levels than newborn cortex samples [27]. Together our samples represent the most comprehensive dataset thus far of matched single base resolution methylomes in mice across multiple tissues and ages. Importantly, a hierarchical clustering analysis using Manhattan distances (Additional file 2C) clearly separated the samples by tissue (with the exception of newborn lung samples which clustered together with adult heart samples), highlighting key tissue-specific methylation signatures [28].

A correlation analysis showed that DNA methylation at a substantial number of CG sites across all tissues correlated with age (Spearman’s correlation, with a multiple testing corrected p value < 0.05) (Fig. 1b and c). As expected, the majority of these sites showed age-correlated methylation changes in all or at least three tissues (Additional file 2D), suggesting that age-dependent DNA methylation changes at specific sites occur in a coordinated manner across tissues. The correlation values between age and DNA methylation at discrete CG sites were both positive and negative and normally distributed (Fig. 1d). Overall, we identified more positive correlations, but this skewing is likely to be the result of the slight global hypomethylation in the newborn samples and represents the general tendency for gain of DNA methylation during development.

To understand whether the underlying sequence composition or genomic context was relevant to the changes observed, we analysed CpG density (Bonferroni corrected two-tailed t-test, p value < 0.05) and genomic context (Binomial test, p value < 0.05) at the significantly correlating sites. While CpGs changing DNA methylation with age were on average in regions with higher CpG density (lower CpG scarcity), the CpG density was not predictive of the methylation changes (AUC = 0.58 or 0.61; see ‘Methods’ for details) (Fig. 1e). We also found a strong depletion of sites with significant correlations with age at CGIs and CGI rich promoters and conversely a strong increase at CGI-shores and CGI-shelves (4 kb around CGIs) (Fig. 1f). This indicates that tightly controlled regulatory regions, such as CGIs, are relatively protected from age-associated DNA methylation changes, while regions with intermediate CpG density are more prone to changes. A Gene Ontology (GO) analysis of the genes closest (max. 4 kb distance) to CpGs with highly significant positive age-dependent changes in DNA methylation revealed a significant enrichment of genes associated with the terms ‘anatomical structure morphogenesis’, ‘anatomical structure development’ and ‘developmental process’ (Fig. 1g). Genes close to sites with significant negative age-related correlations were enriched in terms containing nucleotide and enzyme binding. These results suggest that the age-related changes in DNA methylation could alter various important biological processes and future experiments may reveal the regulatory relevance and association with gene expression. In particular, increased DNA methylation at developmentally relevant genes suggests that the ageing process may restrict expression of developmental genes.

We next analysed age correlations in each tissue independently and identified a large number of sites, which showed exclusively tissue specific changes in DNA methylation with age (Fig. 1h and Additional file 2E). Interestingly, only a small fraction of these were shared between three or more tissues, indicating that additional age-related DNA methylation changes are characteristic for each tissue and might relate to its intrinsic biological function (Additional file 2F). In particular, GO analysis of genes close to the sites with tissue-specific changes revealed highly different GO term enrichments, indicating their unique tissue-specific regulation.

The marked differences between the global methylation levels of newborn and adult samples (Additional file 2A) prompted us to also analyse the datasets excluding newborn samples (Additional file 3). A number of sites showed a reversed directionality of age-dependent methylation changes if we excluded the newborn samples from the analysis (highlighted in pink and green; Additional file 3A and B); however, the majority of sites (highlighted in blue and orange/red) showed the same correlation characteristics independent of the inclusion or exclusion of newborn samples (Additional file 3C). The correlations between age and DNA methylation in the adult datasets were both positive and negative and the correlation values were normally distributed (Additional file 3D), but none of the correlations were significant (Additional file 3E; Spearman’s correlation, with a multiple testing corrected p value cutoff of < 0.05). Although the analysis was based on all tissues, only a small number of the age-correlated methylation changes in the adult samples were common to all tissues (p < 0.005; Additional file 3F). A subsequent analysis of age correlations in each tissue independently identified a large number of tissue specific methylation changes (p < 0.005; Additional file 3G and H), suggesting that changes in DNA methylation in adults are primarily driven by tissue-specific processes, while tissue-independent methylation changes are associated with global developmental processes (Fig. 1g).

DNA methylation levels at a discrete set of CpGs are predictive of age

Having found that methylation at many individual CG sites did change in an age-dependent manner, we decided to generate an epigenetic age predictor in mice (Fig. 2a). In addition to our own datasets, we also included previously published datasets comprising RRBS libraries from liver, lung, muscle, spleen and cerebellum samples from male and female C57BL/6 mice aged newborn to 31 weeks [29–32]. An overview of these datasets can be found in Additional file 4. In short, 129 healthy samples were used to define the training set, with the remaining 189 making up an independent test set, including two datasets that were generated by different labs and hence experimentally independent from the datasets used to train the model. All samples were processed as described in the ‘Methods’. We excluded CpGs located on either of the sex chromosomes or in the mitochondrial genome prior to further processing and analysis.

Fig. 2 Prediction of chronological age from a mouse epigenetic clock. a Flow chart to illustrate the steps taken in defining the model and testing it. Datasets are displayed as segments of a circle. They are coloured to correspond with later figures, namely: Reizel in brown (R), Cannon in green (C), Babraham in purple (B), Zhang in pink (Z) and Schillebeeckx in light green (S). The two independent datasets are displayed as segments in a separate circle to those datasets utilised for the training phase. The flow of methylation data is shown as colour-coded lines. Training occurs at the node (screen) with the caption: ‘glmnet’. The chosen CG sites and their corresponding weighting are passed on to the prediction tool itself (node with the caption: ‘epigenetic age predictor’. Test data enter this prediction tool and age predictions are outputted, as displayed by the pocket watches exiting this node. b Scatter plot depicting weight of chosen clock sites against their age-associated correlation, blue are negatively correlated sites and orange are positively correlated sites. c Training set ages as predicted by the model, x-axis shows the actual age and y-axis shows the predicted age, coloured by tissue. Jitter is used to represent the experimental error in age estimates. d Test set ages as predicted by the model, x-axis shows the actual age and y-axis shows the predicted age, coloured by tissue. e Box plot of the absolute error in the training (left) and test (right) samples. The median absolute error is indicated. f Principal component representation of the sites used in the clock, coloured by age in weeks Full size image

We used an elastic-net regression model to predict log-transformed chronological age, measured in days. Only CG sites covered by at least 5 reads in all samples were used (~18 k sites; see ‘Methods’ for details). Following cross-validation to optimise the model parameters, the final predictor was based on 329 CG sites (Additional files 5A and 6); the sites in this predictor will be referred to as mouse (multi-tissue) clock sites. The model selects the most informative sites but allows for some redundancy to increase robustness and it infers weights for each individual site [33]. The model weights across sites are depicted in Additional file 5B. One implication of this approach is that the clock sites do not necessarily represent the strongest age-correlating sites characterised above (Fig. 2b). Similar to the human age predictor described by Hannum et al. [5], the initial starting methylation levels of the mouse clock sites are somewhat predictive of the directionality of their methylation changes with age (Additional file 5C).

As expected, the exponent of the weighted average of the DNA methylation levels of the 329 selected CG sites was highly correlated with (chronological) age of the individual samples within the training dataset (Fig. 2c). Notably, using unobserved test samples, our mouse epigenetic clock was able to accurately predict chronological age in various tissues (Fig. 2d) and across multiple independent datasets (Additional file 7A). The accuracy of the model predictions was also independent of sequencing depth of the test samples (Additional file 7B), provided there was a mean coverage per CG site of 5 reads or more. This indicates that minor technical variations and coverage differences are well tolerated and consequently our mouse epigenetic clock model can be applied to a wide range of different settings. This was evident by the fact that the predictor was able to accurately estimate age in two completely independent test datasets [31, 32] (Additional file 7C).

The mouse clock performed well across all tissues and ages tested, with an age correlation of 0.839 and median absolute error (MAE) of 3.33 weeks in the test data (Fig. 2d and e), corresponding to less than 8.5% error relative to the oldest ages (41 weeks) analysed. In order to compare the accuracy with the human epigenetic clock [6], we calculated the MAE as a proportion of the expected lifespan of a mouse (>100 weeks) and found it to be similar to that reported for the human clock (assuming an average human lifespan of 85 years). It is worthy of note that, similar to the human clock, the performance of our mouse age-predictor varied between young and old mice. In young animals (< 20 weeks), the model predictions were much more accurate, with a MAE of 2.14 weeks in the test samples. In mice aged 20 weeks or older, the MAE was 4.66 weeks (Additional file 7D). We also attempted to include publicly available WGBS datasets, including samples from 24-month-old animals [34], to test our age predictor at older ages. Unfortunately, the sequencing depth in most WGBS datasets is significantly lower than in RRBS datasets (mean coverage < 5-fold), thus precluding their analysis. It is expected that future high coverage datasets will help to further improve the accuracy of the mouse clock and test its performance at older ages.

To get further insights into the architecture of our multi-tissue age predictor, we performed a principal component analysis (PCA) of the variation within the 329 selected sites in the training datasets. Ninety percent of the observed variability was explained by 69 principal components (PCs) (Additional file 7E), of which two PCs (PC1 and PC13) displayed a clear age relation (p < 0.05). PC1 captured age-dependent changes and showed a good separation of samples by age; PC2 separated liver samples from the other tissue samples (Fig. 2f and Additional file 7F). This analysis highlights that the major variation within the selected CG sites in the training set is governed by numerous factors, including tissue type and age. However, dataset effects (i.e. technical variations) are not among the major drivers of variation.

Next, we characterised the clock sites in more detail. The CG sites were distributed across all autosomes with no specific enrichment in any chromosome (Additional file 8A). Similar to the age-correlated CG sites, we found a strong depletion over CGIs and CGI promoters but also over non-CGI promoters (Additional file 8B), suggesting that the clock sites were specifically depleted in regulatory regions. CGI shores and intergenic regions showed increased enrichment in clock sites, whereas the CpG density around the clock sites did not show any differences compared to other random CG sites (Additional file 8C). The 329 clock sites did also not show any specific GO enrichment (not shown), suggesting that the sites selected by the model might not represent a unique biological function and are instead associated with various biological functions.

Age predictions using the human clock sites in mouse

Given the similarities between our mouse epigenetic age predictor and the previously described human epigenetic clock [6], we asked whether the specific genomic loci described in human could be used to predict age in mouse samples too. First, we attempted to lift over the genomic locations of the 353 human clock sites to the mouse, by defining regions ± 500 bp around the sites. We were able to lift over 328 regions, of which 175 regions (in the following referred to as ‘Horvath clock regions in mouse’) were covered by at least one CG site in our dataset. Methylation levels at these 175 Horvath clock regions in mouse were only weakly correlated with age (Additional file 9A), which is also true for the Horvath clock sites in human [6].

Next, we assessed whether the 175 Horvath clock regions in mouse could be used for age prediction. Using a ridge model, we were able to generate an age prediction model with a MAE of 11.2 weeks (Fig. 3), indicating that the methylation levels at the 175 Horvath clock regions in mouse contain age-related information and are predictive of age in the mouse. The directionality of the mouse-specific and human-specific weighting for any individual CG site within the Horvath clock regions in mouse were completely unrelated (Additional file 9B), suggesting that the human clock [6] is not fully conserved in mouse. Remarkably, we found that we could also generate age prediction models from matched random regions (see ‘Methods’ for details) with an average MAE of 10.6 weeks (Fig. 3), highlighting the fact that methylation changes in many genomic regions are age-related and potentially predictive.

Fig. 3 Prediction evaluation of the human Horvath clock sites in mouse. Shown is the MAE of the age prediction model generated using the lifted-over human clock sites [6] (red line). The distribution (blue) shows the MAEs of 1000 age-prediction models generated using random selections of 329 regions (see ‘Methods’ for further details) Full size image

In summary, the predictive accuracy of the sites corresponding to the human clock in mouse samples was significantly lower than our mouse epigenetic clock. These differences are in part due to technical differences in defining the clock sites, but do also highlight species-specific differences, which result in different age-dependent methylation changes at discrete loci and might point to different ‘ticking’ rates of human and mouse ageing.

DNA methylation age is altered in ovariectomised females and by diet

Given the accuracy of our mouse epigenetic clock to predict chronological age in healthy individuals, we asked whether epigenetic age in the mouse would be affected by gender, diet or other biological interventions. Since it has been reported that DNA methylation shows gender-specific patterns [30], we compared the predicted DNA methylation age of female and male mice. Although the training data were predominantly composed of male samples (> 72%), we did not find significant differences in the estimated ages between sexes (Fig. 4a and Additional file 10), highlighting the robustness of the model and showing that gender-specific differences are not skewing the predictions of the mouse epigenetic clock.

Fig. 4 Methylation age is affected by biological interventions. a Predicted age of 20-week-old liver samples [30], shown separately for males and females. Statistical test performed: t-test, p value of 0.58. b Age prediction in test samples from various biological intervention studies. X-axis shows the actual age and y-axis shows the predicted age, coloured by study. c Age prediction in normal female liver samples and samples which underwent an ovariectomy and were administered with either vehicle alone or testosterone [30]. Statistical test performed: Unpaired two-tailed t-test performed to assess the impact of ovariectomy, p value of 0.014. d Age prediction in diet perturbation study [29]. Liver samples from animals with following treatments were analysed: maternal high fat diet followed by adult high fat diet, maternal high fat diet followed by adult low fat diet, maternal low fat diet followed by adult high fat diet, and maternal low fat diet followed by adult low fat diet. Statistical test performed: two-way ANOVA performed, p values displayed where significant Full size image

We next analysed publicly available RRBS datasets of samples with various biological interventions [29, 30]. Overall the epigenetic age predictions were well related to the true chronological age of the individual mice (Fig. 4b), but at closer inspection we identified important differences, which depended on treatment.

Reizel et al. performed ovariectomy in females, treated the mice with testosterone (5 mg/mL) or vehicle [30] and analysed their liver DNA methylome by RRBS. Ovariectomy decreases average lifespan in female rats [35] and indeed we found that ovariectomy increased epigenetic age significantly in mice (Fig. 4c). Additional testosterone or vehicle treatment had no significant effect on the epigenetic age and was also shown to not alter lifespan in rats [35]. This suggests that hormonal differences in mice affect ‘biological age’, as measured by our mouse epigenetic clock; similarly in humans, breast tissue is found to have accelerated epigenetic ageing [6].

In a second study, Cannon et al. [29] characterised the effects of lipid content in maternal and offspring diet on body weight, physiology and DNA methylation status in the liver and reported a significant effect of maternal diet on the susceptibility of the offspring to become obese and develop signs of metabolic disease. The greatest adverse effects were observed when the mothers were fed a low-fat diet and the offspring a high-fat diet. By applying our mouse epigenetic clock to these liver samples, we identified a strong dependency of the predicted epigenetic age on diet (Fig. 4d). Offspring on a high-fat diet showed accelerated epigenetic ageing, which had a tendency to be further exacerbated if the mothers were fed a low-fat diet. This result suggests that biological ageing is modulated by diet and possibly intergenerational or transgenerational effects and that epigenetic age is a potential powerful measure of biological function.