DNAm age is an epigenetic marker of biological aging that reflects age-related cumulative changes in DNA methylation influenced by both environmental and genetic risk factors. The Horvath method has been widely used to estimate DNAm age based on methylation levels of 353 clock CpGs measured from earlier array technologies [6]. Compared to recent data from sequencing technologies, these array data have a limited coverage and resolution on genome-wide DNA methylation. Thus, the increased availability of sequencing data may provide an unprecedented opportunity to refine the Horvath clock model by considering more CpGs in training and selecting optimal clock CpGs for DNAm age estimation. On the other hand, the Horvath clock model is poorly calibrated in breast tissue with a high error rate and breast tissue appearing older than other parts of the body [6]. Therefore, it is reasonable to single out on breast tissue and develop a breast tissue-specific model to more accurately estimate DNAm age in breast tissue. To the best of our knowledge, this is the first study that develops a breast tissue-specific model for DNAm age estimation using DNAm sequencing data. We have shown that the tissue-specific model developed in the study not only has a higher accuracy in DNAm age estimation for normal breast tissue, but also is applicable to other breast tissue types and yields biologically meaningful results.

Our breast tissue-specific model selected 286 clock CpGs for DNAm age estimation, the number of which is about 20% less than the 353 clock CpGs in the Horvath clock model. Despite the smaller number of clock CpGs, our model has a significantly improved predictive accuracy in breast tissue (r = 0.88, median error = 4.2 years), compared to the Horvath clock model (r = 0.73, median error = 8.9 years). This drastic improvement in accuracy is likely a result of the optimal selection of clock CpGs in our model that better capture tissue-specific changes. The Horvath clock model is likely less sensitive to tissue-specific changes, given that its reference training data is across multiple tissue types. This could lead to an “averaging out” of tissue-specific changes in order to achieve a pan-tissue epigenetic clock. Further, the Horvath clock model is not well calibrated in female breast tissue, which will add additional noise to studies involving this tissue type. Since our model targets breast tissue specifically, it reduces both the averaging-out and induced noise limitations of the Horvath clock model in breast tissue. To ensure our model selected the most relevant set of CpGs, we conducted a sensitivity analysis where the model was retrained on two sets of CpGs that underwent “loose” and “strict” levels of quality control (QC). For the strict-QC model, CpGs were restricted to have <1% missingness across samples to reduce the dependence on imputation. A total of 1.2 million CpGs remained after QC. Model training selected 247 clock CpGs, of which 122 overlapped with the current model. The strict model had a slight reduction in predictive accuracy (r = 0.87, median error = 4.5 years). For the loose-QC model, CpGs had no restrictions on missingness, and all missing values were simply imputed. A total of 3.3 million CpG probes remained after QC. Model training selected 280 clock CpGs, of which 191 overlap with the current model. This loose model had a slight increase in predictive accuracy (r = 0.89, median error = 3.5 years), which might be attributed to overfitting as a result of a large increase in the number of imputed CpGs. We applied both the strict-QC and loose-QC models to all other analyses in the study and found essentially identical results. The consistency of results between the three model instances supports the robustness of the model developed in our study.

Although only one clock CpG directly overlaps between the 286 and the 353 clock CpGs in our model and the Horvath clock model, the DNAm age estimates from both models were still in good concordance (r = 0.79), suggesting both sets of clock CpGs might capture biological pathway or functions related to general aging processes. Indeed, the Ingenuity Pathway Analysis of annotated genes in both models suggests enrichment of functions including cell death and survival, cellular growth and proliferation, tissue development, and cancer; however, the annotated genes in our model may be more related to tissue-specific aging through multiple stages of cell life. Of particular interest, the top clock CpGs positively associated with age in our model and their annotated genes are enriched in EGF and estrogen-receptor signaling. It is hypothesized that hormone cycling during menstruating years is responsible for the observed accelerated aging in female breast tissue with respect to the rest of the body [6, 21, 22]. Further, EGF overexpression is observed in all subtypes of breast cancer and has been shown to be associated with larger tumor size, poor differentiation, and poor outcomes [27,28,29]. Top clock CpGs negatively associated with age and their annotated genes are enriched in ATM and apoptosis signaling, both of which are linked to genome integrity and aging [30] as well as sustaining breast-cancer tumorigenicity [31] and tumor morphology [32]. The correlation of these cancer-related CpGs with DNAm age provides a mechanistic link through which aging contributes to cancer development.

The DNA methylation patterns of our 286 clock CpGs across all samples and tissue types showed a clear separation of tumor breast tissue from normal or adjacent-normal breast tissue. Differentially methylated clock CpGs between tumor and normal or adjacent-normal breast tissue are enriched in cancer-related pathways. Hypermethylated clock CpGs in tumors were related to NAD biosynthesis, which declines during the aging process [33]. It has been proposed that this decline triggers the interaction between DCB1 and PARP1, which decreases the frequency of DNA damage repair [33]. These hypermethylated clock CpGs suggest potential biological functions involved in acceleration of the aging process in breast tumor tissue relative to normal and adjacent-normal breast tissue, which is consistent with our other observations that breast tumor tissue has a positive acceleration of DNAm age.

When considering all 286 clock CpGs jointly to estimate DNAm age in our model, we found the DNAm age could easily distinguish breast tumor tissue from normal and adjacent-normal tissue. As chronological age was increased, DNAm age in breast tumor tissue increased at a much higher rate (by ≈ 17%) compared to almost no increase in normal and adjacent-normal breast tissue. On average, breast tumor tissue was about 7 years older in DNAm age than its chronological age, while no significant age acceleration was observed for normal and adjacent-normal breast tissue. The larger inter-sample variation of DNAm age estimates in breast tumor tissue might be consistent with well-known tumor heterogeneity and its underlying complex etiology. In addition, breast tumor tissue was approximately 13 and 12 years older than normal and adjacent-normal breast tissue in DNAm age, respectively, in age-match analysis, also indicating a much higher age acceleration in tumor tissue. These results suggest that DNAm age is not only a marker for aging, but also a promising marker for breast cancer development.

Since our model is not directly comparable to the Horvath model due to differences in the scope of our training data sets (breast tissue-specific vs pan tissue), it raises questions as to whether our model captures aging mechanisms that are unique to or behave differently in breast tissue when compared to those from the Horvath clock model. Indeed, pathway analyses revealed that our model includes some clock CpGs and annotated genes that may be specific to the breast tissue aging process. Furthermore, our model yields a smaller error rate of DNAm age estimation in breast tissue compared to the Horvath clock model. The more accurate estimation of DNAm age would lead to more study power to access epigenetic age acceleration in a given tissue or to compare relative epigenetic age acceleration across different tissue types. Thus, our model is likely to be more relevant and accurate when assessing aging effects on the development of breast diseases such as breast cancer. Indeed, our study has sufficient power to detect moderate differences in DNAm age acceleration between tissue types. Specifically, our study has 80% power to detect a difference in DNAm age acceleration of 1.1, 1.3, and 1.7 years between tumor and normal, between adjacent-normal and normal, and between tumor and adjacent-normal breast tissue, respectively.

Breast cancer is a heterogeneous disease characterized by distinct clinical and pathological features and molecular subtypes [34], reflecting different underlying molecular mechanisms for cancer development. Thus, it is of interest to further explore the performance of DNAm age estimation in these breast tumor subgroups. We found that both HER2+ and HR+/HER2- tumors, which are more responsive to hormone recycling, showed a similar magnitude of positive age acceleration (approximately 10 years). This is in line with the results of our earlier pathway analysis showing enrichment in the EGF and estrogen-receptor signaling pathway for our selected clock CpGs. In contrast, we observed a negligible age acceleration in TNBC, which is an aggressive subtype of the disease with poor outcomes [35]. Recent studies suggest that cancer stem cells are enriched in TNBC and play an important role in tumorigenesis and tumor biology of this subtype [36,37,38]. Cancer stem cells have the unique ability for both self-renewal as well as the ability to re-establish a heterogeneous population of tumor cells (potential to differentiate). Similar to embryonic stem cells, which have been shown to have a DNAm age close to zero [6], we expect that the DNAm age for cancer stem cells to be very young. This, in turn, may “cancel out” the age acceleration in the developed tumor cells in TNBC samples and explain the lack of age acceleration observed. Previous studies using the Horvath clock model found consistent relative epigenetic age acceleration differences for HR+/HER2- and TNBC tumors [6, 39], supporting the validity of our results in this study.

Further inspection of DNAm age with tumor morphology (grade and stage) provided additional insights on the relation of epigenetic age acceleration and the ability of tumor cells to proliferate, differentiate, and potentially metastasize. We observed a non-linear epigenetic age acceleration relationship with Scarff-Bloom-Richardson grade, where epigenetic age acceleration is near zero for grades 3–5, then peaks near grade 6, and then gradually decreases for higher grades. Compared to low-grade tumors, high-grade tumors are undifferentiated, or poorly differentiated, and are more likely to grow and proliferate. Thus, it is conceivable that epigenetic age acceleration in high-grade tumors tends towards younger or smaller values compared to low-grade tumors after certain stage. These speculations are further supported by our epigenetic age acceleration results in relation to tumor stage, where we observed a large, positive age acceleration in early-stage tumors but a negative age acceleration in late-stage/advanced tumors. Similar to high-grade tumors, advanced and late stage tumors are more likely to be poorly differentiated and have a greater potential to metastasize to distal locations. A previous study using the Horvath clock model suggested a similar negative association between stage and age acceleration in thyroid cancer [6]. These observations are in line with the theory that DNAm age is in some way related to the biological processes underlying development, cell differentiation, and the maintenance of cellular identity. Therefore, epigenetic age acceleration may capture both intracellular changes in losing cellular identity and changes in cell composition [40]. While providing interesting insights, these findings were obtained using a relatively small number of case samples and need to be validated in larger studies.

Our study has several strengths. First, we measured genome-wide DNA methylation using next-generation sequencing technology, which provides a much higher coverage and resolution of the human genome compared to previous data from array technology. This allows for more CpGs to be considered when building our model, which ultimately leads towards choosing the optimal set of CpGs that accurately captures age-related DNAm changes. Second, our data is from targeted breast tissue and therefore allows us to capture age-related DNAm changes specific to female breast tissue. The tissue specificity of our model overcomes limitations (e.g., induced noise) set by the poor calibration of breast tissue in the Horvath clock model and allows for more powerful studies. Third, we are able to apply the developed model to different breast tissue types including breast normal, adjacent-normal, and tumor tissue and demonstrate that DNAm age changes across tissue type. Lastly, with our model and data, we are able to examine the relationship between DNAm age and several tumor clinical features, providing insights on epigenetic aging during tumor progression. We also acknowledge the limitations of our study, including small sample size in the tumor subgroup analyses and the limited adjacent-normal breast tissue samples. Larger and independent samples are needed to validate the findings of this study. In addition, since we include only breast tissue in this study, we are unable to evaluate the performance of our breast tissue-specific model in other tissues such as blood. Furthermore, our epigenetic analysis was performed on whole breast issue and did not account for specific cell types in the normal breast. Future studies are needed to assess whether and how DNAm age differs among different cell types of the normal breast.