Differential DNA methylation is associated with breast cancer risk factors in normal breast tissues

Patient demographics and characteristics are presented in Table 1. The study participants ranged in age from 18 to 82 years with a median age of 37. A small proportion of participants were underweight (2%; BMI <18), 40% were in the normal BMI range (≥18 and <25), 30% were overweight (≥25 and <30), and 28% were obese (>30). Over half of subjects had at least one full-term birth (56%), and the remaining 44% were nulliparous. To test the hypothesis that DNA methylation differences in normal breast tissue are related to known breast cancer risk factors, we used the approach outlined in Additional file 1. Using the RefFreeEWAS deconvolution algorithm, we identified the optimal number of putative cell-types as K = 6, as this estimate minimized the deviance of the bootstraps (see “Methods” and Additional file 2A). To investigate whether the heterogeneity in cellular proportions across samples was associated with phenotypic variables (e.g., subject age), we applied a quasi-binomial model for each subject. To avoid dependence on the selection of K (putative cell-types), we examined associations over a range of evaluated K using a permutation test (1000 permutations) for inference of each phenotypic variable. Estimated cell mixture proportions were significantly associated with subject age (permutation P value = 2.0E-03), but not with subject BMI or parity (Additional file 2B).
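The permutation logic behind this kind of inference can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' pipeline: the quasi-binomial model is replaced by a Pearson correlation as the test statistic, the ages and proportions are invented toy values, and the function names are hypothetical.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation P value for the association between x and y.

    The labels in y are shuffled n_perm times; the P value is the share of
    shuffles whose |r| is at least as large as the observed |r|, with the
    usual +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (invented): proportion of one putative cell type per subject.
ages = [22, 28, 31, 37, 42, 48, 55, 61, 67, 74]
proportions = [0.10, 0.12, 0.11, 0.15, 0.17, 0.16, 0.20, 0.22, 0.21, 0.25]

p_value = permutation_p_value(ages, proportions)
print(f"permutation P value = {p_value:.1E}")
```

With 1000 permutations and this +1 convention, the smallest attainable P value is 1/1001, so the resolution of the test is bounded by the number of permutations.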

Table 1 Subject demographics and characteristics

To study the relationship between DNA methylation and breast cancer risk factors, we applied both unadjusted and cell-type-adjusted linear models for microarray data (limma) to examine the influence of subject age, BMI, and parity on the DNA methylome. Since the estimated cellular proportions for each sample sum to nearly one, we included all but the estimated cell-type with the smallest proportion to avoid multi-collinearity in our models. In a multivariable limma model adjusted for differences in cellular mixtures, 787 CpG sites were significantly associated with age, 0 CpG sites were associated with BMI, and 0 CpG sites were associated with parity, after correcting for multiple hypothesis testing (Q < 0.01, Fig. 1a). The full list of 787 CpG sites with genome annotation and statistical results is presented in Additional file 3. Notably, age-related DNA methylation alterations were predominantly hypermethylation events, i.e., increased DNA methylation was associated with increased age (545 CpG sites, 69.3%). To assess the impact that adjusting for cellular proportions had on the identification of significant associations and effect sizes, we computed the difference between the coefficients (i.e., a delta coefficient value) at each CpG for the models unadjusted and adjusted for cell type. A large CpG-specific delta value identifies associations between DNA methylation and risk factors that may be most confounded by differences in cellular proportions. Visualization of CpG-specific P values and coefficients from cell-type unadjusted and adjusted models demonstrated that adjustment attenuated both the statistical significance and the magnitude of CpG-specific associations genome-wide (Additional file 4). Moreover, the number of significant associations (Q < 0.01) in the unadjusted limma model for subject age was 4099 CpG sites compared with 787 from the adjusted model, suggesting that a large number of false positives are likely to be reported when differences in cell proportions are not considered (Additional file 4A-C). In addition, at the age-related CpG sites (n = 787, Q < 0.01) the DNA methylation patterns across purified cell populations of myoepithelial cells, luminal cells, and adipocytes were consistent, suggesting that age-related changes may occur largely independent of cell type in the normal human breast (Fig. 1b).
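The delta coefficient described above can be illustrated with a small Python sketch for a single CpG. Several assumptions apply: plain least squares stands in for limma (which additionally moderates the variances), only one cell-type proportion is used as a covariate, the delta is taken as unadjusted minus adjusted, and all data are invented so that methylation depends on cell proportion alone.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(p)]

# Toy data (invented): the cell proportion rises with age, and methylation
# at this CpG is driven entirely by the cell proportion.
ages = [20, 30, 40, 50, 60, 70]
cell_prop = [0.10, 0.22, 0.28, 0.42, 0.48, 0.62]
methylation = [0.2 + 0.5 * p for p in cell_prop]

# Unadjusted model: methylation ~ intercept + age
beta_unadj = ols([[1.0, a] for a in ages], methylation)[1]
# Adjusted model: methylation ~ intercept + age + cell proportion
beta_adj = ols([[1.0, a, p] for a, p in zip(ages, cell_prop)], methylation)[1]

delta = beta_unadj - beta_adj
print(f"unadjusted = {beta_unadj:.4f}, adjusted = {beta_adj:.4f}, delta = {delta:.4f}")
```

In this toy case the apparent age effect comes entirely from the cell proportion, so the adjusted coefficient collapses toward zero and the delta is as large as the unadjusted estimate itself.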

Fig. 1 Subject age is strongly associated with DNA methylation in normal breast tissue independent of cell type. a In the volcano plot, each point represents the association between DNA methylation and age from cell-type-adjusted multivariable linear models for microarray data (limma) at an individual cytosine-guanine dinucleotide (CpG) site. Increasing -log10(P value) on the y-axis indicates increasing statistical significance, and limma effect sizes further from zero on the x-axis indicate larger DNA methylation changes with age. Significant CpG sites are indicated in red (Q value < 0.01). The gene and gene regions are presented for the five CpG sites with the greatest significance. b Unsupervised clustering of DNA methylation values at age-related CpG sites (Komen, n = 100) visualized alongside CpGs measured in specific cell-types from the Roadmap to Epigenomics data set (n = 691 CpG sites). Each column represents a tissue sample and each row represents a CpG

Data on family history were missing for 10 individuals in the present data set. To explore whether family history was associated with DNA methylation differences, we applied the aforementioned limma approach, unadjusted and adjusted for cellular proportions (n = 90), and found no significant associations (Q > 0.01) between family history and DNA methylation after correcting for multiple comparisons (Additional file 4D).

Independent validation of age-associated methylation

We next moved to validate our age-related DNA methylation findings in two independent 450K data sets from 97 normal adjacent-to-tumor breast samples (TCGA) and 18 normal breast tissues from disease-free women (NDRI, GSE74214). Subject demographics and characteristics for these two data sets are presented in Table 2. In a reference-free cell-mixture-adjusted limma model restricted to the 787 CpG sites identified in the discovery (Komen) population, we observed that 548 CpG sites (TCGA, 69.4%) were differentially methylated in a direction consistent with the discovery population at a nominal P value <0.05 (Additional file 5A). Similarly, we observed consistent results in the NDRI population (389 out of 787 CpG sites, 49.4%) (Additional file 5B). Strikingly, there were 345 CpG sites (43.8%) in the TCGA data set and 109 CpGs (13.9%) in the smaller NDRI data set that were significant at the stringent Bonferroni threshold for multiple comparisons (P < 6.4E-05, Additional file 5A and B). In both validation cohorts, putative cell-mixture proportions were significantly associated with subject age (permutation P < 0.05) (Additional file 5C and D).
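The two replication criteria used here, a nominal P value with a consistent direction of effect and the Bonferroni threshold, can be made concrete with a short Python sketch. An alpha of 0.05 is assumed; dividing it by the 787 tested CpG sites reproduces the reported threshold. The function names and the example coefficients are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

def replicates(discovery_coef, validation_coef, validation_p, p_cutoff):
    """True when the validation effect has the same sign as the discovery
    effect and its P value falls below the chosen cutoff."""
    same_direction = (discovery_coef > 0) == (validation_coef > 0)
    return same_direction and validation_p < p_cutoff

threshold = bonferroni_threshold(0.05, 787)
print(f"Bonferroni threshold = {threshold:.1E}")  # prints 6.4E-05

# Invented example: same direction, nominally significant, but not
# significant after Bonferroni correction.
print(replicates(0.002, 0.001, 0.01, 0.05))       # True
print(replicates(0.002, 0.001, 0.01, threshold))  # False
```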

Table 2 Independent population subject characteristics

While it is appreciated that DNA methylation can modify chromatin structure and distally regulate the transcriptome, the most well-defined function of DNA methylation is the cis-regulation of gene transcription [31]. In the present study, sample-matched RNA-sequencing data were available only for a subset of the subjects from the TCGA data set (n = 88). Many of the age-related CpG sites that localize to gene regions (n = 630 CpG sites) demonstrated strong associations with gene expression (259 CpG sites at P < 0.05, Additional file 6A). The direction of the CpG-gene correlations demonstrated a dependency upon genomic context (Additional file 6B). For example, CpG sites tended to be negatively correlated in the promoter region, while there was an even distribution of positive and negative correlation in the gene body (that is, intron and exon) regions (Additional file 6B).

Age-associated DNA methylation sites are enriched for regulatory regions

To provide a broader biological interpretation of age-related DNA methylation, we next sought to identify enrichment of these genomic locations in gene regulatory regions, such as tissue-specific histone marks and transcription factor binding sites (TFBS). First, we employed the eFORGE tool to identify cell-type-specific signals in diverse tissues profiled by the Roadmap to Epigenomics Consortium. We observed robust enrichment of H3K4me1, a histone modification that marks enhancers, in both fetal tissues and mammary epithelial cells (Q < 1.9E-37), and modest associations with other histone modifications (i.e., H3K4me3, H3K27me3) (Additional file 7). Fisher’s exact test confirmed that age-related CpGs localize to enhancer elements specifically in mammary myoepithelial cells (H3K4me1, Roadmap) (OR = 2.00, CI 1.73–2.33, P = 7.1E-20). We next used the genomic coordinates of age-related CpGs as a query set against the background of the 450K array in LOLA, scanning for enrichments of TFBSs. Since hypermethylation events are likely to be biologically distinct from hypomethylation events at TFBS, we stratified our LOLA analysis into a hypermethylation and a hypomethylation enrichment analysis (Fig. 2a and b). We observed a striking number of significant enrichments both for CpG sites that were hypermethylated with age (14 TFBS, Q < 0.01) and for those hypomethylated with age (8 TFBS, Q < 0.01) (Additional file 8A and B). Among the top-ranking results presented in Fig. 2, binding sites of MYC and CTCF, which are critical regulators of chromatin architecture, were enriched among hypermethylated CpG sites, while hypomethylated CpGs localized to binding sites of the transcriptional activators c-Fos and Stat-3 [32,33,34,35].
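An enrichment test of this kind reduces to a 2 x 2 table: query CpGs versus background CpGs, inside versus outside the annotated region. The Python sketch below computes the sample odds ratio and a two-sided Fisher's exact P value from the hypergeometric distribution. The counts are invented and much smaller than in the study, and the function names are hypothetical.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one. Integer weights
    are compared to avoid floating-point ties.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k):
        # Number of ways to obtain k in the top-left cell given the margins.
        return comb(col1, k) * comb(n - col1, row1 - k)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)

# Invented counts: rows = age-related / background CpGs,
# columns = inside / outside an enhancer mark.
a, b, c, d = 8, 2, 1, 5
print(f"OR = {odds_ratio(a, b, c, d):.1f}")
print(f"P = {fisher_exact_two_sided(a, b, c, d):.4f}")
```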

Fig. 2 Age-related DNA methylation is enriched for regions of chromatin remodeling and transcriptional control. Cytosine-guanine dinucleotide (CpG) sites hypermethylated with age (a) and CpG sites hypomethylated with age (b) are highly enriched at the binding sites of transcription factors

Accelerated epigenetic aging of human breast tissue

It has been recognized that DNA methylation patterns change in a tissue-specific manner as an individual ages [29]. Previous studies have found that measurements of DNA methylation have the ability to accurately estimate an individual’s age and that observed differences between predicted DNA methylation age (that is, biological age) and chronological age are associated with disease-risk factors [29, 30, 36, 37]. Further, it has been observed that DNA methylation age predictions in the human breast demonstrate age acceleration when compared with other tissues, suggesting that normal breast tissue tends to age more quickly than other tissues [29].

To examine whether the subject-specific differences between biological and chronological age (that is, age acceleration) are associated with breast cancer risk factors, we first calculated DNA methylation age from the 100 Komen normal breast tissue samples using two distinct epigenetic clocks [29, 30]. Briefly, the “Horvath epigenetic clock” uses elastic net regression to integrate DNA methylation information from 353 CpG sites to generate a multi-tissue age predictor. The second method, “epiTOC”, is an epigenetic clock that incorporates prior biological knowledge into a mathematical model to generate an estimate of mitotic divisions using 385 CpG sites. Notably, there was limited overlap between the 787 age-related CpGs and the CpGs used by the Horvath clock (17 CpGs) and epiTOC (3 CpGs). In analyses with the Horvath clock, we observed a strong positive correlation between chronological age and the DNA methylation age of the Komen breast tissues, with a Spearman correlation coefficient of 0.95 (P = 2.83E-52, Fig. 3a). In univariate analyses of the relationship between age acceleration, defined as the residual resulting from regressing DNA methylation age (Horvath clock) on chronological age, and the cancer risk factors listed in Table 1, we observed a significant positive association only with race (African American, n = 5 subjects, P = 3.5E-02). Age acceleration was not associated with any of the other evaluated risk factors (P > 0.05). In a multivariate model considering all measured cancer risk factors, race remained significantly associated with increased epigenetic aging (African American P = 4.9E-02). In contrast to the Horvath clock, there was no significant correlation between chronological age and epiTOC-predicted age (P = 7.5E-01, Fig. 3b). Nonetheless, the epiTOC-estimated biological age was also positively associated with race in univariate analyses (African American P = 2.1E-02, Hispanic P = 2.8E-02) and in multivariate models including all risk factors shown in Table 1 (African American P = 2.7E-02, Hispanic P = 2.7E-02). The remaining breast cancer risk factors were not associated with epiTOC-defined biological aging in either univariate or multivariate models (P > 0.05).
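Age acceleration, as defined above, is the residual from a simple linear regression of DNA methylation age on chronological age. A minimal Python sketch follows; the ages are invented and the function name is hypothetical. A positive residual means a sample looks older than expected for its chronological age.

```python
from statistics import mean

def age_acceleration(chronological_age, dnam_age):
    """Residuals from regressing DNA methylation age on chronological age."""
    mx, my = mean(chronological_age), mean(dnam_age)
    sxx = sum((x - mx) ** 2 for x in chronological_age)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chronological_age, dnam_age))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_age, dnam_age)]

# Toy data (invented): chronological and predicted DNA methylation ages.
chron = [20, 30, 40, 50, 60]
dnam = [22, 31, 45, 49, 63]

acceleration = age_acceleration(chron, dnam)
print(acceleration)  # residuals: 0, -1, 3, -3, 1 (as floats)
```

Because the residuals are centered on zero by construction, age acceleration is uncorrelated with chronological age and can be tested against risk factors without age acting as a confounder of the measure itself.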

Fig. 3 Relationship between epigenetic clocks and cancer risk factors. a The Horvath epigenetic clock age in normal breast tissue is highly correlated with subject age (P = 2.83E-52). Age acceleration was significantly (P < 0.05) larger in African American women. b DNA methylation age as generated by the epigenetic timer of cancer (epiTOC) tool was not correlated with subject age in normal breast tissue (P > 0.05). Higher DNA methylation age was associated with subject race, as breast tissue from African American and Hispanic women demonstrated increased DNA methylation age (P < 0.05)

Age-related DNA methylation is further deregulated in pre-invasive and invasive breast cancer

To ascertain whether differences in DNA methylation in relation to disease risk factors are relevant for the development of cancer, we compared DNA methylation in breast tumors with adjacent normal tissue in both pre-invasive and invasive cancer at the 787 age-related CpGs. In pre-invasive lesions (ductal carcinoma in situ, DCIS), 268 of the 775 CpGs available for measurement (34.5%) demonstrated differential methylation between DCIS and normal tissue using limma models adjusted for subject age (P < 0.05, Fig. 4a). Importantly, changes at the age-related CpGs were greater (Additional file 9A and B) and demonstrated stronger associations than a randomly selected set of CpG sites with similar properties regarding their location within CpG islands (Kolmogorov-Smirnov test, P = 3.0E-03, Fig. 4b). If age-related DNA methylation is further deregulated in pre-invasive breast cancer, progressive changes would be expected in invasive breast cancer. To test this, we assessed differential methylation using limma models adjusted for subject age in the TCGA breast cancer data set. A large proportion of the age-related CpGs exhibited significant differential DNA methylation in breast cancer (642 out of 787 CpGs, 81.6%, P < 0.05) (Fig. 4c). Again, we found that the age-related CpGs demonstrated greater DNA methylation differences (Additional file 9C and D) and stronger associations than a randomly selected set of CpGs with matching genomic distribution (Kolmogorov-Smirnov test, P = 1.1E-13) (Fig. 4d).
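The comparison against a random CpG set rests on the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between two empirical cumulative distribution functions. The Python sketch below computes only that statistic (not its P value) on invented values; which per-CpG quantity the authors compared is not restated here, so the variable names are purely illustrative.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs of the
    two samples, evaluated at every observed value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a + b:
        fa = bisect_right(a, v) / len(a)  # share of a at or below v
        fb = bisect_right(b, v) / len(b)  # share of b at or below v
        d = max(d, abs(fa - fb))
    return d

# Toy data (invented): a per-CpG summary for a random set and for the
# age-related set.
random_set = [0.1, 0.2, 0.3, 0.4]
age_related_set = [0.35, 0.45, 0.55, 0.65]

print(ks_statistic(random_set, age_related_set))  # 0.75
```

A value of D near 0 indicates that the two sets are distributed alike, while a value near 1 indicates almost no overlap between them.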