Circadian oscillations of white blood cell fractions can simulate epigenetic oscillations

Blood samples (i.e., white blood cells (WBC)) are commonly used in molecular studies of human subjects due to their ease of access and relatively non-invasive collection procedure. Blood-based epigenomic analyses, however, can be confounded by WBC count differences across individuals and may generate false epigenetic effects [15]. Previous studies have shown that WBC counts oscillate in a circadian manner and the composition of cell types can change within an individual throughout the day [16]. We investigated WBC fractions collected every 3 h for at least 48 h from four male subjects and found that the total WBC count, as well as the number of different cell types in WBC, do indeed oscillate in a circadian manner (Fig. 2a; Additional file 1). Moreover, while absolute cell counts of neutrophils and lymphocytes oscillated in phase with the total blood count (Fig. 2a), their relative proportions were not uniform. For instance, lymphocytes were relatively enriched at around circadian time (CT) 6, while neutrophils were enriched at CT18 (Fig. 2b). This shows that failure to account for circadian cell count effects may simulate false epigenomic oscillations. Computational approaches [17] can be used to account for changes in cellular proportions but osc-modCs that correlate with cell counts may also be eliminated and result in a false negative outcome.

Fig. 2 Circadian oscillations of absolute and relative cell counts in human white blood cell types. a, b Dynamics of a absolute count of four WBC types, and b proportion of each cell type relative to the total WBC count in four male subjects. Values are mean-centered by cell type, and solid line type indicates oscillation significance (p < 0.05). For each of the four subjects, data points with the same time of day were averaged. The lines represent harmonic regression fits, and whiskers represent the 95% confidence intervals of the between-subject mean. Data points are shifted slightly along the x-axis from their integer values to avoid whisker collisions. CT, circadian time Full size image

In order to avoid these confounders, we performed a circadian epigenomic analysis on a pure population of WBC. For our experiment, we opted to investigate neutrophils, which are the largest WBC fraction and exhibit substantial cytosine modification variability in the general population [18]. Some degree of neutrophil heterogeneity may exist [19], but it is currently unknown if the epigenomes of neutrophil subtypes differ significantly. Given that neutrophil subtypes have not been clearly defined, we used bulk neutrophils for our experiment.

Mapping of modified cytosines reveals circadian patterns in human neutrophils

Blood samples were collected every 3 h over a 72-h period from a healthy 52-year-old male, who did not use any sleep-inducing medication, nor reported insomnia, hypersomnia, or other sleep disorders. We used magnetic bead-based antibody selection for separation of neutrophils and reached 98–99.5% (mean ± SD = 99.0 ± 0.56%) purity based on Houseman’s algorithm estimates of cell composition [17]. Cytosine modification profiles were interrogated using the Illumina HumanMethylation450 BeadChip. To reduce batch effects, every DNA sample was interrogated in technical duplicates. We selected cytosines whose biological modification signal significantly exceeded technical noise, which we refer to as epigenetically variable cytosines (EVCs; Additional file 2; see the “Methods” section for more detail).

Out of 485,512 interrogated cytosines, 466 (0.1%) were identified as EVCs after correction for multiple testing (false discovery rate (FDR) q < 0.05). A significant 24-h oscillation pattern (p < 0.05), estimated using the cosinor model [20], was detected in 73.18% of the FDR-significant EVCs (p = 8 × 10−4 after 10,000 permutations) (Fig. 3a). We found no evidence for the presence of other oscillation periods (Fig. 3b), suggesting that the 24-h period oscillations represent a predominant source of cytosine modification dynamics in neutrophils.

Fig. 3 Oscillation profiles within epigenetically variable cytosines. a Heatmap of all FDR-significant EVCs with rows representing independent cytosines and columns representing samples ordered by CT. Missing CTs (CT31 and CT52) were replaced with the mean values of their two nearest neighbors. All cytosines were standardized using a z-score transformation. Hierarchical clustering with correlation distance grouped the cytosines into three clusters. Cosinor model fit on the averaged sample values is depicted below each cluster along with the cosinor p value. b Periodogram showing percentages of oscillating FDR EVCs using various oscillation periods. c Histogram of oscillation p values of nominally significant EVCs. d Percentage of oscillating nominally significant EVCs in 10,000 permutations of CT labels. The red line shows the observed percentage of oscillating cytosines in the unshuffled data. e Distribution of acrophases across significantly oscillating nominally significant EVCs. The gray shaded area indicates dark hours. f Average mesor values for osc-modCs peaking during light hours (red) and dark hours (blue). Shaded areas depict the 95% confidence interval for the mesor means. For illustration purposes, the mesor values were depicted using the average oscillation pattern within each group. CT, circadian time; modCs, modified cytosines; EVCs, epigenetically variable cytosines Full size image

In order to capture more osc-modCs across the genome, we relaxed the EVC filtering threshold to nominal significance (ANOVA p < 0.05), which increased the subset to 38,410 (7.91%) cytosines. For every EVC, oscillation parameters, such as amplitude, acrophase (time of the oscillation peak), and mesor (rhythm adjusted mean), were estimated using a cosinor model [20] with a fixed 24-h period. We found that 8.55% of the EVCs (3238 of 38,410; permutation p = 0.045) showed significant oscillation (Fig. 3c, d), with a mean amplitude of 1.72% (range 0.09–8.32%). The majority of osc-modCs had acrophases between CT4 and CT6, with a smaller cluster positioned between CT15 and CT18 (Fig. 3e). These findings are in line with the results of our mouse study where oscillation acrophases had a bimodal distribution, roughly 12 h apart [14]. The bimodality of acrophases in both mouse and human tissues indicates that, at any given time, circadian modification effects are bi-directional; that is, some cytosines become methylated, while others are demethylated.

Light (CT8–20) and dark (CT20–8) hour acrophases also showed bimodality of the average modification density. Osc-modCs peaking during dark hours predominantly had low levels of modification (average mesor 20.3% [19.1–21.6%]), while osc-modCs with light acrophases were more heavily modified (average mesor 53.3% [50.9–55.7%]) (Fig. 3f). Similar to the mouse findings [14], the two patterns of human cell oscillations demonstrated cyclical divergence and convergence of cytosine modification densities, which we dubbed as epigenetic “apogee” (i.e., distance between the two sinusoidal curves reached their maximum) and “perigee” (i.e., distance between the two sinusoidal curves reached their minimum).

We examined cytosine modification densities of seven WBC types (myeloid lineage: neutrophil, monocytes, eosinophils; lymphoid lineage: B cells, NK cells, CD4+ T cells, and CD8+ T cells) from a public dataset composed of 6 unrelated blood donors [21]. Although all WBC types had similar overall cytosine modification profiles (Fig. 4a), at the counterpart positions to neutrophil osc-modCs, lymphoid cells had vastly higher modification densities compared to myeloid cells (Fig. 4b). Pairwise comparisons of the WBC types showed that differentially modified positions were associated with neutrophil osc-modC sites (Fig. 4c). Although the strongest overlaps were detected within myeloid lineage cells, neutrophil osc-modCs also overlapped with loci that were differentially modified within lymphoid lineage cells (e.g., B cells vs. CD4+ T cells). Assuming that our observations are not driven by an unknown neutrophil subtype heterogeneity, this finding suggests that osc-modCs may not be limited to neutrophils and that epigenomic oscillations may be involved in blood cell differentiation.

Fig. 4 Association between oscillating cytosines and modification differences in cell lineages. a, b Cytosine modification densities of seven purified white blood cells within a all measured cytosines and b cytosines that were identified as oscillating in neutrophils. c Overlaps between neutrophil osc-modCs and differentially modified cytosine positions in various pairs of blood cell fractions. The size of the circles depicts log 2 odds ratio of the overlap, and the shading represents the number of FDR-significant modification differences detected between pairwise comparisons of cell types. modC, modified cytosine Full size image

Osc-modCs contribute to both intra- and inter-individual epigenetic variation

This neutrophil dataset from a single individual is not confounded by the effects of external environment and DNA variation, which allowed us to explore the contribution of osc-modCs to intra-individual epigenetic variability. As expected in the presence of true oscillations, we found that the proportion of intra-individual variance explained by osc-modCs increased with more stringent EVC selection threshold (Additional file 3). For instance, at ANOVA p < 0.05, 8.5% of EVCs were identified as osc-modCs and explained 8.5% of the variance (cosinor p = 3.1 × 10−3) reflected in the third principal component (PC). At FDR q < 0.05, however, 73.2% of the EVCs were found to be osc-modCs and explained 53.1% of the variance (cosinor p = 9.4 × 10−4) in the first PC. Traditionally, in the absence of a time dimension, EVCs would have been deemed stochastic. However, our data shows that the most dynamic parts of the epigenome over the duration of a day, within an individual, can be substantially attributed to osc-modCs.

Next, we examined the distribution of osc-modCs across various genomic elements and made three observations. First, sequences surrounding osc-modCs were enriched for canonical (CANNTG) and non-canonical (CANNNTG) E-box response element motifs (e value = 1.0 × 10−23–6.7 × 10−60) (Additional files 4 and 5), which play a key role in the regulation of circadian transcripts [22, 23]. We also identified enrichment of transcription factor motifs related to cellular differentiation and development (e.g., forkhead box (FOX), Fos-related, Jun-related, and Krüppel-related factors) [24,25,26], as well as immunity (e.g., interferon-regulatory factors) [27]. Secondly, oscillating cytosines were highly overrepresented in neutrophil-specific enhancer regions [28] (OR = 11.7 [8.0–16.5]; p = 9.80 × 10−25) (Fig. 5a). Lastly, osc-modCs were underrepresented at CpG islands and shores (0–2-kb region outside CpG island) but overrepresented in the shelves (2–4-kb region outside CpG island) and seas (regions farther than 4 kb from CpG island) (Fig. 5a). Contribution of osc-modCs to epigenetic variance was different across these regions. For instance, the average amplitude of oscillations was lower in CpG islands compared to the seas (average amplitude difference = 0.7%, t test p = 1.19 × 10−44) (Fig. 5b). Relatedly, oscillating cytosines were depleted within and near transcription starting sites but were enriched within gene bodies (Fig. 5a). Therefore, our data consistently indicates that osc-modCs may be important in regulating gene transcription.

Fig. 5 Relationships between osc-modCs and genomic elements. a Odds ratios of overlap between osc-modCs and various genomic elements estimated using a two-sided Fisher’s exact test. Full circles mark the log 2 odds ratios, and extending bars represent 95% confidence intervals. b Box plots showing the distribution of osc-modC amplitudes in relation to genomic regions. modC, modified cytosine Full size image

Low cytosine modification variability in CpG islands, as well as increased variability in the more distal elements, has also been detected in cross-sectional studies of human epigenomes [29, 30]. To investigate the links between inter- and intra-individual variations, we used a publicly available list of cytosines exhibiting high degree of modification variability from a populational neutrophil dataset [18] and detected a strong overlap with osc-modCs (OR = 15.3 [12.3–19.0]; p = 2.52 × 10−75). Consistent with our osc-modC findings, the populational neutrophil sample exhibited depletion of epigenetic variability in CpG islands and proximal sites, while enrichment was observed in distal regions and enhancers [18]. Furthermore, motif enrichment analysis on sequences flanking populational neutrophil hyper-variable CpGs showed an enrichment for non-canonical E-box motifs (e value = 1.5 × 10−22) and Krüppel-related factors (e value = 1.2 × 10−58) (Additional files 6 and 7).

To further explore putative roles of osc-modCs in inter-individual variability, we re-analyzed two large whole blood datasets [31, 32], which were adjusted for white blood cell count differences, as well as known demographic, clinical, and technical covariates. The residual variation of modCs showed strong association with osc-modCs (logistic regression p = 1.80 × 10−3 and 4.36 × 10−11, for Hannum et al. and Hannon et al. datasets, respectively). Taken together, these findings suggest that, in addition to DNA sequence variation, non-shared environment, and stochasticity [33], osc-modCs may also contribute to the inter-individual variations of cytosine modifications.

Osc-modCs are associated with aging

In our previous mouse experiments, we detected that osc-modCs were associated with linear age-dependent cytosine modification changes [14]. In this study, we found that human neutrophil osc-modCs were also associated with age-correlated cytosine modification (age-modCs) in two whole blood datasets [31, 32] corrected for cell composition differences and various biological and technical covariates (OR = 1.53 [1.39–1.68]; p = 4.5 × 10−17 and OR = 1.39 [1.19–1.63]; p = 6.6 × 10−5, respectively). Like in the mouse liver and lung tissues, circadian amplitudes of human neutrophils correlated with the magnitude of epigenetic aging effects (Fig. 6a, b, Spearman’s rho = 0.11, p = 1.2 × 10−2 and rho = 0.23, p = 2.9 × 10−3). Finally, we replicated the observation that the time of osc-modC acrophase can predict the trend of epigenetic aging (but in opposite direction from mice that are nocturnal); cytosines with light hour acrophases were prone to accumulation of modified cytosines with age (OR = 1.88 [1.19–2.97]; p = 4.3 × 10−3 and OR = 2.47 [1.09–5.59]; p = 1.8 × 10−2)

Fig. 6 Association between osc-modCs and aging. a, b Scatterplot showing the relationship between aging magnitude and oscillation amplitude in two populational studies: a Hannum et al. (GSE40279) and b Hannon et al. (GSE80417). Black lines indicate fitted least squares regression lines with shaded gray area depicting 95% confidence intervals. Results produced using the subset of cytosines that exhibited both oscillating and aging effects. One outlier cytosine (Illumina probe ID “cg22454769”) was excluded from both figures for data visualization purposes. osc-modC, oscillating modified cytosine Full size image

Osc-modCs are associated with complex diseases

Osc-modC’s involvement in cellular differentiation, epigenetic variation, and age-dependent epigenetic changes prompted us to investigate the roles of osc-modCs in complex diseases. We selected three different groups of diseases that represent major human pathological processes: malignancy (leukemia), neurodevelopmental dysfunction (schizophrenia), and metabolic dysregulation (obesity and type II diabetes).

We first investigated cytosine modification findings in chronic lymphocytic leukemia (CLL) [34], which is classified into unmutated (uCLL) and mutated (mCLL) based on the mutation status of the immunoglobulin heavy chain variable gene segment. We found that neutrophil osc-modCs were significantly overrepresented among differentially modified cytosines in B cells from both uCLL (OR = 1.96; p = 2.4 × 10−31) and mCLL (OR = 2.73; p = 1.6 × 10−15). Next, we analyzed three large blood-based EWAS, two of schizophrenia [32, 35] and one of body mass index (BMI) [36], and again detected significant overlaps between osc-modCs and EWAS hits from these three studies (Fig. 7a).

Fig. 7 Association between disease and oscillating cytosine modifications. a Odds ratios of overlap between osc-modCs and differentially modified loci in various disease datasets estimated using Fisher’s exact test. Full circles mark log 2 odds ratios, and whiskers represent 95% confidence intervals. b Scatterplot showing the association between the osc-modC amplitude and log-transformed odds ratio for type II diabetes risk. The black line indicates fitted least squares regression line with shaded gray area depicting 95% confidence interval. c Box plots of cytosine modification differences in simulated and a representative EWAS hit, cg10311104, from schizophrenia EWAS ([32]; Supplementary Fig. S5). Black dots represent outlier samples beyond the interquartile range. d Ten representative samples from osc-modC simulation in the “control” and the “patient” groups. Gray boxes represent the regular “office hours” (9 AM–5 PM) when samples are usually collected in a realistic clinical setting. Black curves represent the oscillation profiles for each sample, with red dots indicating a randomly selected sample collection time. CLL, chronic lymphocytic leukemia; C.I., confidence interval; modC, modified cytosine; CT, circadian time; CTRL, control; SCZ, schizophrenia; OBS, observation Full size image

In order to uncover the direction of association between osc-modCs and disease, we utilized results from a prospective type II diabetes EWAS [37]. In this study, only seven cytosines were identified to predict type II diabetes at EWAS significance. Hence, we investigated cytosines with nominal p < 0.05 and found that osc-modCs were overrepresented in this group (OR = 1.24 [1.08–1.43]; p = 2.2 × 10−3). Interestingly, the magnitude of cytosine modification changes contributing to type II diabetes risk correlated with the osc-modC amplitudes (Spearman’s rho = 0.24, p = 2.4 × 10−4) (Fig. 7b), suggesting that osc-modCs can identify cytosine positions and magnitude of changes that are involved in type II diabetes. These findings are consistent with the causal interpretation; however, caution is necessary since type II diabetes is comorbid with obesity, and this association may reflect obesity-induced osc-modCs, rather than osc-modCs predisposing to diabetes.

It is important to note that all of the above comparisons were performed using data from mismatched cells (e.g., neutrophil osc-modC vs. B cells in leukemia) or non-primary targets of the disease (e.g., blood instead of neurons for schizophrenia). It is possible that associations between oscillating epigenetic factors and disease epimutations would be stronger if matching cell types were analyzed together.

Disease EWAS hits may be cross-sectional “snapshots” of aberrant osc-modCs

Disease EWAS findings typically exhibit two properties. First, cytosine modification differences between affected individuals and controls, despite statistical significance, are very small. For example, the two large schizophrenia EWAS found mean absolute cytosine modification differences between cases and controls to be only 0.7% [35] and 1.3% [32]. Similarly, type II diabetes EWAS [37] found mean absolute differences in cytosine modification densities ranging from 0.5 to 1.1%. Second, patients quite often exhibit higher variability of cytosine modification compared to the controls [30, 38, 39]. In some studies (e.g., type I diabetes EWAS), enrichment of differentially variable cytosines in affected individuals compared to controls was the only statistically significant finding related to disease [29].

To investigate if these two groups of findings can be explained by the circadian epigenetic dysfunction in disease, we generated a simulated dataset with the following criteria: (1) Sample size was matched with previous schizophrenia EWAS [32]: 353 “patients” and 322 “controls”. (2) Amplitudes were randomly selected from a range of 3–10%. (3) We assumed that “controls” had conserved osc-modCs, while “patients” had disturbed circadian regulation. As such, oscillation period for “controls” was set to 24 h, while “patients” had 90% of samples with a 24 h period, 8% of samples with randomly selected periods ranging from 24 to 120 h, and 2% of non-oscillating samples. (4) Relatedly, acrophases for “controls” were randomly shifted within a 3-h interval (i.e., conserved), while “patient” acrophases were distributed across a wider interval of 9 h. (5) It was assumed that sample collection was performed within an 8-h time window, corresponding to regular working hours at a clinical setting (Fig. 7c, d).

The simulated dataset reproduced both of the common EWAS properties: a small but significant effect size (absolute mean difference = 0.85%; t test p = 1.92 × 10−13) and higher variance in “patients” compared to “controls” (variance ratio = 2.80; F test p = 4.8 × 10−20) (Fig. 7c). This suggests that single time recordings in the cross-sectional sampling traditionally used in EWAS may represent “snapshots” of aberrant circadian cytosine modifications and highlights the necessity of sample collection and analysis to be performed in a circadian-sensitive manner.