M. tuberculosis infection causes genome-wide DNA methylation changes in THP1 macrophages

In order to study the DNA methylation dynamics during host-pathogen interaction, genomic DNA was isolated from M. tuberculosis H37Rv infected (0 and 48 hrs) and corresponding uninfected PMA treated THP1 cells (THP1 macrophages). DNA for each time point (in duplicates) was pooled and enriched for methylated regions by MBD protein based affinity pull down. The enriched DNA fragments were sequenced on the Illumina Hi-seq NGS platform (Materials and Methods). THP1 cell line, a monocytic cell line, differentiates into macrophages upon treatment with the mitogen Phorbol myristate acetate ester (PMA)14. In order to eliminate the DNA methylation changes that would have resulted due to differentiation, peaks that were different between uninfected 0 and 48 hrs samples were first removed. The remaining peaks were compared with peaks from infected 48 hrs samples (see flow charts in Supplementary Fig. 1). A total of 23,433 differentially methylated regions (DMRs) were identified, of which 19,506 (~83%) DMRs were hypermethylated and 3,927 (~17%) were hypomethylated in the M. tuberculosis H37Rv infected THP1 macrophages (Fig. 1). This indicated that regions of hypermethylation were five-fold more enriched than hypomethylation in the genome of infected cells.

Figure 1: Genomic localization of the differentially methylated regions (DMRs) in M. tuberculosis infected THP1 macrophages. Pie charts (A) showing the distribution of DMRs (B) with respect to genes within the human genome. Hyper (left panel) and Hypo (right panel) DMRs were categorised as intergenic, promoter-specific and gene body specific transcription end site specific based on their location in the human genome. Hypermethylated DMRs represent genomic regions with gain of methylation while hypomethylated DMRs represent loss of methylation upon infection. Full size image

Genomic localization of the identified Differentially Methylated Regions

In order to assess the functional significance of the methylation differences arising due to M. tuberculosis infection, the distribution of these DMRs with reference to genic regions in the genome was examined. Approximately 50% of the hypermethylated DMRs (9830) were found to be associated with gene body (exons and introns), 48% (9308) were intergenic and 1% mapped to the promoter (203) and 1% to the transcription end site (TES, 165) (Fig. 1). In case of hypomethylated DMRs, ~46% (1792) and ~53% (2074) DMRs mapped to gene body and intergenic regions respectively while approximately 1% of DMRs were associated with promoter (34) and TES (27) (Fig. 1).

As 12946 intragenic DMRs (gene body, promoter and TES) were found to be associated with 7573 genes it indicated that multiple DMRs were associated with a single gene. Out of these 7573 genes, 4996 (3684 with hyper and 1312 with hypo-DMRs) were associated with a single DMR, 2555 (2278 with hyper and 277 with hypo-DMRs) with 2–10 DMRs and 22 (only hyper DMRs) with more than 10 DMRs (Table 1). Furthermore, out of the 7573 genes that were associated with a DMR, 2208 genes (listed in Supplementary Table 1) were found to be associated with both hypermethylated and hypomethylated DMRs. The epigenetic circuitry utilises DNA methylation to organize specific genetic loci into specific chromatin conformations15. The presence of both hypermethylated and hypomethylated DMRs within the same genetic loci could suggest multiple chromatin conformations within the same gene.

Table 1 Multiple DMRs associated with single genes. Full size table

Analysis of DMRs mapping to the gene body revealed that DMRs were predominantly associated with introns as 93% of these DMRs mapped to the introns while only 4–5% mapped to the exons. The remaining DMRs were associated with 5′ or 3′ UTRs (Table 2). Amongst the DMRs mapping to the introns, 40% of DMRs mapped to the first two introns while ~12% mapped to the last two intron of a gene, irrespective of the gene size (Table 2). This observation was similar to what has been observed for B-cell methylome where MRIs (Methylated Region of Interest) were found frequently at 5′ and 3′ ends of a gene16.

Table 2 Majority of the gene body DMRs are associated with introns. Full size table

Examination of the intergenic DMRs showed that approximately 60% of the intergenic DMRs were present within 70 Kb of a TSS (52% for hypo and 62% for hyper-DMRs, Fig. 2A). Apart from regulatory elements, non-genic regions are generally associated with repetitive DNA sequences and ncRNA genes17. To examine if some of the DMRs were associated with any specific repetitive element, the percentage of a specific repetitive DNA element associated with the DMRs was compared with the percentage of this repetitive element within the whole genome (Table 3, see Materials and Methods). While SINE elements constitute only around 13% of the total length of the repetitive DNA sequence in the human genome, this percentage was significantly higher at approximately 33% in relation to the DMRs (Table 3). No major difference was observed for any other repetitive DNA elements.

Figure 2: Characterization of DMRs. (A) Histogram showing distance of hyper (white bars) and hypo (black bars) DMRs from the transcription start site (TSS) of the nearest gene. Correlation of hyper (left panel) and hypo (right panel) DMRs with size (B), number of genes (C) and gene density (D) for each chromosome was plotted. Full size image

Table 3 Repetitive DNA element associated DMRs. Full size table

ncRNAs including lncRNA, miRNA, piRNA and snoRNA are known to be associated with gene regulation18. Recent reports have also shown differential expression of host miRNA in response to M. tuberculosis infection19,20,21.In our study, approximately 8% (1850) of the DMRs were associated with non-coding RNA genes. Majority (~90%) of these DMRs were found to be intergenic. As many of the ncRNAs are present within or in the vicinity of protein coding genes22, it was no surprise that more than half of these DMRs (954) mapped within other genes. 61% of the ncRNA DMRs were associated with miRNA genes (1124), ~24% with lncRNA (452), ~8% with piRNA (156) and 6.4% with snoRNA (118) genes (Table 4).

Table 4 ncRNA associated DMRs. Full size table

Chromosomal distribution of DMRs

In the human genome, distribution of genes on chromosomes has been found to be non-random and often in clusters. Studies have identified clusters of tissue-specific genes across different chromosomes23,24,25 that are co-expressed or co-regulated. To assess if these DMRs were preferentially associated with certain chromosomes and mirrored non-random chromosomal distribution of genes, distribution of the various DMRs on different chromosomes was examined (Supplementary Table 2).

While a positive correlation for both hypermethylated and hypomethylated DMRs was observed when compared to chromosome size as well as the number of genes per chromosome (Supplementary Table 2, Fig. 2B,C), no correlation was observed when gene density per megabase of chromosome was plotted against the number of DMRs, (Supplementary Table 2, Fig. 2D). The density of DMRs across the various chromosomes varied with chromosomes 17, 19 and 22 having high gene densities were found to have least number of DMRs. On the other hand, chromosome 1, 3 and 6 having low gene density had the highest number of DMRs (Fig. 3).

Figure 3: Identification of DMR hotspots. DMR density per Mb for each chromosome was calculated and plotted as histograms using Circos. The innermost circle displays the hypomethylated DMRs and the outermost circle displays the hypermethylated DMRs. Windows with more than 10 hypermethylated DMRs are shaded red and windows with more than 5 hypomethylated DMR are shaded green. DMR hotpsots within a chromosome are bound by a rectangle. The middle circle represents the ideogram. Bands drawn within the ideogram represent cytogenetic bands as are observed by Giemsa staining (data taken from UCSC Genome Browser). Full size image

To further investigate the observed enrichment (within chromosomes 1, 3 and 6) and depletion (within chromosomes 17, 19 and 22) of DMRs, we calculated DMR density using a 5 Mb sliding window with a 500 kb slide. A hypermethylation hotspot was defined as any 5 Mb window with average DMR density higher than 38.47 (average DMR density for chromosome 6), the highest average DMR density observed amongst the 24 chromosomes (supplementary Table 3). Based on these criterion, 23 hotspots present on 11 different chromosome were identified (Fig. 3, Table 5). Chromosome 1 had the largest number of hotspots (5 nos.). A region on the long arm of chromosome 6 (6q21-6q27) containing several immunologically important genes (including CCR6, TIAM2, ULBP2, LRP11, GPR126, FYN, NOX3) was found to have the highest DMR density (Table 5, Supplementary Table 4).

Table 5 Hypermethylation hotspots within the human genome linked to M. tuberculosis infection. Full size table

Functional classification of genes associated with DMRs

To examine if M. tuberculosis infection induced DNA methylation changes were associated with any specific genes or gene families, the genes associated with DMRs were classified using Gorilla Gene Ontology tool26 and the output was visualized using Revigo27. The gene families that showed significant enrichment (p < 0.05) included genes involved in signaling, cell communication, metabolism, transport, cell cycle, cytoskeleton reorganization, transcriptional regulation and chromatin modification (Fig. 4, Supplementary Table 5). DMR-associated immune response genes included genes corresponding to the HLA complex, cytokines, complement system.

Figure 4: Gene Ontology (GO) enrichment analysis of genes. GO analysis of genes associated with gene body DMRs present within promoter, gene body and TES visualized as a REViGO Interactive Graph. Full size image

The functional significance of genes found to be associated with regions of differential methylation in M. tuberculosis infected cells was also assessed by examining the interaction networks of the identified proteins. Using the web based PANTHER Gene Ontology tool28, we extracted the molecular function and biological process associated with each gene, followed by manual curation of the genes involved in immune response, chromatin modification, DNA replication and repair. The interaction network for the corresponding proteins was determined by the STRING search web tool29 and the interaction network was generated using clustering coefficient (Fig. 5). Most of the players involved in epigenetic reprogramming, were clustered in one sub-network. Interestingly, except for DNMT1 (a maintenance methyltransferase), all other DNA methyltransferase genes, DNMT3A, DNMT3B, DNMT3L, were found to have M. tuberculosis infection related DMR(s). SIN3A, a protein known to be associated with HDAC mediated repression of MHC class II proteins and found to be associated with HLA-DRα promoter in M. avium infected THP1 cells30, formed an important node connecting immune response with the epigenetic machinery (Fig. 5). Other epigenetic modulators like DNA demethylases TET2 and TET3, histone variant genes, INO80 complex, PRC2 complex genes, EED and SUZ12 were also associated with DNA methylation changes upon infection (Fig. 5).

Figure 5: Protein network analysis. Network analysis was performed on the set of genes found to be in the vicinity of DMRs identified in the MBD-seq. The interaction between proteins was generated using STRING (high confidence) and output was visualized using Cytoscape (Bottom left panel). Prominent nodes with their first neighbors are highlighted as a circular layout. The prominent node identified were SIN3A, ATM, PRKCA, IFNG, VAV3. Zoomed images represent circular layouts depicting genes involved in signaling (orange and purple), immunity (red), DNA repair (yellow) and chromatin organization (blue). Full size image

We next examined the function of genes present within the top five hotspots to assess whether any particular gene(s) or gene family associated more prominently with the infection-induced differential DNA methylation (Table 5). It was interesting to note that several immunologically important genes were present within the 5 hotpots, (especially in hotspot 1, 3 & 4, Fig. 6, supplementary Table 4). Hotspot 1 present on chromosome 6 included immunologically important genes like CCR6, TIAM2, ULBP2, LRP11, GPR126, FYN and NOX3 (Fig. 6). Hotspot 3 on chromosome 1 included ADAR, FCRL2, IL20, CD46, CD55, TNFSF4, IGFN1, IL19, TRAF5, etc, whereas hotspot 4 on chromosome 6 contained the HLA genes that are involved in antigen presentation apart from several other immunologically important genes (Fig. 6). The association of DMRs with histone genes (H2A gene cluster) was also noticeable in hotspot 3 on chromosome 1 (Fig. 6).

Figure 6: Circos plots for chromosome 1 and 6 highlighting DMR-associated features. Each DMR is plotted as a tile. The track immediately inside the ideogram represents hypermethylated DMRs, and the inner most track depicts hypomethylated DMRs. Hypermethylated and hypomethylated DMRs associated with the conserved sequence motif are colored red and green respectively. Regions within the chromosome identified as DMR hotspots are shaded red in the ideogram. Immune system related genes are labelled in red, ncRNA in blue and chromatin related genes in green. Hotspot cluster with histone genes is highlighted in green whereas the hotspots associated with HLA cluster on chromosome 6 and genes involved in immune response on chromosome 1 are highlighted in red. Histone and HLA genes that were not part of a hotspot are highlighted in grey. The heat map circle (second from inside) shows the gene density across the respective chromosome with regions of very high gene density as red, high as yellow, medium as green and low as grey. Full size image

DMRs are associated with a conserved motif

To examine if a common motif was associated with the identified DMRs, the sequences corresponding to hypermethylated DMRs were tested for the presence of any conserved motif using MEME tool31. Since several immune response genes were present in the top 5 hotspots the analysis was initially performed on all the immune response genes associated DMRs. A 28 base pair motif with a conserved ‘GCCTCC’ core was identified in these DMRs using MEME (Fig. 7A). Based on this observation, the complete set of 23,433 DMRs were scanned for the presence of this motif. This motif was found to occur 8646 times in 6646 out of the 19,506 hypermethylated and 1455 times in 1178 out of the 3,927 hypomethylated DMRs (Supplementary Tables 6 and 7) with some DMRs showing the presence of this motif at multiple positions. In 515 genes this motif was present in both hypermethylated and hypomethylated DMRs (Supplementary Table 8). The enrichment of this motif in the human genome was also calculated. Taking the number of bases covered into consideration, this particular motif was more than 100 fold enriched in the DMR data set as compared to the whole genome. Scanning of individual chromosomes revealed that chromosome 19 had the maximum density of this motif. However, no significant enrichment of this motif was observed with any DMR across the different chromosomes (Supplementary Table 9). To further examine the correlation of this motif with DMRs, the distance of this motif from the peak maxima was calculated (Supplementary Table 10). In majority of the DMRs, the motif was found to occur within 150 bp from the peak maxima (Fig. 7B) and in 16% (1037) of hypermethylated and 20% of hypomethylated DMRs, the peak maxima and the motif overlapped (distance of 14 bp from the center of the 28 bp motif to peak maxima).

Figure 7: Identification of a common motif within DMRs. (A) The 28 bp motif identified in the DMRs and represented as a logo. The conserved ‘GCCTCC’ core has been underlined. The motif was generated using MEME tool. (B) Histogram showing the distance of the motifs, present within individual DMRs, from the peak maxima. Note that the maximum distance from the peak maxima for most of the DMRs was equivalent to the length of DNA wrapped around one nucleosome. Hyper DMRs are denoted by white bars and hypo DMRs with black bars. Full size image

M. tuberculosis infection induces methylation of non-CpG cytosines in host DNA

To validate the MBD-seq data and confirm DNA methylation changes observed in THP1 macrophages upon infection with Mycobacterium tuberculosis, we performed bisulfite sequencing on few of the DMRs. These DMRs were chosen either randomly or based on the role of the DMR associated gene in host defense mechanism against M. tuberculosis infection. Bisulfite analysis of these regions revealed negligible difference in methylation of cytosines present in CpG dinucleotide context. In fact, majority of the CpGs were found to be methylated in both uninfected and infected THP1 macrophages for the regions analysed (Fig. 8A, Supplementary Fig. 2). Surprisingly, a significant difference was obtained in the methylation of cytosines in non-CpG context for all the regions that we tested (Fig. 8A, B). This was true for both gain and loss of DNA methylation and the combined non-CpG methylation difference between uninfected and infected THP1 macrophages was also statistically significant (Fig. 8C). Examination of the bisulfite data also indicated that while change in the levels of cytosine methylation was observed in all the non-CpG dinucleotides, CpT methylation change was the most prominent (Fig. 8B,C).

Figure 8: M. tuberculosis infection results in non-CpG methylation of the host genome. (A) DNA methylation analysis by bisulfite sequencing. Cytosine methylation profile for some of the hypermethylated (HR) and hypomethylated (HO) DMRs was examined by bisulfite sequencing of genomic DNA from uninfected (U) and M. tuberculosis infected (I) THP1 macrophages. Red circles represent CpG and squares represent non CpG dinucleotides. Filled symbols (grey or black) represent methylated cytosine. 10 or more clones per sample were analysed. (B) Statistical analysis of methylated non-CpG dinucleotides in hypermethylated (HR) and hypomethylated (HO) DMRs. Ratio of mC to total C per sample was plotted on Y-axis. HO2 - chr.3:28247576-28247639; HO4 - chr7:97993741-97993898; HR1 - chr6:167283490-167283654; HR4 - chr.3:162931192-162931429; HR10 - chr.5:39213638-39213737; HR11 - chr.1:164394535-164394664. (C) Percentage methylation of non-CpG dinucleotides. Proportion of methylated CpA, CpT and CpC dinucleotides was calculated with respect to total cytosines. The regions analysed are listed on the X-axis. Coordinates for each region are provided in Materials and Methods. Full size image

To confirm the genuineness of the observed non-CpG differential methylation by bisulfite analysis, methylation analysis was performed by MeDIP analysis on some of the DMRs examined by bisulfite sequencing. For this analysis, DNA was isolated from three different sets of THP1 macrophages infected with M. tuberculosis H37Rv. MeDIP results confirmed that the non-CpG methylation difference observed by bisulfite sequencing reflected the methylation difference that were observed initially by MBD-seq (Fig. 9A).

Figure 9: Infection associated DNA methylation changes are correlated with gene expression changes. (D) MeDIP validation of bisulfite sequencing based cytosine methylation enrichment of selected DMRs. The enrichment is represented as percentage input for each region. (E) Expression analysis by quantitative RT-PCR for the genes listed below the X-Axis. The level of expression in uninfected THP1 macrophages is shown by a horizontal line (UI level). The experiment was done at least thrice in duplicates. The error bars represent standard deviation (SD). * Indicate significant difference (Student’s t-test, *P < 0.05). Full size image

Altered DNA methylation associated with gene expression changes

In order to study the effect of infection-induced DNA methylation changes on the expression levels of DMR-associated genes upon infection, expression of 28 genes associated with one or more DMRs was examined by Real Time RT-PCR. RNA for this analysis was isolated from three different sets of M. tuberculosis H37Rv infected or uninfected THP1 macrophages. 19 out of the 28 genes tested showed change in gene expression. 7 of them (ADORA3, CCR6, DNMT3B, HDAC9, NEDD4L, PBX1, SOX5) showed a decrease in expression, 12 genes showed increase in expression (BAIAP2L1, BCL2L1, CXCR4, FYN, NOTCH2, PARP1, PRKCD, PRMT3, RNASET2, RPS6KA2, ZCWPW2, ZNF148) while for 9 genes expression levels did not change (ANKS6, CMC1, COL15A1, DNMT3A, FOXP2, GABBR2, HRH2, PIK3R1, VAV3) 48 hours post-infection (Fig. 9B).