To understand the variability of DNA methylation across human tissues better, we obtained post-mortem samples of 18 tissue types from 4 individuals (5 singletons, 8 duplicates and 5 triplicates; Fig. 1a, Supplementary Methods and Supplementary Table 1) and performed deep transcriptome (36 messenger-RNA-seq samples; 120–475 million reads per sample), base-resolution methylome (36 MethylC-seq4 samples; 30–80× genome coverage per sample), and genome sequencing (4 whole genome sequences; 20–45× genome coverage per sample). We focused our initial analysis on cytosines in the CG context and used a previously published method2 to identify differential methylation (Supplementary Methods). We found that 15.4% (4,073,896 out of 26,474,560 sites tested) of CG sites in these experiments are strongly differentially methylated (minimum methylation difference ≥ 0.3; Extended Data Fig. 1a), which is similar to a previous study2. To identify differentially methylated regions (DMRs), we combined sites within 500 base pairs (bp) of one another and found 1,198,132 DMRs. Even with these stringent criteria, 719,837 (60.1%) of the DMRs we identified were novel2,5.

Figure 1: The methylomes and transcriptomes of human tissues. a, The tissues analysed in this study. Samples are denoted by the two letter code in parentheses followed by an individual ID. b, Browser screenshot of an example DMR. The top track contains gene models. The following four tracks contain green blocks indicating the location of super enhancers, enhancers and hypomethylated DMRs in the aorta, respectively. The remaining tracks display methylation data from each sample. Gold ticks are CG sites with heights proportional to their methylation level. Ticks on the forward and reverse strand are projected upward and downward from the dotted line, respectively. c, d, Hierarchical clustering of DMR methylation levels (c) and expression levels of differentially expressed genes (d). Colours indicate the organ systems each sample belongs to. PowerPoint slide Full size image

As expected, hypomethylation at DMRs correlated with tissue-specific functions2,6. For example, strongly hypomethylated DMRs in the aorta overlap with aorta-specific super enhancers7 around MYH10, a gene involved in blood vessel function8 (Fig. 1b). To validate our DMRs further, we performed hierarchical clustering on their weighted methylation levels9 (Supplementary Methods, Fig. 1c and Extended Data Fig. 1b, c). Tissues that were part of the same organ system clustered together (for example, heart and muscle tissues). We compared these results to a clustering of differentially expressed genes identified in the transcriptomes and found a similar separation of organ systems (Supplementary Methods, Fig. 1d and Extended Data Fig. 1d). Furthermore, Genomic Regions Enrichment of Annotations Tool10 analysis on the most hypomethylated tissue-specific DMRs revealed many tissue-specific functions (Extended Data Fig. 1e, f, Supplementary Methods and Supplementary Tables 2,3).

To examine the relationship between methylation and transcription, we correlated the methylation levels of DMRs and the expression of the closest genes (Fig. 2a, Extended Data Fig. 2a, b and Supplementary Methods). As expected, methylation in DMRs had a negative correlation with expression, and this correlation grew stronger closer to the transcription start site. The strongest negative correlation was not in gene promoters but downstream of the promoter up to 8 kilobases (kb) away (intragenic (0.3 kb to 8 kb) versus promoter region and upstream region (−2 kb to 0.3 kb) median Spearman correlation coefficient difference −0.07; Mann–Whitney P = 4.2 × 10−17; Fig. 2a). This analysis shows that transcription is strongly associated with intragenic DMRs in the tissues we examined, extending similar observations in cancer methylomes11.

Figure 2: DNA methylation and its relationship with gene expression. a, The mean Spearman correlation coefficient at various distances between the methylation level of autosomal DMRs and the expression of the nearest gene. These correlations are shown for DMRs: overlapping genes (gene body), overlapping enhancers, overlapping promoters or CpG islands (CGIs) or CGI shores, not overlapping genes (intergenic) and all remaining DMRs (undefined). TSS, transcription start site. b, Heatmap showing the tissue-specific methylation preference of each motif. The tissues are coloured according to Fig. 1c, and the ordering is listed at the bottom of the figure. The bar plot on the right shows the number of times the motif was present in the 20 motif models. c, The number of base pairs covered by PMDs in all samples. d, The distribution of expression inside and outside of PA-2 PMDs across various samples. Notches indicate a confidence interval estimated from 1,000 bootstrap samples. Each PMD boxplot consists of 3,627 genes, and each non-PMD boxplot consists of 22,907 genes. FPKM, fragments per kilobase of transcript per million mapped reads. e, f, Histone modification profiles in and around PMDs in PA-2 (e) and IMR90 (f). PowerPoint slide Full size image

These intragenic methylation differences have previously been suggested to mark intragenic CG islands (CGIs) or CGI shores5,12,13,14. However, only a small fraction of intragenic DMRs fell in these features (19%; Extended Data Fig. 2c). In addition, predicted enhancers and putative promoters only accounted for 23% and 22% of intragenic DMRs, respectively, suggesting that the remaining DMRs, which we call undefined intragenic DMRs (uiDMRs), represent an unrecognized set of functional elements (35%; Extended Data Fig. 2c and Supplementary Methods). The methylation level of these uiDMRs correlated strongly with the expression of the genes containing them. To examine their regulatory potential, we plotted their histone modification profiles (histone 3 Lys 4 methylation (H3K4me1), H3K4me3, H3K27ac, H3K9me3, H3k27me3 and H3K36me3) derived from the same tissue samples15 and found five classes: weak enhancer, promoter-proximal, transcribed, poised enhancer and unmarked (Extended Data Figs 2d–h, 3a, b and Supplementary Methods). Classes with strong, active histone modifications were moderately negatively correlated with expression (weak enhancer and proximal promoter uiDMRs; median Spearman correlation coefficient −0.32 and −0.16, respectively); whereas, uiDMRs with less active histone modifications exhibited a weak negative correlation (transcribed and poised enhancer uiDMRs). Notably, the correlation between expression and methylation at promoter-proximal uiDMRs was as strong as the correlation with intragenic DMRs that overlapped strong promoters (Extended Data Fig. 4 and Supplementary Methods), indicating that intragenic promoter and promoter-proximal sequences are more predictive of changes in methylation than those enriched for enhancer-like chromatin modifications.

By contrast, unmarked uiDMRs showed a weakly positive correlation with expression (Extended Data Fig. 4d). Notably, we found many of the motifs enriched in tissue-specific uiDMRs were present in tissue-specific enhancers (for example, HNF4a (ref. 16) in liver-specific uiDMRs), suggesting that these DMRs are tissue-specific regulatory elements (Supplementary Methods and Supplementary Tables 4 and 5). Recently, hypomethylated regions that appear inactive in adult tissues but active during fetal development were identified in mice6. We examined the DNase I hypersensitivity profiles of unmarked uiDMRs in matched fetal tissues17 and found an enrichment of hypersensitivity (Extended Data Fig. 5 and Supplementary Table 6), suggesting that hypomethylation of inactive DMRs can be maintained at regions active earlier in development.

We next examined whether variation in methylation is associated with genetic variation across individuals, which has not been widely characterized in healthy primary tissues or using whole-genome bisulphite sequencing18,19.To identify individual-specific DMRs, we used a method20 that is sensitive to these differences unlike the methodology used above (Supplementary Methods). We first restricted our analysis to triplicated samples and ranked DMRs by a tissue-specific methylation outlier score that is largest when the methylation level in one individual differs from the other two. We found an ∼1.6-fold enrichment of single nucleotide polymorphisms (SNPs) associating with methylation changes in the top 2,500 methylation-outlier-score-ranked DMRs in all tissues (Supplementary Methods). We then used the Epigram pipeline21 to predict tissue-specific methylation from DNA motifs in these DMRs and found them highly predictive (average area under the curve (AUC) 0.79; Supplementary Methods). These full models used an average of 156 motifs; however, an average AUC of 0.74 was achieved using only 20 core transcription factor motifs per tissue.

We then identified groups of corresponding motifs by clustering the sets of tissue-specific motifs (Supplementary Methods). The motif groups were clustered by their tissue hypo- and hypermethylation specificities (Fig. 2b). In total, 42 out of 95 motifs only had hypomethylation specificity; for example, MEIS, which is involved in heart development22, is hypomethylated in the left ventricle, right atrium and right ventricle. We also identified 34 motifs enriched at both hypo-methylated DMRs in some tissues, and in hyper-methylated DMRs in some other tissues. Three of these motifs match transcription factor families (FOX, HOX and GATA) and are most significantly enriched in hypomethylated regions, suggesting that they are primarily involved in regulating hypomethylation.

Mammalian cells have high genome-wide levels of mCG, with the exception of a cultured human fetal fibroblast cell line (IMR90)4, cancer cells23,24 and placenta (PLA)25. Surprisingly, large regions of the pancreatic methylomes (PA-2 and PA-3) were significantly hypomethylated (Extended Data Fig. 6a). We developed a method to identify partially methylated domains (PMDs) genome-wide (Supplementary Tables 7,8 and Supplementary Methods) and found pancreatic PMDs were smaller than those in IMR90 and PLA (Extended Data Fig. 6b) and covered a smaller fraction of the genome (Fig. 2c). All pairs of PMDs overlapped significantly, indicating that these regions are largely shared (>40% overlap; P < 0.001; Extended Data Fig. 6c).

Genes in samples with PMDs are transcriptionally repressed25,26, but these regions also show reduced expression in all of the tissues we surveyed whether or not a PMD is present (Fig. 2d). In both IMR90 and PA-2, these regions showed an enrichment in repressive modifications (H3K27me3 and H3K9me3; median difference 0.025–0.168 reads per kilobase per million (RPKM); Mann–Whitney P < 2.51 × 10−161) and a depletion in active modifications (H3K4me1, H3K27ac and H3K36me3; median difference 0.050–0.012 RPKM; Mann–Whitney P < 2.03 × 10−53) compared to shuffled regions (Fig. 2e, f, Extended Data Fig. 6 d, e and Supplementary Methods), which provides a potential mechanism for their repression. To try to account for this global hypomethylation, we plotted the expression levels of DNMT1, DNMT3A, DNMT3B and DNMT3L but found no systematic expression difference between samples with and without PMDs (Extended Data Fig. 7a–d).

Previous studies have highlighted the existence of methylation outside of the CG context (mCH) in human embryonic stem cells4, brain1,20 and at the promoter of the PGC-1α gene (PPARGC1A) in skeletal muscle27. We found evidence for appreciable amounts of mCH in many of these tissues (Fig. 3a and Extended Data Fig. 8a). A 5-bp motif split the samples into two groups, one with mCH enriched in a TNCAC motif and another with mCH enriched in an NNCAN motif (where N is any base) (Supplementary Methods). The TNCAC motif is highly similar to the one previously identified in purified glia (GLA) and neurons (NRN) (TACAC). These motifs differ from those found in H1 embryonic stem cells (H1) and induced pluripotent stem cells (TACAG)4,26 (Fig. 3b–d). We quantified the extent of mCH across these samples by plotting the distribution of methylation levels at mCH sites in the 25 samples with a TNCAC motif, which revealed a methylation level similar to that of GLA, NRN and H1 (Extended Data Fig. 8b)4,20. Most of the tissue types were consistently enriched for the TNCAC or NNCAN motif, but several (oesophagus, lung, pancreas and spleen) had replicates that disagreed, suggesting that mCH is not homogenously distributed across these tissues.

Figure 3: mCH is prevalent in human tissues. a, The fraction of methylated cytosines in the CH context by sample. b–d, Representative mCH motifs from embryonic (H1; b), tissue (LI-11; c) and brain (NRN; d) samples. The height of each letter represents its information content. e, Heatmap of genic mCAS patterns normalized to the flanking region. Each gene was assigned to 1 of 20 clusters, which is indicated by the number and tick marks on the y axis. The tick marks on the x axis indicate the upstream, transcription start, transcription end, and downstream segments of each gene. The boxes around various patterns highlight regions referenced in the main text. TES, transcription end site. f, Bar plot of the ratio of the genome-wide mCAC to mCAG in various samples. PowerPoint slide Full size image

To examine the potential functional effect of mCH in adult tissues, we plotted the distribution of expression levels for various quantiles of gene body mCH as it was previously reported to be positively correlated with expression in H1 (ref. 4) and negatively correlated with expression in neurons20. This analysis revealed a negative correlation between expression and mCH (Extended Data Fig. 8c and Supplementary Methods). Next, we combined our replicates and clustered genes by the patterns of CAS methylation (in which S is a G or C) in and around their gene body (Fig. 3e and Supplementary Methods). To characterize the genes assigned to each cluster, we performed DAVID functional annotation clustering (Supplementary Table 9 and Supplementary Methods), which revealed several different classes. Clusters 1, 2, 16 and 19 contained genes highly enriched for terms involved in basic cellular processes and had an active methylation state (that is, hypermethylation in embryonic samples and hypomethylation in tissue and brain samples) across all samples. Clusters 5 and 6 were dominated by terms related to neuronal function and genes in this class were differentially methylated between neurons and glia and have inactive methylation states in other samples (that is, hypomethylation in embryonic samples and hypermethylation in tissue and brain samples). Cluster 12 was enriched for heart- and muscle-related terms and its genes had an active methylation state in the three heart tissues as well as a weakly active methylation state in psoas but appeared inactive in other samples. Lastly, cluster 14 possessed an active methylation state in brain and tissue samples but was inactive in embryonic samples. Despite being inactive in the H1 samples, this class of genes was highly enriched for terms related to development.

To define the transition of mCH motifs over development better, we examined the ratio of the methylation level of CAC and CAG (mCAC and mCAG) sites in a variety of differentiated (tissues, NRN and GLA), embryonic (H1), and embryonic-derived (neural progenitor cells (NPC), mesendoderm (MES), trophoblast-like (TRO), mesenchymal stem cells (MSC))28 cell samples (Fig. 3f). With the exception of brain cells, mCH levels drop during differentiation, and the mCAC/mCAG ratios revealed a shift in motif usage across developmental time (Fig. 3f); although, mCAC and mCAG within the same gene remain tightly correlated in both early embryonic and differentiated tissues (Extended Data Fig. 8d, e).

Methylation has previously been shown to be predictive of genes escaping X-chromosome inactivation in neurons20. We investigated this phenomenon in these samples by comparing the promoter mCG and gene body mCH of genes that had previously been identified to escape X-chromosome inactivation29 in 11 tissues with mCH (Fig. 4a). Female-specific promoter mCG hypomethylation and gene body mCH hypermethylation were present at escapee genes at a similar level as in neurons20 (Extended Data Fig. 9a). Using these tissue methylomes, gene body mCH was appreciably predictive of biallecially expressed genes (AUC 0.89; Extended Data Fig. 9b and Supplementary Methods). To a lesser extent, we observed female-specific promoter mCH and gene body mCG hypermethylation at escapee genes (Extended Data Fig. 9a, c, d). Although female-specific promoter mCG hypomethylation, promoter mCH hypermethylation and gene body mCG hypermethylation are predictive of X-chromosome inactivation escapees, female-specific gene body mCH hypermethylation is the most predictive feature of X-chromosome inactivation escapees (Extended Data Fig. 9a, b–e). We detected female-specific mCH hypermethylation in 109 out of 612 X-linked genes, including 9 genes hypermethylated in all 11 tissues and 72 genes that were hypermethylated in only one tissue (Fig. 4b). Several genes such as FUNDC1 showed female-specific hypermethylation in several tissues but not in neurons, suggesting a tissue-dependent regulation of the escape from X inactivation.

Figure 4: Allele-specific methylation and expression. a, Browser screenshot of the increase in female mCH for a gene known to escape X-chromosome inactivation (MED14). Sample names are coloured by gender (male, black; female, red). MED14-AS1 is also known as MED14OS. b, Ratio of mCH level in female versus male samples across genes with a significant difference in at least one sample. Cells boxed in black denote samples with a statistically significant difference between females and males. The XCI score for each gene is from ref. 29 and indicates the degree of escaping X-chromosome inactivation. c, The number of ASM and ASE sites across the triplicated tissues. The top row depicts ASM events (left) and ASE events (right) that are allele-specific in all tissues (black), are variable across tissues (white), or do not possess enough data to tell (grey). The bottom row depicts the distribution of variable sites from the top row that vary by individual (blue), tissue (red) or neither (purple). PowerPoint slide Full size image

Allele-specific methylation and expression (ASM and ASE, respectively) may also have a role in the regulation of autosomal genes. To examine these phenomena in human tissues, we combined the RNA-seq and MethylC-seq data sets with phased genotypes for each individual in this study3,15 (Extended Data Fig. 10a and Supplementary Methods). Using the triplicate tissue samples (fat (FT), gastric (GA), psoas (PO), small bowel (SB) and spleen (SX)), we identified 8,464–48,560 ASM events in the CG context and 48–403 ASE genes across these tissues (Supplementary Tables 10, 11 and Supplementary Methods). We next looked for ASM events that varied across individuals within a tissue-type (tissue variable) and those that varied across a tissue-type within an individual (individual variable). Of the ASM events that varied, 4.1–7.5% and 54.5–70.0% were tissue- and individual-variable, respectively; whereas, of the ASE events that varied, 0.0–20.0% were individual-variable and 13.3–48.8% were tissue-variable (Fig. 4c and Supplementary Methods). Of the ASE events, 38.4–87.4% had an ASM event within 100 kb, and of these sites, 76% had an ASM and ASE event that was matched (that is, a DMR was hypomethylated on the same haplotype as the more highly expressed allele). Furthermore, we found that a larger fraction of ASE genes were observed near ASM events whether or not the events matched (Extended Data Fig. 10 b, c and Supplementary Methods). These results demonstrate a link between ASM and ASE in human tissues.

Here we have presented the deepest set of base resolution maps of mCG and mCH so far along with chromatin modification states, haplotype-resolved genome sequences and transcriptional profiles for a large set of human tissues. These data sets allowed us to identify cis-regulatory elements accurately. Furthermore, they revealed the existence of mCH genome-wide in a subpopulation of cells from differentiated human tissues, which seems to be repressive. Our analysis of genic mCH across human tissues indicates a tissue-specific distribution that is distinct from those genes that were previously identified in embryonic stem cells and the brain. These genes are enriched for a variety of functions, most surprisingly those involved in development. These analyses raise the intriguing possibility that mCH is used in adult stem cells30 and could help to repress these genes as the cells transition into their differentiated role.