The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR–mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.

The human genome contains many loci with high incidence of structural mutations, including insertions and deletions of chromosomal segments. This excessive mutability has accelerated evolution and contributed to human disease but has yet to be explained. Segments of DNA repeated in low-copy numbers (LCRs) have been previously implicated in promoting structural mutability in specific disease-associated loci. Lack of methylation (hypomethylation) of genomic DNA has been previously associated with high structural mutability in gibbons and in human cancer cells, but the association with structural mutability in the human germline has not been explored prior to this study. Our analyses confirm the role of LCRs in promoting structural mutability on the genome scale but also reveal a surprisingly strong association of genomic instability with hypomethylation. Specifically, evolutionary analyses reveal that methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in human sperm, harbor a tenfold higher number of structural mutations than genome-wide average. Moreover, the structural mutations in individuals diagnosed with schizophrenia, bipolar disorder, developmental delay, and autism are significantly more concentrated within hypomethylated regions. Our findings suggest a new connection between methylation of genomic DNA, selective structural mutability, evolution, and human disease.

Funding: This research has been funded by the NIH/NIDA NIH Roadmap Epigenomics Project grant U01 DA025956 and the NIH/NHGRI grant R01 HG004009. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2012 Li et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Genomic hypomethylation and LCR-mediated NAHR are therefore the two genome architectural features shown to be associated with structural changes. Here we systematically examine and quantitate these associations. To assess the degree of association of germline methylation levels with structural instability, we examine four sperm methylome maps, including two high-coverage maps (15× combined coverage) from a recent study [35] and two maps we obtained by performing whole-genome bisulfite sequencing of sperm samples from two anonymous donors at low coverage (2.5× combined coverage). To improve detection of structural mutations associated with LCRs and NAHR, we perform a comprehensive detection of LCRs in the human genome and design an aCGH array for diagnostic use in the BCM Medical Genetics Laboratories (BCM-MGL) targeting NAHR-susceptible regions between directly oriented paralogous LCRs (DP-LCRs) larger than 10 Kbp and separated by less than 10 Mb of unique genomic sequence. We combine evidence of structural mutations from the following three sources: 1) human-specific genomic rearrangements; 2) structural polymorphisms in the human population, including copy-number variation (CNV) data from BCM-MGL and publicly available CNV data sets [36], [37], [38]; and 3) recent disease studies of schizophrenia [39], bipolar disorder [40], developmental delay [41], and autism [42]. Our analyses reveal a pattern of association of structural mutability with germline hypomethylation comparable in magnitude to the association between structural mutability and LCR-mediated NAHR.

Multiple independent lines of evidence point to a possible role of the epigenome in structural mutability. Chromatin modifications are known to play a significant role in chromosome maintenance [19], including DNA repair [20], [21], and recombination [22], [23]. Chromatin and the epigenome regulate mutability at smaller scales, including increased mutability of 5-methyl cytosine [24], retroposon silencing [25], [26], [27], and preferential retrotransposition into specific chromatin states [28]. Genome-wide hypomethylation has been repeatedly observed in structurally unstable cancer genomes [29], [30]. Mutations in the methyltransferase DNMT3B have been shown to cause hypomethylation and genomic instability in juxtacentromeric regions in humans [31]. Mutations in the mouse homolog of methyltransferase DNMT1 have been shown to cause genomic instability [32]. Analyses of the structurally hypermutable genomes of gibbon species revealed association of hypomethylation with structurally mutable loci [33]. Finally, the recent discovery of the role of the DNA-break inducing base-excision repair pathway in genomic demethylation of primordial germ cells (PGCs) during fetal development in mouse [34] provides a possible mechanistic link between genomic hypomethylation and genomic instability in the mammalian germline.

Recent high-resolution genome analyses of genomic disorder loci revealed complex patterns of rearrangements not consistent with the NAHR mechanism [14], [15], [16], [17]. The mechanisms causing mutability in such structurally mutable hotspots remain elusive. Microhomologies and other sequence-level features point to the role of Fork Stalling and Template Switching (FoSTeS) and Microhomology-Mediated Break-Induced Replication (MMBIR) mechanisms [16] in the processing and repair of one-ended, double-stranded DNA [18]. However, these are repair mechanisms rather than causes of mutation, and they neither explain the highly selective distribution of structural mutability nor predict genomically unstable loci.

The distribution of structural mutations in the human genome is highly selective, characterized by many hotspots of structural mutability. Evolutionary analyses of recent structural mutations in the human genome reveal that structural mutation hotspots frequently give rise to new LCRs [10], [11], indicating that a significant fraction of the observed association of LCRs and mutability may be explained by the increased production of LCRs at hypermutable loci. The recent discovery of a genome-wide association of LCRs with somatic mutability in cancer [12] and with structural breakpoints in the mouse genome independent of LCR homology [13] further supports the hypothesis that LCRs may not always cause instability but may preferentially arise at loci that are inherently mutable both in cancer and in the germline.

The process of chromothripsis [8] has been proposed as a model to explain instability in 1–3% of all cancers, resulting in a highly complex pattern of genomic rearrangements with multiple CNVs. The patterns of genomic instability observed in cancer have also been observed in complex genomic rearrangements (CGRs) in the human germline, pointing to similar mechanistic underpinnings [9].

Array comparative genomic hybridization (aCGH) studies [1] and massively parallel sequencing [2] revealed that approximately 10% of the human genome is structurally polymorphic at the submicroscopic scale (<4 Mb), a much larger fraction than is affected by single nucleotide polymorphisms (SNPs). Structural mutations that occur in a number of well-studied structurally unstable loci cause disease [3]. The discovery of these structurally mutable disease-associated loci gave rise to the concept of genomic disorders [3], [4]. Their detailed analysis revealed the role of non-allelic homologous recombination (NAHR) and low-copy repeats (LCRs) in mediating recurrent deletions, duplications, and inversions [5]. Genome-wide analyses of regions between paralogous LCRs in direct orientation have since led to the successful prediction of novel LCR-mediated genomic disorders [6], reinforcing the role of NAHR and LCRs. A potential role for LCRs in inverted orientation has been elucidated recently for a specific type of complex duplication with an embedded triplicated segment in inverse orientation, DUP-TRP/INV-DUP [7].

Results

Construction and Comparative Analysis of Sperm Methylomes by Whole-Genome Bisulfite Sequencing

To examine a potential association between germline methylation and structural mutability in humans, we first derived two sperm methylome maps by sequencing bisulfite-treated genomic DNA extracted from the sperm of two anonymous donors at a combined 2.5× genome coverage (one at 1.2× and the other at 1.3×). Methylation levels were calculated for each of the 28,705 non-overlapping 100 Kbp windows covering the hg18 human genome assembly as the ratio between the number of methylated CpGs and the total number of CpGs sampled in reads mapping within the window. Windows with fewer than 20 CpG sampling events were removed from the subsequent analysis to avoid bias due to low sequence mappability. In both samples, more than 95% of windows had reads covering more than 40% of the CpGs within the window (Figure S7B). Because of the low 2.5× combined coverage, the methylation levels of individual CpGs could not be determined accurately, but the average methylation levels at the 100 Kbp level of resolution could be determined with high accuracy. Specifically, the methylation level of >98% of windows was determined with <10% error with >95% probability (Table S10). The two methylomes were highly concordant at the 100 Kbp level of resolution (linear correlation coefficient = 0.96). For the purpose of our analyses, an average sperm methylome at 2.5× coverage was constructed as an average of the two concordant methylomes. Methylation deserts were operationally defined as the 100 Kbp windows with the lowest 1% methylation level in the average sperm methylome. A 5% threshold was also used for some analyses, as noted below. We repeated our analyses using an independently obtained pair of sperm methylomes generated by Molaro et al. [35] from bisulfite sequencing data at a combined 15× genome coverage.
To ensure deep sampling of CpGs in each window, only windows with more than 100 mapped reads and more than 100 CpG sampling events at 15× coverage were included in the subsequent analyses. To facilitate comparison, both combined methylomes (at 2.5× coverage and at 15× coverage) were represented as methylation averages across the same set of 100 Kbp windows tiling the human genome. The 15× methylome showed high correlation with the 2.5× methylome at the 100 Kbp resolution (r = 0.82, p-value<2.2e-16). Methylation deserts discovered at 2.5× coverage using methylation percentile rank thresholds of 1% and 5% significantly overlapped those discovered at 15× coverage (Figure S21), indicating relatively stable genomic localization of methylation deserts across individuals.
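
The window-level calculation described in this section can be illustrated with a minimal sketch. It assumes that per-read CpG calls are available as (chromosome, position, methylated) tuples. The function names, the input layout, and the handling of ties in the percentile cut are illustrative assumptions and are not taken from the authors' pipeline.

```python
from collections import defaultdict

WINDOW = 100_000        # 100 Kbp windows, as in the text
MIN_CPG_EVENTS = 20     # windows with fewer sampling events are dropped


def window_methylation(cpg_calls, window=WINDOW, min_events=MIN_CPG_EVENTS):
    """Average methylation per non-overlapping window.

    cpg_calls: iterable of (chrom, pos, is_methylated), one entry per CpG
    sampled in a mapped read (hypothetical input layout).
    Returns {(chrom, window_index): methylated / total}.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, is_meth in cpg_calls:
        key = (chrom, pos // window)
        total[key] += 1
        if is_meth:
            methylated[key] += 1
    # Keep only windows with enough CpG sampling events.
    return {k: methylated[k] / total[k] for k in total if total[k] >= min_events}


def methylation_deserts(levels, fraction=0.01):
    """Windows in the lowest `fraction` of methylation levels.

    Assumption: at least one window is always returned, and ties at the
    cut-off are broken by sort order.
    """
    ranked = sorted(levels, key=lambda k: levels[k])
    n = max(1, int(len(ranked) * fraction))
    return set(ranked[:n])
```

Calling `methylation_deserts(levels, fraction=0.05)` gives the 5% threshold used in some of the analyses. Averaging two methylomes would amount to averaging the per-window values for windows present in both maps.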

Estimation of Germline Methylation Levels Using a Methylation Index Calculation

Methylation levels in sperm are only a partial indicator of methylation levels in the whole human germline. To examine the association between germline methylation and structural mutability in humans directly, one would ideally measure DNA methylation in the entire male and female germline lineages, which are highly dimorphic [47]. As a practical alternative, we pursued an indirect approach by estimating methylation levels in the human germline (an average of male and female germlines) using the methylation index (MI) model [48] (Materials and Methods: Methylation Index Calculation at 100 Kbp Level of Resolution). Approximately 20% of the methylation deserts (defined as the lowest 1% methylation levels in sperm) occur within the 1.5% fraction of windows with the lowest MI score (MI = 0), an indication that methylation deserts detected in sperm overlap substantially with hypomethylation in the germline as a whole (Figure S6A). The windows with MI = 0 contain ∼15% of the human-specific structural rearrangements, a tenfold enrichment similar to that observed for methylation deserts defined based on the sperm methylomes (Figure 1A). The sperm methylation scores of windows with MI = 0 show a bimodal distribution (Figure S6B): the lower mode includes 35% of these windows, which have low methylation levels (<5%) in sperm, and the higher mode comprises the remaining 65%, which appear to have normal methylation levels in sperm. Because the higher mode could not be explained by obvious ascertainment biases (Materials and Methods: Examination of MI Ascertainment Biases), we hypothesize that this mode may either indicate hypomethylation specific to the female germline, given that male and female germline methylation patterns are highly dimorphic [47], or may be due to other germline hypomethylation detected by MI that is absent from sperm.
A similar bimodal distribution was observed at 15× coverage (Figure S9B). As additional controls, five publicly available methylomes obtained by whole-genome bisulfite sequencing [49], [50] of human stem cells and fibroblasts were also compared over the same set of 100 Kbp windows. Methylation levels in sperm showed much higher correlations with the methylation levels in embryonic stem cells than with those in fibroblasts (Table S2), consistent with the more differentiated state of fibroblasts. Importantly, the methylation levels in sperm samples have higher correlations with the germline MI scores than do those in either stem cells or fibroblasts (Table S2). Moreover, the bimodal distribution of hypomethylated regions is unique to sperm (Figure S9), consistent with sperm being the closest representative of the human germline.
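
The tenfold figure quoted in this section is the share of events falling in a region divided by the share of the genome that the region occupies. The short sketch below makes that arithmetic explicit. The function name is hypothetical, and the counts in the usage note are round numbers chosen only to reproduce the percentages stated in the text.

```python
def fold_enrichment(events_in_region, events_total, region_fraction):
    """Observed share of events in a region relative to the region's
    share of all windows (values above 1 indicate enrichment)."""
    if events_total <= 0 or not 0 < region_fraction <= 1:
        raise ValueError("need positive totals and a fraction in (0, 1]")
    observed_share = events_in_region / events_total
    return observed_share / region_fraction
```

For example, `fold_enrichment(15, 100, 0.015)` corresponds to ∼15% of rearrangements falling in the 1.5% of windows with MI = 0, which gives an enrichment of approximately 10.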

Publicly Available CNV Data Validate Association between Hypomethylation and Structural Mutability

As an independent test for any potential association between hypomethylation and structural mutability, we performed analyses analogous to those discussed in the previous section using the following three publicly available CNV datasets: (i) aCGH data obtained from 270 HapMap samples using high-resolution Affymetrix SNP 6.0 arrays [36]; (ii) aCGH data obtained from 450 HapMap samples using tiling oligonucleotide microarrays [37]; and (iii) CNV data generated on 19,000 samples [38] in a study of the role of common CNVs in eight common human diseases. Dataset (i) complements the 400-sample BCM-MGL data because it detects CNVs that overlap LCRs, and it provides high probe resolution in regions that are not associated with LCRs. Despite the bias away from known polymorphisms in the design of the custom array used to generate the 400-sample BCM-MGL dataset (Materials and Methods: aCGH Probe Set Design and Analysis of CNVs in 400 MGL Samples, Text S1 section 5 and Figure S12), analyses of dataset (i) confirmed the relative strengths of association of structural mutability with NAHR and with hypomethylation identified using the BCM-MGL data, as indicated in Figures 2B, 3, 4, Figure S14, and Table S7. All three datasets (i–iii) confirmed significantly higher average heterozygosity rates of CNVs in methylation deserts (Figure 4). However, dataset (iii), which was biased against rare structural alleles [38], showed no significant difference in overall heterozygosity rate distributions between CNVs in the methylation deserts and the rest of the CNVs (Figure S14D), suggesting that rare variants may account for a significant fraction of the association. In summary, despite the differences in array technologies, array design biases, and sample sets applied to the arrays, our analyses repeatedly point to a significant association of hypomethylation and structural mutability.

Analysis of Methylomes in Germline and Embryonic Stem Cells Indicates Association of Structural Mutability with Germline-Specific Hypomethylation

We next asked if the association between structural mutability and hypomethylation is specific to germline, using the embryonic stem cell line H1 methylome [50] as a control. Germline methylation was assessed using the sperm methylomes both independently and in combination with the methylation index, as summarized in the five columns in Table 1.

Table 1. Enrichment of structural mutability in hypomethylated regions determined by the germline methylation index (MI = 0) and whole-genome bisulfite sequencing of human sperm DNA (at 2.5× and 15×). https://doi.org/10.1371/journal.pgen.1002692.t001

Recall that for windows with MI = 0, the sperm methylation scores showed a bimodal distribution (Figure S6B). As indicated in Table 1, significant enrichment of structural mutability could be observed for windows with MI = 0, and for both lower and higher modes of these windows. The enrichment observed in the higher mode (Table 1, column “MI = 0 & sperm>5%”) suggests the role of hypomethylation that is possibly present in the female germline and captured using the MI measurement but not present in sperm. The windows containing rearrangement/variation showed much lower methylation levels in the sperm methylome (Figure S15A–S15C). In contrast, an association with methylation levels in H1 could not be detected for the CNVs, except that windows containing human-specific evolutionary rearrangements did show association (Figure S15D–S15F). We found significant negative correlation between the methylation scores in sperm and the heterozygosity rates (CNVs from 400 MGL samples: r≈−0.15, p≈10⁻⁹; CNVs from 270 HapMap samples: r≈−0.20, p≈10⁻¹⁰). In contrast, no significant correlation between the H1 methylation scores and the CNV heterozygosity rates was detected. We next examined the difference in methylation levels between sperm and H1. As illustrated in Figure S16, the difference shows even stronger association with structural mutability than the absolute methylation levels in sperm. This result rules out possible ascertainment biases due to low mappability of sequencing reads in potentially unstable and repetitive hypomethylated regions. It also suggests that structural mutability is associated with germline-specific hypomethylation.
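
The correlation analysis in this section can be sketched as follows, assuming that r denotes the Pearson (linear) correlation coefficient and that per-window methylation scores and CNV heterozygosity rates are held in parallel lists. The function name and the data layout are illustrative assumptions, and the sketch does not compute the p-values reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

The same function applies to the sperm-minus-H1 comparison: given hypothetical lists `sperm` and `h1` of per-window methylation levels, `pearson_r([s - h for s, h in zip(sperm, h1)], heterozygosity)` correlates the germline-specific difference with heterozygosity rates.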

Structural Variants Identified Specifically in Schizophrenia Patients Concentrate within Hypomethylated Regions

We next examined the distribution of rare CNVs detected in the recent large-scale study by the International Schizophrenia Consortium [39]. CNVs in 3,391 individuals diagnosed with schizophrenia and 3,181 controls were identified and analyzed using Affymetrix SNP arrays. The study found that the individuals in the affected group have 15% more rare variants. We asked if the excess of variants in the affected group tends to occur in regions with low germline methylation levels. We first compared the distribution of the methylation levels for 100 Kbp windows containing the CNVs in the affected group with the distribution of methylation levels for windows not containing any CNVs. The same procedure was performed for the CNVs in the control group. Windows containing CNVs from either group showed lower methylation than windows without CNVs. A significant enrichment of low MI values (Kolmogorov-Smirnov test, p≈10⁻⁵) was found for the affected group (Table S3), while no significant enrichment was found for the control group. We next identified those CNVs found only in the affected group and those found only in the control group. The two subsets were then further classified as being within or outside of regions showing the lowest 5% methylation levels in sperm. The chi-square test indicates a 3-fold enrichment (p≈10⁻³) within low methylation regions of variants identified only in the affected group compared to those found only in the control group (Table 1). Similar enrichment was found in regions with MI = 0 (Table 1).
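
The case-only versus control-only comparison amounts to a test on a 2×2 table (group by inside/outside low-methylation regions). The sketch below assumes a Pearson chi-square test with one degree of freedom and no continuity correction; the text does not state which variant was used. The function name and the counts in the usage note are hypothetical and are not the study's data.

```python
import math


def chi_square_2x2(case_in, case_out, ctrl_in, ctrl_out):
    """Pearson chi-square test (1 d.f., no continuity correction).

    case_in / case_out: case-only CNVs inside / outside low-methylation windows.
    ctrl_in / ctrl_out: control-only CNVs inside / outside those windows.
    Returns (chi2, p_value, fold_enrichment).
    """
    n = case_in + case_out + ctrl_in + ctrl_out
    row_case, row_ctrl = case_in + case_out, ctrl_in + ctrl_out
    col_in, col_out = case_in + ctrl_in, case_out + ctrl_out
    if 0 in (row_case, row_ctrl, col_in, col_out):
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (case_in * ctrl_out - case_out * ctrl_in) ** 2 / (
        row_case * row_ctrl * col_in * col_out
    )
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    case_share = case_in / row_case
    ctrl_share = ctrl_in / row_ctrl
    fold = math.inf if ctrl_share == 0 else case_share / ctrl_share
    return chi2, p_value, fold
```

With made-up counts, `chi_square_2x2(30, 70, 10, 90)` describes 30% of case-only variants against 10% of control-only variants inside low-methylation windows, a 3-fold enrichment with a chi-square statistic of 12.5.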

Large Deletions Identified Specifically in Bipolar Disorder Patients Concentrate within Hypomethylated Regions

We next examined the distribution of CNVs identified in a recent bipolar disease study [40]. The study identified CNVs in 1,001 bipolar disease cases and 1,034 controls. An excess of large singleton deletions was found in cases relative to controls. We compared the methylation of singleton deletions found only in bipolar cases with the methylation of deletions found only in controls. As indicated in Table 1, compared to control-specific deletions, the case-specific singleton deletions were enriched over 2-fold (p<1e-3 by chi-square test) within the 100 Kbp windows having the lowest 5% methylation levels in sperm.

De Novo Structural Variants in Autism Cases Are Concentrated within Hypomethylated Regions

A recent autism spectrum disorders (ASDs) study [42] found a higher burden of rare CNVs in ASD patients. Trio analyses established that some of the CNVs were not present in parental genomes, and these were classified as de novo. We asked if the rare and de novo CNVs detected in the autism cases and controls are associated with low methylation levels. The regions containing rare CNVs in both the cases and controls showed significant enrichment both for low methylation levels in sperm and for low MI values, when compared with regions without any rare CNVs (Table S3). The CNVs identified only in the cases showed an approximately two-fold enrichment in hypomethylated regions compared to those found only in controls, but the enrichment did not reach the statistical significance threshold because of the small number of variants detected (data not shown). Analysis of de novo and inherited CNVs found in cases revealed highly significant enrichment of de novo relative to inherited CNVs within hypomethylated regions. The enrichment was observed within hypomethylated regions in sperm (<5%), within windows of MI = 0, and especially in regions that met both criteria (Table 1).