Chromatin regulation is critical for differentiation and disease. However, features linking the chromatin environment of stem cells with disease remain largely unknown. We explored chromatin accessibility in embryonic and multipotent stem cells and unexpectedly identified widespread chromatin accessibility at repetitive elements. Integrating genomic and biochemical approaches, we demonstrate that these sites of increased accessibility are associated with well-positioned nucleosomes marked by distinct histone modifications. Differentiation is accompanied by chromatin remodeling at repetitive elements associated with altered expression of genes in relevant developmental pathways. Remarkably, we found that the chromatin environment of Ewing sarcoma, a mesenchymally derived tumor, is shared with primary mesenchymal stem cells (MSCs). Accessibility at repetitive elements in MSCs offers a permissive environment that is exploited by the critical oncogene responsible for this cancer. Our data demonstrate that stem cells harbor a unique chromatin landscape characterized by accessibility at repetitive elements, a feature associated with differentiation and oncogenesis.

To comprehensively explore features of chromatin organization that accompany early mesenchymal differentiation, and their potential association with Ewing sarcoma, we used formaldehyde-assisted isolation of regulatory elements coupled with sequencing (FAIRE-seq), an unbiased biochemical assay that enriches for localized regions of nucleosome-depleted (“open”) chromatin (). Regions identified by FAIRE-seq include a broad range of regulatory classes. We applied this technique to compare the chromatin landscape of hESCs, primary and in-vitro-differentiated mesenchymal stem cells, and mature cell lines. We identified increased chromatin accessibility at specific classes of repetitive elements in stem cells. These regions harbored distinct histone modifications and underwent chromatin remodeling during differentiation. A subset of repetitive elements exhibiting enhanced chromatin accessibility in stem cells offered a permissive environment that could be exploited by EWSR1-FLI1 in Ewing sarcoma, lending support to a stem cell origin for this cancer and offering a mechanistic explanation for its selective targeting.

Ewing sarcoma is a highly malignant tumor of the bone and soft tissue with a peak incidence during adolescence. This tumor is virtually always characterized by a recurrent chromosomal rearrangement that brings together the amino terminus of EWSR1 with the carboxyl-terminal DNA binding domain of the ETS family transcription factor FLI1. We and others have shown that the chimeric oncoprotein is selectively targeted away from canonical ETS sites to co-opt microsatellite repeats that contain the core recognition element sequence (). At these sites, EWSR1-FLI1 is necessary to maintain a fully accessible chromatin landscape marked by enhancer-associated histone modifications (). Many of the genes implicated in tumor development and regulated by EWSR1-FLI1 are located proximal to these microsatellite repeats (). Despite its chromatin remodeling activity, EWSR1-FLI1 demonstrates cancer-like targeting only in Ewing sarcoma cells. What mediates the selective targeting of EWSR1-FLI1 and what this indicates about the cell of origin remain unknown.




Early mammalian development necessitates precisely regulated transcriptomic and chromatin changes as cells commit to their terminal fates (). A comprehensive understanding of chromatin remodeling during differentiation may reveal biological pathways that regulate this process and could suggest therapeutic opportunities relevant to cancer-directed and regenerative medicine. Human embryonic stem cells (hESCs), derived from the inner cell mass of human blastocysts, can propagate in vitro and are able to undergo multi-lineage differentiation (). Previous studies have explored chromatin dynamics during stem cell differentiation by comparing hESCs to differentiated cells. hESCs are characterized by elevated levels of activation-associated histone post-translational modifications, histone bivalency at developmentally regulated genes, and increased expression of variant histones (). Though insightful, histone modification changes represent one of multiple strategies that ultimately regulate the chromatin landscape.


Since EWSR1-FLI1 targets a subset of GGAA-containing simple repeats, we asked whether other chromatin features correlated with increased FAIRE signal in BM-MSCs and with the ability to bind EWSR1-FLI1. Using ChIP from H1-MSCs (), we examined histone modifications at the sites that are targeted by EWSR1-FLI1 in cancer cells. Of the histone modifications available for analysis, we noted a subtle increase in enrichment for H3K14ac, H4K91ac, and H2BK12ac, all marks enriched at simple repeats in hESCs (Figure 6E). These data suggest that chromatin modifications at critical sites specific to stem cells facilitate EWSR1-FLI1 targeting.

We then explored chromatin accessibility using enzymatic approaches. Neither DNase-seq data that we generated from BM-MSCs nor published DNase and ATAC data from these cells identified signal enrichment at regions ultimately targeted by EWSR1-FLI1 (Figure 6D). The absence of signal is consistent with our result in hESCs (Figure 3A). In contrast, in Ewing sarcoma cells these regions were detectable by DNase and ATAC. Moreover, in BM-MSCs, ATAC enrichment was noted at these sites only after EWSR1-FLI1 was transduced (Figure 6D). Neither DNase nor ATAC signal enrichment was observed at similar repeats that did not bind EWSR1-FLI1. These data suggest that EWSR1-FLI1 targets nucleosome-destabilized regions, which ultimately leads to nucleosome eviction.


To explore the connection between chromatin accessibility and EWSR1-FLI1 targeting, we compared FAIRE signal in BM-MSCs with EWSR1-FLI1 ChIP signal from Ewing sarcoma cells. Repeats with the greatest FAIRE signal in BM-MSCs demonstrated the greatest ChIP signal in the tumor cells (Figure S6D). Similarly, EWSR1-FLI1 targeted those regions for which the maximal FAIRE signal was over the repeat in BM-MSCs (p < 0.001, permutation). We then explored the activity of EWSR1-FLI1 on chromatin by comparing the difference in FAIRE signal between BM-MSCs and the tumor cells with EWSR1-FLI1 ChIP signal. We found a significant correlation between oncoprotein binding and changes in FAIRE signal (r = 0.74) (Figure 6C). Taken together, these data provide chromatin-based evidence for an MSC origin of these tumors and, further, demonstrate that characteristics of MSC chromatin predict EWSR1-FLI1 oncoprotein targeting in tumor cells.
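
As a rough illustration of the correlation analysis described here, the following sketch computes a Pearson coefficient between a per-repeat change in FAIRE signal and a per-repeat ChIP signal. The function name and the toy values are hypothetical and are not taken from the study, which may have used different software and preprocessing.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-repeat values, for illustration only.
faire_change = [0.1, 0.8, 1.5, 2.1, 2.9]  # log2 FAIRE change, tumor vs BM-MSC
chip_signal = [0.2, 0.9, 1.1, 2.4, 2.6]   # EWSR1-FLI1 ChIP signal
r = pearson_r(faire_change, chip_signal)
```

A coefficient near 1 indicates that repeats gaining the most FAIRE signal also carry the most ChIP signal; it does not by itself establish the direction of the effect.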

We first tested for the enrichment of repeat classes in accessible chromatin in tumor cells and primary BM-MSCs and found that they shared a high degree of enrichment at simple repeats, relative to other repetitive element classes ( Figure 6 A). Since EWSR1-FLI1 selectively retargets GGAA-containing simple repeats, we then examined FAIRE signal in BM-MSCs and Ewing sarcoma cells at all simple repeats containing this motif, clustering these regions based on their signal in the cancer cells ( Figure 6 B). We observed a striking similarity in the pattern of chromatin accessibility between the stem and cancer cells. In BM-MSCs, the signal was center weighted at about half of the regions ( Figures 6 B and S6 C). For others, regions flanking the repeat demonstrated the greatest signal.

(E) Fold change of H1-MSC ChIP signal for H3K14ac, H4K91ac, H2BK12ac, and H3K27ac at repeats bound by EWSR1-FLI1 in Ewing sarcoma cells relative to an equal number of randomly selected repeats that were not bound. Distance is measured from the center of the repeat.

(D) FAIRE, DNase, and ATAC signal at EWSR1-FLI1 binding sites in BM-MSCs, Ewing sarcoma (EWS), and MSCs exogenously expressing EWSR1-FLI1 (). Distance represents kilobases from the center of the repeat. FAIRE and DNase data were normalized to overall read count. ATAC read count was unavailable, and ATAC data were consequently not normalized.

(C) Scatterplot of log2-transformed FAIRE change between BM-MSCs and EWS and EWSR1-FLI1 ChIP signal at EWSR1-FLI1 bound (red) or unbound (blue) repeats. Pearson correlation is shown.

(B) Clustered BM-MSC or EWS FAIRE signal at all (GGAA)n-containing simple repeats (left). EWSR1-FLI1 ChIP signal in EWS at (GGAA)n-containing simple repeats (right).

(A) Heatmap depicting the enrichment of specific classes of repetitive elements in MACS2-identified FAIRE-enriched regions in Ewing sarcoma (EWS) and BM-MSC chromatin, relative to genomic coverage.

Many sarcomas are thought to originate from stem cells of mesenchymal origin (). To explore this link, we compared the chromatin environment in stem cells with that in Ewing sarcoma, the second most common bone malignancy in children and young adults. Ewing sarcoma is characterized by a chromosomal rearrangement that creates a chimeric transcription factor. We and others have previously shown that the resulting oncoprotein, EWSR1-FLI1, targets a subset of simple repeats distinct from the parental protein FLI1 (). The binding of this transcription factor activates an oncogenic transcriptional profile critical for maintaining tumorigenicity (). However, this targeting is cell type specific (). This observation led us to hypothesize that a permissive chromatin environment enables EWSR1-FLI1 retargeting.


Taken together, these results suggest that repetitive elements undergo chromatin remodeling during differentiation. Repetitive regions with variable accessibility are associated with changes in lineage-specific gene expression and developmental pathways. However, the pattern of remodeling differs between the two classes of repetitive elements. SINEs primarily become inaccessible during differentiation, whereas a subset of simple repeats becomes accessible during lineage specification.

Clustering cell lineages based on FAIRE signal at simple repeats demonstrated a pattern distinct from that observed for SINEs. hESCs clustered together, clearly separated from the differentiated cells. MSCs, both bone marrow derived and H1 derived, clustered closely together but grouped with the differentiated cells. FAIRE signal at simple repeats revealed two patterns. One pattern was similar to that seen for the SINEs, with progressively decreasing signal associated with differentiation. The other pattern revealed greater FAIRE signal in MSC lineages compared with either hESCs or the differentiated cells (Figure 5E). We again associated these regions with differentially regulated genes. Regions with higher FAIRE signal in hESCs or MSCs were significantly associated with genes more highly expressed in hESCs or MSCs, respectively, when compared to a permutation (p = 0.02 and p < 0.001, respectively) (Figure 5F). Similar to SINEs, hESC-associated genes were again enriched for gene ontologies related to ESCs, whereas MSC-associated genes were enriched for pathways linked to mesenchymal development (Figure 5E; Table S4).

FAIRE-seq was then performed on primary BM-MSCs and H1-MSCs. We identified ∼15,000 SINEs and ∼4,500 simple repeats with significantly different FAIRE signal between hESCs and BM-MSCs. Unsupervised hierarchical clustering of these regions, together with signal from H1-MSCs as well as differentiated control cells, revealed two main clusters (Figures 5A and 5E). For SINEs, stem cells clustered closely together, distinct from the differentiated cells. Notably, the differentiated H1-MSCs exhibited greater similarity to primary BM-MSCs than to undifferentiated H1-ESC. Overall, virtually all sites that exhibited signal variation demonstrated a progressive decrease in FAIRE enrichment accompanying differentiation. SINEs with differential FAIRE signal were then associated with the most proximal gene. Of those genes that were also differentially expressed, 95% demonstrated greater expression (>2-fold) in hESCs relative to BM-MSCs, significant when compared to a permutation (p < 0.001) (Figure 5B). The overall skew to greater message abundance in hESCs is consistent with higher global transcription levels in these cells (). Genes with elevated expression in hESCs (category 1) were linked to curated ontologies related to ESC-specific expression, whereas those with elevated expression in MSCs (category 2) were linked with terms implicating mesenchymal development, such as wound repair and adipogenesis (Figure 5E; Table S4).
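
The permutation-based significance reported here can be sketched as follows. This is a minimal illustration that assumes a simple null model (random repeat sets of equal size drawn without replacement); the function name, the boolean encoding, and the default of 1,000 permutations with a fixed seed are illustrative choices, and the permutation scheme actually used in the study may differ.

```python
import random

def permutation_p_value(labels, selected_idx, n_perm=1000, seed=0):
    """Empirical p value for the fraction of True labels in a selected set.

    labels: list of bools, one per repeat (True = nearest gene is more highly
            expressed in hESCs; a hypothetical encoding).
    selected_idx: indices of repeats with differential FAIRE signal.
    Returns (observed_fraction, p_value).
    """
    rng = random.Random(seed)
    k = len(selected_idx)
    observed = sum(labels[i] for i in selected_idx) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(range(len(labels)), k)  # random set of equal size
        if sum(labels[i] for i in sample) / k >= observed:
            hits += 1
    # Add-one correction avoids reporting an impossible p value of zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With 1,000 permutations the smallest attainable value under this correction is 1/1,001, which corresponds to a reported p < 0.001.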

(E) Gene ontologies enriched for genes identified in categories 1 and 2 from (B) and categories 1 and 4 from (D). Ontologies were organized by ESC-like (green), MSC-like (red), cancer-like (blue), and shared (black). p value intensity is indicated by the shade of each color. Also see Table S4.

(D) Fraction of genes linked to variable FAIRE at repeats that demonstrate differential gene expression. Genes associated with increased FAIRE in hESCs are further divided, those associated with genes with increased expression in hESCs (gray, category 1) and those associated with genes with increased expression in MSCs (black, category 2). Genes associated with increased FAIRE in MSCs are also further divided, those associated with genes with increased expression in hESCs (gray, category 3) and those associated with genes with increased expression in MSCs (black, category 4). Differential expression was defined as RPKM >2-fold change. Control represents the average from 1,000 permutations performed with equal number of selected repeats with permuted p values. Error bars represent SD of the permutations.

(C) FAIRE signal from hESCs, BM-MSCs, H1-MSCs, K562, NHEK, and HUVEC at simple repeats characterized by significantly different FAIRE signal between hESCs and BM-MSCs (t test p < 0.01, row max – row min >1) was Z score transformed and biclustered. Heatmap scale represents relative Z scores.

(B) Fraction of genes linked to variable FAIRE at repeats that demonstrate differential gene expression (increased in hESCs, gray, category 1; increased in MSCs, black, category 2). Differential expression was defined as RPKM >2-fold change. Control represents the average from 1,000 permutations performed with equal number of selected repeats with permuted p values. Error bars represent SD of the permutations.

(A) FAIRE signal from hESCs, BM-MSCs, H1-MSCs, K562, NHEK, and HUVEC at SINEs characterized by significantly different FAIRE signal between hESCs and BM-MSCs (t test p < 0.01, row max – row min >1) was Z score transformed and clustered. Heatmap scale represents relative Z scores.

The difference in FAIRE enrichment at repetitive elements in stem and differentiated cells led us to test whether these elements undergo remodeling during differentiation. H1-ESC embryonic stem cells were differentiated in culture toward a mesenchymal lineage (H1-MSC). Differentiation of hESCs to MSCs was validated using several approaches. Morphologically, H1-MSCs acquired a fibroblastic appearance, in contrast to the spherical colonies of H1-ESC (Figure S6A). The multipotency of H1-MSC was demonstrated by further differentiation into osteoblast and adipocyte lineages (Figure S6A). Finally, flow cytometry of H1-MSC identified a robust increase in CD90, CD73, CD105, and CD44, cell-surface markers that were also detected on primary bone-marrow-derived MSCs (BM-MSCs) (Figure S6B). H1-ESCs were negative for CD73 and CD105.

To identify nucleosome positioning at repetitive elements in highly accessible chromatin, we sequenced the DNA in both low- and high-salt-soluble fractions and plotted the signal at simple repeats ( Figure 4 E). We again identified nucleosome phasing flanking the repeats in both H1-ESC and the differentiated control cells. However, compared with the differentiated cell control, H1-ESC demonstrated an increase in MNase signal at the center of the repeat exclusively in low-salt-extracted chromatin, indicative of a highly extractable nucleosome. Taken together with the immunoblot and ChIP-seq analysis, these data indicate that specific acetylation is associated with nucleosomal destabilization but not displacement at repetitive elements.

As an alternative approach to explore chromatin accessibility, we performed salt fractionation of MNase-treated nuclei. Salt fractionation separates chromatin based on physical properties (). Low-salt-soluble regions are enriched for active and highly accessible chromatin, whereas high salt solubilizes the bulk chromatin fraction (). Salt fractionation allows both direct comparison of nucleosome composition in active chromatin and determination of the positioning of individual nucleosomes by high-throughput sequencing. Nucleosomes were extracted from nuclei of H1-ESC and a differentiated control (human kidney cells [HKCs]) using increasing concentrations of salt. The low-salt fraction from both cell types consisted predominantly of mono-nucleosomes, whereas the high-salt fraction consisted mostly of di-nucleosomes (Figure S5), consistent with published results (). Histone post-translational modifications associated with each fraction were assayed by immunoblot. As predicted by our informatic analyses, H2AK5ac was significantly enriched in low-salt fractions of nucleosomes from stem cells when compared to the differentiated control cells (p < 0.05, Figure 4D). This enrichment did not extend to the high-salt or insoluble chromatin.

We then asked whether specific histone modifications distinguish the nucleosomes at accessible repetitive elements. We compared H1-ESC chromatin immunoprecipitation sequencing (ChIP-seq) data for a range of histone modifications at FAIRE-enriched and FAIRE-negative sites (). We found that FAIRE-enriched simple repeats were marked by specific acetylated histones (Figure 4A). Associated modifications differed from those at FAIRE-enriched SINEs as well as TSS and CTCF sites (Figure 4A). H3K56ac and H2AK5ac were most associated with simple repeats. Signals for these modifications were centered over the repeat and demonstrated a magnitude similar to or greater than that found at TSS and CTCF sites (Figures 4B and S4A). H4K8ac and H2A.Z were most associated with SINEs and showed subtle but center-weighted enrichment (Figures 4C and S4B). Overall, these data indicate that FAIRE-enriched simple repeats and SINEs are characterized by distinctly marked nucleosomes.

(E) Mean H1-ESC and HKC (differentiated cell) MNase-seq signal from low- (left) and high- (right) salt fractions at simple repeats. Signal was normalized to reads per million mapped.

(D) Salt-fractionated nuclear extracts from H1-ESC and HKC (differentiated cell) were immunoblotted with anti-H2AK5ac (green) and anti-pan-H3 (red). Fluorescence intensity was quantified and normalized to H3. Error bars represent standard error of three replicates.

(C) Mean H1-ESC ChIP signal of H4K8ac and H2A.Z at H1-ESC FAIRE-enriched (red line) and FAIRE-negative (black line) SINEs and control regions (TSS green line, CTCF blue line). Figure S5 contains all available histone modifications.

(B) Mean H1-ESC ChIP signal of selected histone posttranslational modifications at H1-ESC FAIRE-enriched (red line) and FAIRE-negative (black line) simple repeats, and control regions (TSS green line, CTCF blue line). Figure S5 contains all available histone modifications.

(A) Heatmap of ranked histone posttranslational modifications. Differential ChIP signal comparing FAIRE-enriched (+) with FAIRE-negative repeats (simple repeat ±250 bp from center or SINEs start to +300 bp) was rank ordered. For comparison, signal at TSS or CTCF (±500 bp) was rank ordered. Histone modification by acetylation is highlighted (green).

To further characterize the relationship between FAIRE and nucleosome positioning at repetitive regions, we examined MNase signal at all simple repeats grouped by the magnitude of FAIRE enrichment. MNase signal was greatest at regions with highest FAIRE enrichment. Further, regions with the greatest FAIRE signal demonstrated the presence of a single centered nucleosome ( Figure 3 C). For all regions, we observed symmetrical nucleosome phasing extending beyond the repetitive region. Overall, these data indicate that, in contrast to the recognized association of FAIRE with nucleosome depletion, in the context of these regions, FAIRE identifies a chromatin organizational feature characterized by the presence of nucleosomes.

Because of the discrepancy between FAIRE and DNase at repetitive regions, we then explored nucleosome positioning using published MNase-seq data (). By cleaving DNA in the linker region between two nucleosomes, MNase-seq offers insight into the location of nucleosomes. DNase-positive regions, including those in repeats, TSS, and CTCF binding sites, demonstrated decreased MNase signal, consistent with nucleosome depletion (Figure 3A). However, FAIRE-enriched SINEs and shorter simple repeats exhibited the presence of one to two well-positioned nucleosomes. Of note, phased nucleosomes flanked both classes of repeats, similar to patterns observed at other regulatory elements (Figure 3B).


Given the abundance of FAIRE-enriched repeats, we were surprised that chromatin accessibility at these sites had not been previously observed. As a complementary approach, we analyzed DNase hypersensitivity data (DNase). In contrast to FAIRE, DNase depends on enzymatic digestion to interrogate chromatin accessibility (). Leveraging publicly available data, we compared FAIRE and DNase signal at simple repeats and SINEs at FAIRE peaks (from Figure 1C). Surprisingly, these regions lacked DNase signal (Figure 3A). Conversely, the few repeats that demonstrated DNase signal lacked FAIRE enrichment (1,099 and 2,033 simple repeats and SINEs, respectively). As a control, we examined FAIRE and DNase at transcription start sites (TSSs) and CTCF sites. FAIRE and DNase positively correlated at these regions, consistent with published results and confirming the validity of the assays (Figures 3B and S3A). To determine whether the variation observed between FAIRE and DNase was due to differences in alignment of the shorter DNase reads, we truncated 50-bp FAIRE-seq reads to the 20-bp length used for DNase-seq and realigned them to the genome. We again noted enrichment of FAIRE signal at repetitive elements, indicating that read length was not a factor (Figure S3B).
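
The read-length control described here amounts to trimming each read before realignment. A minimal sketch is shown below; it assumes the common four-line FASTQ layout and assumes that the first 20 bases of each read were retained, neither of which is stated in the text. The function name is hypothetical, and the realignment step itself is not shown.

```python
def truncate_fastq_lines(lines, length=20):
    """Yield FASTQ lines with sequence and quality trimmed to `length` bases.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # sequence line or quality line
            line = line[:length]
        yield line

# One hypothetical 50-bp read.
record = ["@read1", "ACGT" * 12 + "AC", "+", "I" * 50]
trimmed = list(truncate_fastq_lines(record, length=20))
```

Trimming the quality string together with the sequence keeps each record valid for downstream aligners.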

(C) H1-ESC MNase-seq signal at simple repeats grouped by quartiles of FAIRE signal. An equal number of random genomic windows are plotted for comparison (control, green).

(A) Heatmap representations of H1-ESC FAIRE-seq, DNase, and MNase-seq signal () at FAIRE-enriched (FAIRE +) or DNase-enriched (DNase +) simple repeats and SINEs rank ordered by length. For reference, repeat positions (defined by RepeatMasker) are also plotted.


To explore other factors that may influence chromatin status, we asked whether length and G/C content distinguish those repeats that are FAIRE enriched. For simple repeats, the lengths of FAIRE-enriched and FAIRE-negative sites varied little. However, enriched sites demonstrated a significant skew toward higher G/C content ( Figure 2 C). The opposite pattern was observed for SINEs. FAIRE-enriched SINEs were significantly longer than others, whereas G/C content differed only slightly ( Figure 2 F). Overall, FAIRE identifies repetitive elements that are common to multiple hESCs and demonstrate shared chromatin patterns and distinct class-specific DNA features.
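
For reference, G/C content of a repeat can be computed as in the sketch below. Ignoring ambiguous bases such as N is an assumption made here for illustration; the text does not say how such positions were handled.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous A/C/G/T bases in `seq`."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted
```

For example, a pure (GGAA)n repeat has a G/C fraction of 0.5 regardless of its length.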

We then asked whether simple repeats and SINEs were consistently identified across hESC lines. We observed a significant overlap between hESC lines of simple repeats and SINEs with signal in the top quartile (simple repeats: 48%, 86,949 of 179,379; SINEs: 59%, 423,819 of 723,415; p < 0.001 by permutation) (Figures 2A, 2D, S2A, and S2B). Consistent with a central role of the repetitive segment in mediating chromatin state, we found that, for both simple repeats and SINEs, FAIRE signal was centered at the repetitive element rather than extending from flanking regions and was concordant between the stem cells (Figures 2B, 2E, S2C, and S2D).

(F) Lengths (left) and G/C content (right) of SINEs that are FAIRE-enriched (+) in all hESCs (see D, n = 180,105) and an equal number of randomly selected simple repeats from quartile 1 (FAIRE –).

(E) Heatmap demonstrating FAIRE signal at SINEs grouped by categories (colors defined in D). Distance represents base pairs from the start of the SINEs.

(D) The union set of SINEs with FAIRE signal in the top quartile (Q4) for hESCs are shown (p < 0.001, permutation based on all SINEs, also see Figure S3 ).

(C) Lengths (left) and G/C content (right) of simple repeats that are FAIRE enriched (+) in all hESCs (see A, n = 27,948) and an equal number of randomly selected simple repeats from quartile 1 (FAIRE –).

(B) Heatmap demonstrating FAIRE signal at simple repeats grouped by categories (colors defined in A). Distance represents base pairs from the center of the simple repeat.

(A) The union set of simple repeats with FAIRE signal in the top quartile (Q4) for hESCs are shown (p < 0.001, permutation based on all simple repeats, also see Figure S3 ).

Simple repeats and SINEs consist of thousands to millions of individual regions. Only a small fraction was identified by FAIRE (<5% of each class, Table S3). Since repetitive regions of the genome might be expected to pose challenges to accurate sequence alignment, we explored the ability of sequencing reads to align to repetitive elements. Using a 50-bp k-mer, we found that 78% of all nucleotide positions within repetitive elements are mappable using default alignment criteria. Setting a more stringent threshold by permitting only unique reads to map, 72% of all base pairs were deemed unique. Further, using the ENCODE mappability track, we found that the majority (51%) of FAIRE-positive SINEs contain enough sequence diversity to map 50-bp reads in over 50% of the repeat length and contain base pairs near the 5′ and 3′ ends that have an average mappability score >0.5 (Figures S1D and S1E). Taken together, these data demonstrate that, despite classification as repetitive, these regions contain sufficient sequence diversity to enable accurate mapping of sequence tags. However, it remains possible that variation in mappability across individual repetitive elements may lead to over- or underestimation of FAIRE signal at specific regions.
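
The idea behind k-mer mappability can be illustrated on a toy sequence: a position is uniquely mappable if the k-mer starting there occurs only once. The sketch below is a simplification that considers a single strand of a single sequence with exact matches; it is not the procedure behind the ENCODE mappability track or the percentages reported here, and the function name is hypothetical.

```python
from collections import Counter

def unique_kmer_fraction(sequence, k=50):
    """Fraction of k-mer start positions whose k-mer occurs exactly once."""
    n = len(sequence) - k + 1
    if n <= 0:
        return 0.0
    kmers = [sequence[i:i + k] for i in range(n)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / n
```

A perfect tandem repeat scores near zero under this measure, whereas a repeat copy carrying even a few substitutions can contain many unique k-mers, which is why elements annotated as repetitive can still be aligned to.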

We then assayed FAIRE signal differences in each repeat class. After normalizing for repeat length and sequencing depth, signal in hESCs at simple repeats and SINEs greatly exceeded that of differentiated control cells ( Figure 1 E). In contrast, signal differences at LINEs were minimal, and DNA transposons demonstrated an inverse relationship. Taken together, read, signal, and peak-based detection approaches consistently identify the selective enrichment of simple repeats and SINEs by FAIRE in hESCs.
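
One common way to normalize read counts for both feature length and sequencing depth is a reads-per-kilobase-per-million calculation, sketched below. The text does not give the exact formula used, so this RPKM-style form and the function name are assumptions made for illustration.

```python
def normalized_signal(read_count, repeat_length_bp, total_mapped_reads):
    """Reads per kilobase of repeat per million mapped reads (RPKM-style)."""
    if repeat_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    per_kb = read_count / (repeat_length_bp / 1_000)
    return per_kb / (total_mapped_reads / 1_000_000)
```

Under this scheme, 10 reads over a 500-bp repeat in a library of 20 million mapped reads gives a value of 1.0, which makes repeats of different lengths and libraries of different depths comparable.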

In further support of repetitive element enrichment, we also found that a large fraction of sequencing reads from each hESC line was discarded during alignment due to redundant genomic mapping ( Table S1 ). 83% of unaligned sequences from H1-ESC were repetitive in nature, enriched for SINEs, simple repeats, and long interspersed elements (LINEs) ( Table S2 ). In contrast, similar analysis of HUVEC FAIRE identified only 51% of discarded reads as repetitive sequences, a fraction consistent with the abundance of these elements genome-wide.

We then assessed whether the enrichment of repetitive elements was restricted to specific classes. Simple repeats and short interspersed nuclear elements (SINEs) were selectively enriched among FAIRE peaks in hESCs, relative to their genomic prevalence. This pattern was not observed in the three differentiated cell types (Figure 1D).

To further characterize FAIRE-selected chromatin, we identified 610,887 regions with significant signal enrichment (peaks) in H1-ESC, 243,467 in H7-ESC, and 384,162 in H9-ESC (MACS2) (). Applying a false discovery rate threshold, we selected the top ∼150,000 peaks for further analysis. The filtered regions were then intersected with repetitive elements defined by RepeatMasker (), requiring that the site of greatest FAIRE signal was within one bp of a repetitive element. Strikingly, we found that 82.9%, 94.6%, and 94.0% of peak summits identified in H1-ESC, H9-ESC, and H7-ESC, respectively, intersected a repetitive element. The degree of overlap for each hESC line was significantly greater than that for HUVEC, NHEK, K562, and a randomly permuted peak set (Figure 1C). Varying the stringency used to select peaks had no effect on fractional overlap (Figure S1C).
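
The summit-to-repeat intersection can be sketched with a sorted interval lookup. The example below assumes half-open coordinates on a single chromosome, sorted non-overlapping repeat intervals, and one reading of "within one bp" (a slack of 1 bp on either side); these conventions and the function name are assumptions and may not match the tools used in the study.

```python
import bisect

def fraction_overlapping(summits, repeats, slack=1):
    """Fraction of summit positions within `slack` bp of a repeat interval.

    repeats: sorted, non-overlapping (start, end) tuples, half-open [start, end).
    """
    if not summits:
        return 0.0
    starts = [start for start, _ in repeats]
    hits = 0
    for pos in summits:
        # Last interval whose start is at or before pos + slack.
        i = bisect.bisect_right(starts, pos + slack) - 1
        if i >= 0 and pos < repeats[i][1] + slack:
            hits += 1
    return hits / len(summits)

# Hypothetical coordinates, for illustration only.
repeats = [(100, 200), (500, 650)]
summits = [99, 150, 200, 201, 300, 649]
frac = fraction_overlapping(summits, repeats)
```

Because the intervals are sorted and do not overlap, only the last interval starting at or before the summit needs to be checked, so each lookup is logarithmic in the number of repeats.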

We then identified genomic regions that were unique to stem cells. We compared Z-score-transformed FAIRE signal in 500-bp windows to publicly available data from three differentiated cell types, each representing distinct developmental lineages (human umbilical vein endothelial cell [HUVEC], K562, and NHEK) (). Of the regions that passed a minimum signal filter, 12,026 sites demonstrated a significant difference between hESCs and the three differentiated cell types (p ≤ 0.01, t test). Hierarchical clustering resolved these regions into two major groups ( Figure 1 A). Cluster 1 (C1) consisted of regions with increased FAIRE signal in hESCs. Cluster 2 (C2) contained regions with higher signal in the differentiated cell lines ( Figures 1 A and 1B). The two clusters demonstrated significant differences in location. C1 regions were primarily distal, with a median distance of 39.5 kb to the nearest TSS, whereas C2 regions were primarily proximal, with a median distance of 11.4 kb ( Figure 1 A). We then annotated the genomic intervals with classifications previously generated by segmentation analyses in H1-ESC, HUVEC, K562, and NHEK (ChromHMM) (). C1 was significantly enriched for transcription and heterochromatic/repetitive states (p < 0.001, Figures 1 A and S1 B). In contrast, C2 was enriched for states such as active and poised promoters, as well as insulators (p < 0.001, Figures 1 A and S1 B). Interestingly, despite the striking difference in FAIRE signal between cell types, regions in these clusters were similarly classified. Taken together, these data revealed widespread accessible chromatin in stem cells at genomic regions classified as heterochromatic.
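
The Z-score transformation applied to windowed FAIRE signal can be sketched as follows. Whether the study used the population or the sample standard deviation is not stated; the population form is assumed here, and the function name is hypothetical.

```python
import statistics

def z_scores(values):
    """Z-score transform: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

# Hypothetical read counts in five consecutive 500-bp windows.
window_signal = [1, 2, 3, 4, 5]
z = z_scores(window_signal)
```

Transforming each sample separately places libraries of different depth on a common scale before windows are compared between cell types.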

(E) Normalized FAIRE signal at simple repeats, SINEs, DNA transposons, and LINEs in hESCs (green) and control cell lines (blue) plotted by quartile.

(D) Heatmap depicting the enrichment of specific classes of repetitive elements in MACS2-identified FAIRE-enriched regions, relative to genomic coverage.

(C) Fraction of top 150,000 peak summits that overlapped a repetitive element in hESCs (green) and controls (blue). Fractional overlap with an H1-ESC permuted peak set (red line) and SD (dashed lines) are shown.

(A) Heatmap of those regions with significantly different FAIRE enrichment between hESCs and control HUVEC, K562, and NHEK (500-bp windows, p ≤ 0.01, t test, row– row>3). Regions were assigned classes based on distance to nearest TSS (<10 kb red, 10–20 kb white, >20 kb blue and by segmentation analysis;). See also Figure S1
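The comparison in panel (C) between observed and permuted peak sets can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for the figure: a single chromosome is modeled, repeat annotations are sorted, non-overlapping half-open intervals, and the permuted set is drawn by uniform random placement (the actual permutation scheme is not specified in this section). All names are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import mean, stdev


def overlaps_any(position, starts, ends):
    """True if position lies in any half-open interval [start, end).

    starts/ends describe intervals sorted by start and non-overlapping.
    """
    j = bisect_right(starts, position) - 1
    return j >= 0 and position < ends[j]


def overlap_fraction(summits, starts, ends):
    """Fraction of peak summits that fall inside an annotated repeat."""
    return sum(overlaps_any(p, starts, ends) for p in summits) / len(summits)


def permuted_overlap(n_summits, genome_length, starts, ends, n_perm=100, seed=0):
    """Mean and SD of the overlap fraction for randomly placed summits.

    Assumption: summits are placed uniformly at random along one chromosome.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_length) for _ in range(n_summits)]
        fractions.append(overlap_fraction(shuffled, starts, ends))
    return mean(fractions), stdev(fractions)
```

An observed fraction well above the permuted mean plus its SD would correspond to the separation between the bars and the red line in the panel.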


To explore chromatin organization in human embryonic stem cells, we performed FAIRE-seq on undifferentiated H1-ESC (WA01), H7-ESC (WA07), and H9-ESC (WA09) cells and aligned sequencing reads to the human genome, as previously described. As expected, FAIRE signal was enriched at transcriptional start sites (TSSs) and CTCF binding sites in all hESCs (Figure S1A). We also observed signal enrichment at binding sites of OCT4 and NANOG, factors critical for the maintenance of pluripotency (Figure S1A).

Discussion

By integrating complementary genome-wide approaches, we identified a unique chromatin environment in stem cells marked by accessible chromatin at repetitive DNA sequences. Further, we associate these features with the selective targeting of the central oncogene in Ewing sarcoma, suggesting that stem cells harbor a permissive environment that facilitates the critical oncogenic step in this cancer.

The most characteristic feature associated with accessible repetitive elements was histone acetylation. Variations in histone acetylation have been linked to stem cell differentiation, and nucleosome acetylation can destabilize DNA-nucleosome interactions (Lee et al., 2004; Shogren-Knaak et al., 2006). Interestingly, the sites of acetylation enriched at simple repeats differ from the well-studied H3K27ac and H3K9ac. Segmentation analysis of stem cells has generally categorized repeats as heterochromatic; however, these modeling approaches have not included atypical marks, such as H2AK5ac. Indeed, evidence suggests that H2AK5ac enrichment is associated with active regions of chromatin (Cuddapah et al., 2009). Given the paucity of available datasets, features other than histone acetylation may also influence chromatin accessibility. As a functional readout of chromatin states, the inclusion of FAIRE may increase the power of predictive genomic segmentation.

In support of a functional role for repetitive elements, variation in chromatin organization that accompanied differentiation was significantly linked to the regulation of relevant genes. Genes associated with SINEs and simple repeats that demonstrated differential accessibility exhibited pathway enrichment specific to pluripotency, such as SOX2, OCT4, and NANOG targets. Similarly, pathways associated with mesenchymal differentiation and function were enriched among regions with gains in accessibility during differentiation. Interestingly, variation in histone posttranslational modifications between induced pluripotent stem cells (iPSCs) and ESCs has been inconsistently identified (Guenther et al., 2010; Hawkins et al., 2010). Analysis of iPSCs by FAIRE would identify whether features of chromatin accessibility at repetitive elements are restored during reprogramming and could contribute to chromatin-based exploration of the reprogramming process.

Finally, our study offered unexpected technical insights into FAIRE. Previous studies have noted discrepancies between FAIRE and DNase, particularly at distal regulatory elements (Song et al., 2011). However, the biochemical differences characterizing those regions that are enriched by FAIRE but not detected by DNase have not been identified. Compared with DNase and ATAC, FAIRE-seq seems unique in its ability to identify unstable nucleosome-bound regions. Whether destabilized by chromatin organizational differences or by histone acetylation, these nucleosomes may not survive the biochemical extraction process of FAIRE. In a similar fashion, unstable H2A.Z/H3.3-containing nucleosomes have been found at regions deemed “nucleosome depleted” (Jin et al., 2009). In contrast, DNase and ATAC depend on exposed DNA for enzymatic cleavage. Consistent with this difference, DNase and ATAC data from Ewing sarcoma indicate nucleosome eviction. Nucleosome eviction was also reflected by quantitative gains in FAIRE signal. Strategies that explore chromatin organization yield distinct insights. Apparent differences between these methods may indicate specific states that influence chromatin accessibility.

Overall, we identify a link between stem cell-specific chromatin features at repetitive elements and cancer development. Because of their abundance, these elements may broadly influence nucleosome positioning and chromatin remodeling during differentiation. Multiple mechanisms result in variation in repeat element structure and location. How these factors converge to alter chromatin organization will continue to enhance our understanding of development and disease.