The spread of bad neighborhoods
Our genomes have complex three-dimensional (3D) arrangements that partition and regulate gene expression. Cancer cells frequently have grossly rearranged genomes, which disturbs this intricate 3D organization. Hnisz et al. show that the disruption of these 3D neighborhoods can bring oncogenes under the control of regulatory elements normally kept separate from them (see the Perspective by Wala and Beroukhim). These novel juxtapositions can result in the inappropriate activation of oncogenes. Science, this issue p. 1454; see also p. 1398

Abstract
Oncogenes are activated through well-known chromosomal alterations such as gene fusion, translocation, and focal amplification. In light of recent evidence that the control of key genes depends on chromosome structures called insulated neighborhoods, we investigated whether proto-oncogenes occur within these structures and whether oncogene activation can occur via disruption of insulated neighborhood boundaries in cancer cells. We mapped insulated neighborhoods in T cell acute lymphoblastic leukemia (T-ALL) and found that tumor cell genomes contain recurrent microdeletions that eliminate the boundary sites of insulated neighborhoods containing prominent T-ALL proto-oncogenes. Perturbation of such boundaries in nonmalignant cells was sufficient to activate proto-oncogenes. Mutations affecting chromosome neighborhood boundaries were found in many types of cancer. Thus, oncogene activation can occur via genetic alterations that disrupt insulated neighborhoods in malignant cells.

Tumor cell gene expression programs are typically driven by somatic mutations that alter the coding sequence or expression of proto-oncogenes (1) (Fig. 1A), and identifying such mutations in patient genomes is a major goal of cancer genomics (2, 3). Dysregulation of proto-oncogenes frequently involves mutations that bring transcriptional enhancers into proximity of these genes (4). Transcriptional enhancers normally interact with their target genes through the formation of DNA loops (5–7). These loops typically are constrained within larger CCCTC-binding factor (CTCF) cohesin–mediated loops called insulated neighborhoods (8–10), which in turn can form clusters that contribute to topologically associating domains (TADs) (11, 12) (fig. S1A). This recent understanding of chromosome structure led us to hypothesize that silent proto-oncogenes located within insulated neighborhoods might be activated in cancer cells via loss of an insulated neighborhood boundary, with consequent aberrant activation by enhancers that are normally located outside the neighborhood (Fig. 1A, lowest panel).

Fig. 1 3D regulatory landscape of the T-ALL genome. (A) Mechanisms activating proto-oncogenes. (B) Hi-C interaction map, TADs defined in human embryonic stem cells (H1), cohesin ChIA-PET interactions (intensity of blue arc represents interaction significance), CTCF and H3K27Ac chromatin immunoprecipitation sequencing (ChIP-seq) profiles and peaks, and RNA-seq in Jurkat cells at the CD3D locus. ChIP-seq peaks are denoted as bars above ChIP-seq profiles. (C) ChIA-PET interactions at the RUNX1 locus displayed above the ChIP-seq profiles of CTCF, cohesin (SMC1), and H3K27Ac. FDR, false discovery rate.

To test this hypothesis, we used chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) to map neighborhoods and other cis-regulatory interactions in a cancer cell genome (Fig. 1B and table S1). A T cell acute lymphoblastic leukemia (T-ALL) Jurkat cell line was selected for these studies because key T-ALL oncogenes and genetic alterations are well known (13, 14). The ChIA-PET technique generates a high-resolution (~5 kb) chromatin interaction map of sites in the genome bound by a specific protein factor (8, 15, 16). Cohesin was selected as the target protein because it is involved in both CTCF-CTCF interactions and enhancer-promoter interactions (5–7) and has proven useful for identifying insulated neighborhoods (8, 10) (fig. S1, A and B). The cohesin ChIA-PET data were processed using multiple analytical approaches (figs. S1 to S4 and table S2), and their analysis identified 9757 high-confidence interactions, including 9038 CTCF-CTCF interactions and 379 enhancer-promoter interactions (fig. S4C). The CTCF-CTCF loops had a median length of 270 kb, contained on average two or three genes, and covered ~52% of the genome (table S2). Such CTCF-CTCF loops have been called insulated neighborhoods because disruption of either CTCF boundary causes dysregulation of local genes due to inappropriate enhancer-promoter interactions (8, 10). Consistent with this, the Jurkat chromosome structure data showed that the majority of cohesin-associated enhancer-promoter interactions had end points that occurred within the CTCF-CTCF loops (Fig. 1C and fig. S2H). These results provide an initial map of the three-dimensional (3D) regulatory landscape of a tumor cell genome.
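
The loop statistics above (median loop length, genome coverage, and the share of enhancer-promoter interactions whose end points fall inside CTCF-CTCF loops) reduce to simple interval arithmetic once loops have been called. The Python sketch below illustrates that arithmetic under stated assumptions: each loop is reduced to a hypothetical `Loop(chrom, start, end)` record spanning its two CTCF anchors, an interaction counts as nested if both end points lie inside a single loop, and any coordinates used with it are invented toy values rather than Jurkat data. It is not the ChIA-PET processing pipeline used in the study (figs. S1 to S4).

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Loop:
    """Hypothetical CTCF-CTCF loop record: chromosome and outer anchor coordinates (bp)."""
    chrom: str
    start: int
    end: int


def merged_coverage(loops):
    """Base pairs covered by the union of loops; overlapping loops are counted once."""
    by_chrom = {}
    for lp in loops:
        by_chrom.setdefault(lp.chrom, []).append((lp.start, lp.end))
    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:  # gap: close the current merged block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:  # overlap or abutment: extend the current block
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


def is_nested(chrom, a, b, loops):
    """True if both end points of an interaction lie within one loop."""
    lo, hi = min(a, b), max(a, b)
    return any(lp.chrom == chrom and lp.start <= lo and hi <= lp.end for lp in loops)


def summarize(loops, interactions, genome_size):
    """interactions: (chrom, end_point_1, end_point_2) tuples, e.g. enhancer-promoter pairs."""
    lengths = [lp.end - lp.start for lp in loops]
    nested = sum(is_nested(c, a, b, loops) for c, a, b in interactions)
    return {
        "median_loop_length": median(lengths),
        "genome_fraction": merged_coverage(loops) / genome_size,
        "fraction_nested": nested / len(interactions),
    }
```

For example, with toy loops `Loop("chr1", 100_000, 400_000)`, `Loop("chr1", 300_000, 500_000)`, and `Loop("chr2", 0, 200_000)` on a 1-Mb genome, `summarize` reports a median loop length of 200,000 bp and a covered fraction of 0.6, because the two overlapping chr1 loops are merged before coverage is counted.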

We next investigated the relationship between genes that have been implicated in T-ALL pathogenesis and the insulated neighborhoods. The majority of genes (40 of 55) implicated in T-ALL pathogenesis, as curated from the Cancer Gene Census and individual studies (table S3), were located within the insulated neighborhoods identified in Jurkat cells (Fig. 2A and fig. S5); 27 of these genes were transcriptionally active and 13 were silent, as determined by RNA sequencing (RNA-seq) (Fig. 2A and table S4). Active oncogenes are often associated with superenhancers (17, 18), and we found that 13 of the 27 active T-ALL pathogenesis genes were associated with superenhancers (Fig. 2, A and B, and fig. S5A). Silent genes have also been shown to be protected by insulated neighborhoods from active enhancers located outside the neighborhood, and we found multiple instances of silent proto-oncogenes located within CTCF-CTCF loop structures in the Jurkat genome (Fig. 2, A and C, and fig. S5B). Thus, both active oncogenes and silent proto-oncogenes are located within insulated neighborhoods in these T-ALL cells.

Fig. 2 Active oncogenes and silent proto-oncogenes occur in insulated neighborhoods. (A) T-ALL pathogenesis genes. Colored boxes indicate whether a gene is located within a neighborhood, expressed, and associated with a superenhancer. (B) Insulated neighborhood at the active TAL1 locus. The cohesin ChIA-PET interactions are displayed above the ChIP-seq profiles of CTCF, cohesin (SMC1), and H3K27Ac and the RNA-seq profile. A model of the insulated neighborhood is shown on the right. (C) Insulated neighborhood at the silent LMO2 locus.

If some insulated neighborhoods function to prevent proto-oncogene activation, some T-ALL tumor cells may have genetic alterations that perturb the CTCF boundaries of neighborhoods containing T-ALL oncogenes. To investigate this possibility, we used data from multiple studies (table S5A) to identify recurrent deletions in T-ALL genomes that span insulated neighborhood boundaries, filtering for relatively short deletions (<500 kb) to minimize the inclusion of deletions that affect multiple genes (fig. S6A). Among the 438 recurrent deletions identified with this approach, 113 overlapped at least one boundary of insulated neighborhoods identified in T-ALL, and 6 of these affected neighborhoods containing T-ALL pathogenesis genes (fig. S6B and table S5B). Examples of two such genes, TAL1 and LMO2, are shown in Fig. 3, A and G.
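
The deletion screen described above amounts to a length filter followed by an interval-overlap test against neighborhood boundary anchors. A minimal Python sketch, assuming half-open coordinates, a hypothetical `Neighborhood` record that stores its two anchor intervals and the names of the genes it contains, and deletions that have already been called as recurrent (recurrence itself is not modeled here):

```python
from dataclasses import dataclass

MAX_DELETION_BP = 500_000  # size cutoff from the text (<500 kb)


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # half-open: [start, end)
    end: int


@dataclass(frozen=True)
class Neighborhood:
    """Hypothetical record: the two boundary anchor intervals plus enclosed gene names."""
    left_anchor: Interval
    right_anchor: Interval
    genes: frozenset


def overlaps(a, b):
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def screen_deletions(deletions, neighborhoods, pathogenesis_genes):
    """Return (short deletions hitting any boundary, subset whose neighborhood holds a listed gene)."""
    at_boundary, at_pathogenesis = [], []
    for d in deletions:
        if d.end - d.start >= MAX_DELETION_BP:
            continue  # long deletions are likely to affect several genes; skip them
        hit = [n for n in neighborhoods
               if overlaps(d, n.left_anchor) or overlaps(d, n.right_anchor)]
        if hit:
            at_boundary.append(d)
            if any(n.genes & pathogenesis_genes for n in hit):
                at_pathogenesis.append(d)
    return at_boundary, at_pathogenesis
```

The two returned lists mirror the two counts in the text (deletions overlapping any boundary, and the subset affecting neighborhoods with T-ALL pathogenesis genes); the real inputs are the neighborhood calls and patient deletions summarized in tables S5A and S5B.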

Fig. 3 Disruption of insulated neighborhood boundaries is linked to proto-oncogene activation. (A) Cohesin ChIA-PET interactions and CTCF and cohesin (SMC1) binding profiles at the TAL1 locus in Jurkat cells. Patient deletions described in (22) are shown as bars below the gene models. The deletion on the bottom indicates the minimally deleted region identified in (26). (B) ChIP-seq profiles of CTCF, H3K27Ac, p300, CBP, and RNA-seq at the TAL1 locus in HEK-293T cells. The region deleted using a CRISPR/Cas9-based approach is highlighted in a gray box. (C) Quantitative reverse transcription polymerase chain reaction (qRT-PCR) analysis of TAL1 expression in wild-type HEK-293T cells (wt) and in cells where the neighborhood boundary highlighted in (B) was deleted. (D) Model of the neighborhood and perturbation at the TAL1 locus. (E) 5C contact matrices in wild-type HEK-293T cells and TAL1 neighborhood boundary–deleted cells. An arrow indicates the position of the region removed in the mutant cells. (F) Distance-adjusted z-score difference (5C) maps at the TAL1 locus (ΔCTCF – wild-type HEK-293T). Note the increase in the 5C signal adjacent to the deleted region. CTCF and H3K27Ac binding profiles in wild-type cells are displayed for orientation. (G) Cohesin ChIA-PET interactions and CTCF and cohesin (SMC1) binding profiles at the LMO2 locus. Patient deletions described in (22) are shown as bars below the gene models. (H) ChIP-seq profiles of CTCF, H3K27Ac, p300, CBP, and RNA-seq at the LMO2 locus in HEK-293T cells. The region deleted by a CRISPR/Cas9-based approach is highlighted in a gray box. (I) qRT-PCR analysis of LMO2 expression in wild-type HEK-293T cells and in cells where the neighborhood boundary highlighted in (H) was deleted. (J) Model of the neighborhood and perturbation at the LMO2 locus. (K) 5C contact matrices in wild-type HEK-293T cells and LMO2 neighborhood boundary–deleted cells. An arrow indicates the position of the region removed in the mutant cells. (L) Distance-adjusted z-score difference (5C) maps at the LMO2 locus (ΔCTCF – wild-type HEK-293T). Note the increase in the 5C signal adjacent to the deleted region. CTCF and H3K27Ac binding profiles in wild-type cells are displayed for orientation. In (C) and (I), data from n = 3 independent biological replicates are displayed as means ± SD; P < 0.01 between wild-type and boundary-deleted cells (two-tailed t test).

If deletions overlapping neighborhood boundaries can cause activation of proto-oncogenes within the loops, then site-specific deletion of a loop boundary CTCF site, such as the one at the TAL1 locus, should be sufficient to activate the enclosed proto-oncogene in nonmalignant cells. TAL1 encodes a transcription factor that is overexpressed in ~50% of T-ALL cases and is a key oncogenic driver of this cancer (19, 20). TAL1 can be activated by deletions that fuse a promoterless TAL1 gene to the promoter of STIL (19), and this was observed in many patient deletions (Fig. 3A). Several patient deletions, however, retained the TAL1 promoter (end point >5 kb from the promoter) but overlapped the CTCF boundary site of the TAL1 neighborhood (Fig. 3A), and TAL1 was active in the samples harboring these deletions (fig. S7, A and B). This suggests that disruption of the insulated neighborhood allows TAL1 to be activated by regulatory elements outside the loop.
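
The distinction drawn above between deletions that remove or abut the TAL1 promoter and deletions that spare it but hit the boundary can be expressed as a small classifier. In this Python sketch only the 5-kb promoter margin comes from the text; everything else is an assumption for illustration: coordinates are hypothetical, the deletion and boundary site lie on the same chromosome, and the transcription start site (TSS) stands in for the promoter.

```python
PROMOTER_MARGIN_BP = 5_000  # end point must lie >5 kb from the promoter (from the text)


def classify_deletion(del_start, del_end, tss, site_start, site_end):
    """Label a half-open deletion [del_start, del_end) relative to a gene TSS and one
    boundary CTCF site [site_start, site_end); all positions on one chromosome."""
    covers_tss = del_start <= tss < del_end
    nearest_end_point = min(abs(del_start - tss), abs(del_end - tss))
    promoter_retained = not covers_tss and nearest_end_point > PROMOTER_MARGIN_BP
    hits_boundary = del_start < site_end and site_start < del_end
    if not promoter_retained:
        return "promoter_proximal"  # deletion covers the TSS or ends within 5 kb of it
    if hits_boundary:
        return "boundary_only"  # promoter kept, neighborhood boundary site removed
    return "other"
```

Patient deletions labeled `"boundary_only"` by this rule correspond to the class discussed above, in which TAL1 activation cannot be explained by promoter fusion.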

We tested this idea by CRISPR/Cas9-mediated deletion of the TAL1 neighborhood boundary in human embryonic kidney (HEK-293T) cells (Fig. 3B). In these cells the TAL1 proto-oncogene is silent, as evidenced by low H3K27Ac (histone H3 acetylated Lys27) occupancy and RNA-seq (Fig. 3B). However, at least one active regulatory element occurs ~60 kb upstream of TAL1, adjacent to the CMPK1 promoter, as evidenced by high levels of H3K27Ac and p300/CBP (Fig. 3B) and enhancer reporter assays (fig. S8, A and B). Deletion of a ~400–base pair (bp) segment encompassing the boundary CTCF site, which abolished CTCF binding (fig. S8A), caused a factor of 2.3 induction of the TAL1 transcript (Fig. 3C), which suggests that the integrity of the neighborhood contributes to the silent state of TAL1 (Fig. 3D). In support of this model, contacts between DNA regions that are normally within and outside of the neighborhood were increased (Fig. 3, E and F, and fig. S10). Furthermore, deletion of the CTCF site in primary human T cells also caused a small but detectable activation of TAL1 (fig. S8, C to G). These results are consistent with the idea that the silent state of the TAL1 proto-oncogene is dependent on the integrity of the insulated neighborhood (Fig. 3D).
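
The qRT-PCR readout (a factor of 2.3 for TAL1) is a relative expression value. The text does not state how it was computed; the Python sketch below assumes the widely used 2^-ΔΔCt method, with a reference gene for normalization and the wild-type group as calibrator, and any replicate Ct values fed to it are hypothetical. Under this model a factor of 2.3 corresponds to ΔΔCt ≈ -1.2 (log2 2.3 ≈ 1.20).

```python
from statistics import mean, stdev


def relative_expression(target_ct, reference_ct, calibrator_dct):
    """2^-ddCt for one replicate (assumed method), where dCt = Ct(target) - Ct(reference gene)
    and ddCt = dCt - mean dCt of the calibrator (wild-type) group."""
    ddct = (target_ct - reference_ct) - calibrator_dct
    return 2 ** -ddct


def fold_change(wild_type, mutant):
    """Each argument: list of (target_ct, reference_ct) pairs, one per biological replicate.
    Returns ((mean, SD) for wild type, (mean, SD) for mutant), normalized to wild type."""
    calibrator = mean(t - r for t, r in wild_type)
    wt = [relative_expression(t, r, calibrator) for t, r in wild_type]
    mut = [relative_expression(t, r, calibrator) for t, r in mutant]
    return (mean(wt), stdev(wt)), (mean(mut), stdev(mut))
```

The per-replicate values `wt` and `mut` are what a two-tailed t test, as reported for Fig. 3, C and I, would compare; `scipy.stats.ttest_ind` is one option and is two-sided by default.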

At the LMO2 locus, we further tested the model that site-specific perturbation of a loop boundary is sufficient to activate a proto-oncogene. The LMO2 gene encodes a transcription factor that is overexpressed and oncogenic in some forms of T-ALL (14, 20). The region upstream of the LMO2 promoter is recurrently deleted in T-ALL, and these deletions are linked to LMO2 activation (Fig. 3G); a previous study proposed that deletion of cryptic repressors located in the deleted region enables activation of LMO2 (21). Analysis of a T-ALL patient cohort (22) revealed deletions that overlap the CTCF boundary site of the LMO2 neighborhood, and patient cells harboring these deletions generally had high levels of LMO2 expression (fig. S9, A and B). In HEK-293T cells, CRISPR/Cas9-mediated deletion of a ~25-kb segment encompassing the insulated neighborhood boundary CTCF site and two additional CTCF sites that could act as boundary elements caused a factor of 2 increase in the LMO2 transcript (Fig. 3, H to J) and a large-scale rearrangement of interactions around LMO2, as evidenced by chromosome conformation capture carbon copy (5C) analysis (Fig. 3, K and L, and fig. S10). These results indicate that the deleted CTCF sites contribute to the silent state of the LMO2 proto-oncogene (Fig. 3J).
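
The 5C difference maps in Fig. 3, F and L, compare mutant and wild-type contacts after adjusting for genomic distance. The exact normalization is described in the supplementary materials, not here; the Python sketch below assumes a simple version in which each entry of a symmetric, equally binned contact matrix is z-scored against all entries on the same diagonal (same bin separation), and the wild-type z-scores are then subtracted from the mutant ones. Published 5C analyses often add steps this sketch omits, such as log-transforming counts and fitting a smoothed expected-contact curve.

```python
from statistics import mean, pstdev


def distance_zscores(matrix):
    """Z-score each contact against all contacts at the same bin separation
    (same diagonal) of a symmetric n x n contact matrix (simplified, assumed scheme)."""
    n = len(matrix)
    z = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        mu, sd = mean(diagonal), pstdev(diagonal)
        for i in range(n - d):
            value = 0.0 if sd == 0 else (matrix[i][i + d] - mu) / sd
            z[i][i + d] = z[i + d][i] = value  # keep the result symmetric
    return z


def zscore_difference(mutant, wild_type):
    """Mutant minus wild-type distance-adjusted z-scores, entry by entry."""
    zm, zw = distance_zscores(mutant), distance_zscores(wild_type)
    n = len(zm)
    return [[zm[i][j] - zw[i][j] for j in range(n)] for i in range(n)]
```

A positive entry marks a bin pair that contacts more often in the mutant than expected for its distance, relative to wild type, which is the kind of gain highlighted next to the deleted regions in Fig. 3, F and L.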

The boundaries of chromosome neighborhoods may be disrupted in other cancers. A recent study noted that mutations in CTCF binding sites occur frequently in cancers (23), but it is unclear whether mutations in boundaries are common, as only a subset of CTCF sites form insulated neighborhoods (8, 10, 24). CTCF cohesin–bound loops are largely preserved across cell types (8, 9, 24), and a set of ~10,000 constitutive CTCF-CTCF loops shared by GM12878 lymphoblastoid, Jurkat, and K562 (CML) cells (24) was identified for comparison (Fig. 4A, fig. S11, and table S8). We used the International Cancer Genome Consortium (ICGC) database, which contains data for ~50 cancer types, ~2300 whole-genome sequence (WGS) samples, and ~13 million unique somatic mutations, to examine the boundaries of these neighborhoods for somatic point mutations found in cancer genomes (table S9). We found a striking enrichment of mutations at the CTCF boundaries of constitutive neighborhoods (Fig. 4B, fig. S12A, and table S10) relative to regions flanking the boundary CTCF sites (±1 kb of the CTCF binding motif; P < 10⁻⁴, permutation test) (fig. S12B), and in many instances these mutations altered the consensus CTCF binding motif (fig. S12C). Nonboundary CTCF sites did not show such enrichment (Fig. 4B and figs. S12D and S14). The genomes of esophageal and liver carcinoma samples were particularly enriched for boundary CTCF site mutations (Fig. 4, C and D, fig. S12, D and E, fig. S13, and table S10), and there was no similar enrichment of mutations at the binding sites of other transcription factors (fig. S15). In these cancers, a considerable fraction of the mutated neighborhood boundary CTCF sites were affected by multiple mutations (≥3 mutations per site) [280/1826 (15%) in esophageal carcinoma, 54/1030 (5%) in liver carcinoma] (table S10), and recurrent mutations occurred more frequently in neighborhood boundary CTCF sites than in nonboundary CTCF sites (fig. S16, A to C).
The genes located within the most frequently mutated neighborhoods included known cellular proto-oncogenes annotated in the Cancer Gene Census and other genes that have not been associated with these cancers (Fig. 4, E and F, and tables S11 and S12). Shown in Fig. 4, G and H, are two examples of proto-oncogene–containing neighborhoods where the activation of the gene located in the neighborhood has been observed in the respective cancer type. These results suggest that somatic mutations of insulated neighborhood boundaries occur in the genomes of many different cancers.
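
The enrichment test reported above compares mutation counts at boundary CTCF sites with counts in their ±1-kb flanks. The study's exact permutation scheme is given in the supplementary materials; the Python sketch below is a simplified stand-in under explicit assumptions: all positions are on one chromosome, each site window spans ±5 bp of the motif center (as in Fig. 4), and the null distribution is built by moving each site's window to a random, non-overlapping offset within its ±1-kb flank. Any site and mutation positions passed to it are hypothetical.

```python
import bisect
import random


def count_in_window(sorted_positions, start, end):
    """Number of positions p with start <= p < end in a sorted list."""
    return bisect.bisect_left(sorted_positions, end) - bisect.bisect_left(sorted_positions, start)


def boundary_enrichment(sites, mutations, half_window=5, flank=1_000, n_perm=10_000, seed=0):
    """Return (observed count, empirical one-sided P) for mutations in site windows
    versus same-width windows placed at random offsets within each site's flank."""
    rng = random.Random(seed)
    muts = sorted(mutations)
    width = 2 * half_window + 1
    observed = sum(count_in_window(muts, s - half_window, s + half_window + 1) for s in sites)
    # Offsets of a null window's start relative to the site: inside the flank,
    # and not overlapping the site window [s - half_window, s + half_window + 1).
    offsets = [o for o in range(-flank, flank - width + 2)
               if o + width <= -half_window or o >= half_window + 1]
    exceed = 0
    for _ in range(n_perm):
        null = 0
        for s in sites:
            o = rng.choice(offsets)
            null += count_in_window(muts, s + o, s + o + width)
        if null >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

With the add-one estimator used here, the smallest attainable P value is 1/(n_perm + 1), so reporting P < 10⁻⁴ requires on the order of 10⁴ permutations or more.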

Fig. 4 Somatic mutations of neighborhood boundaries occur in many cancers. (A) “Constitutive neighborhood” at the NOTCH1 locus. CTCF ChIP-seq and cohesin ChIA-PET interactions in Jurkat (T-ALL), GM12878 (lymphoblastoid), and K562 (CML) cells are displayed. (B) Frequency of somatic mutations in the ICGC database at CTCF sites that form constitutive neighborhood boundaries (left) and CTCF sites that do not form neighborhood boundaries (right). (C) Somatic mutations in esophageal adenocarcinoma (ESAD-UK) at constitutive neighborhood boundary CTCF sites. (D) Somatic mutations in hepatocellular carcinoma (LIRI-JP) at constitutive neighborhood boundary CTCF sites. (E and F) Genes in constitutive neighborhoods whose boundary is recurrently mutated in esophageal adenocarcinoma (E) and in hepatocellular carcinoma (F). The bars depict the number of mutations in the neighborhood boundary site. Proto-oncogenes annotated in the Cancer Gene Census are highlighted in red. (G and H) Mutations in the boundary sites of the neighborhood containing the LMO1 proto-oncogene in esophageal adenocarcinoma (G) and the FGFR1 proto-oncogene in hepatocellular carcinoma (H). The enrichment of mutations at the constitutive neighborhood boundary sites (±5 bp of the motif) shown in (B) to (D) relative to regions flanking the binding sites has a P value of <10⁻⁴ (permutation test).

Our findings indicate that disruption of insulated neighborhood boundaries can cause oncogene activation in cancer cells. With maps of 3D chromosome structure such as those described here, cancer genome analysis can consider how recurrent perturbations of boundary elements may affect the expression of genes with roles in tumor biology. Our understanding of 3D chromosome structure and its control is rapidly advancing and should be considered for potential diagnostic and therapeutic purposes. Because control of 3D chromosome structure involves binding of specific sites by CTCF and cohesin, which is affected by protein cofactors, DNA methylation, and local RNA synthesis (25), advances in our understanding of these regulatory processes may provide new approaches to therapeutics that have an impact on aberrant chromosome structures.

Supplementary Materials www.sciencemag.org/content/351/6280/1454/suppl/DC1 Materials and Methods Figs. S1 to S16 Tables S1 to S13 References (27–71)

Acknowledgments: Supported by NIH grants HG002668 (R.A.Y.), CA109901 (R.A.Y.), NS088538 (R.J.), MH104610 (R.J.), and AI120766 (M.H.P.); an Erwin Schrödinger Fellowship (J3490) from the Austrian Science Fund (FWF) (D.H.); Ludwig Graduate Fellowship funds (A.S.W.); the Laurie Kraus Lacob Faculty Scholar Award in Pediatric Translational Research (M.H.P.); Hyundai Hope on Wheels (M.H.P.); and Danish Council for Independent Research, Medical Sciences, individual postdoctoral grant DFF–1333-00106B and Sapere Aude Research Talent grant DFF–1331-00735B (R.O.B.). Work in the Dekker lab is supported by the National Human Genome Research Institute (R01 HG003143, U54 HG007010, U01 HG007910), the National Cancer Institute (U54 CA193419), the NIH Common Fund (U54 DK107980, U01 DA 040588), the National Institute of General Medical Sciences (R01 GM 112720), and the National Institute of Allergy and Infectious Diseases (U01 R01 AI 117839). J.D. is an investigator of the Howard Hughes Medical Institute. We thank R. Fitzgerald, S. Grimmond, and the ICGC Genome Projects ESAD-UK and OV-AU for permission to use genome sequence data. Data sets generated in this study have been deposited in the Gene Expression Omnibus under accession number GSE68978. The Whitehead Institute filed a patent application based on this paper. R.A.Y. is a founder of Syros Pharmaceuticals, and R.J. is a founder of Fate Therapeutics.