Abstract The noncoding genome affects gene regulation and disease, yet we lack tools for rapid identification and manipulation of noncoding elements. We developed a CRISPR screen using ~18,000 single guide RNAs targeting >700 kilobases surrounding the genes NF1, NF2, and CUL3, which are involved in BRAF inhibitor resistance in melanoma. We find that noncoding locations that modulate drug resistance also harbor predictive hallmarks of noncoding function. With a subset of regions at the CUL3 locus, we demonstrate that engineered mutations alter transcription factor occupancy and long-range and local epigenetic environments, implicating these sites in gene regulation and chemotherapeutic resistance. Through our expansion of the potential of pooled CRISPR screens, we provide tools for genomic discovery and for elucidating biologically relevant mechanisms of gene regulation.

More than 98% of the human genome does not code for proteins; however, unlike for the coding genome, there exists no overarching framework to translate the noncoding genomic sequence into functional elements (1, 2). Evidence from genome-wide association studies suggests that many noncoding regions are critical for human health (3, 4). The implications of these associations, however, have been difficult to assess, in part because we lack the tools to determine which variants alter functional elements. Molecular hallmarks, such as epigenetic state, chromatin accessibility, transcription factor binding, and evolutionary conservation, correlate with putative functional elements in the noncoding genome and can predict regulatory function (2, 5). However, these predictions largely bypass regions lacking hallmarks, and it is difficult to ascertain which hallmarks play a correlative or truly causal role in function or phenotype (6, 7). Efforts to determine causality have used preselected DNA fragments, with expression serving as a proxy for function (8), but these methods lack the local chromatin context and broader regulatory interactions. Thus, there is a need for systematic approaches to sift through noncoding variants and determine whether and how they affect phenotypes in a native biological context.

For this purpose, we designed a high-throughput method that uses pooled CRISPR (clustered regularly interspaced short palindromic repeat)–Cas9 single guide RNA (sgRNA) libraries to screen noncoding genomic loci in order to identify functional regions related to phenotype and gene regulation. Previous applications of CRISPR screens to the noncoding genome have focused on specific functional elements (e.g., microRNAs and transcription factor binding sites) or required fluorescent reporters (9–12). In this work, we comprehensively assayed a total of 715 kilobases (kb) of sequence surrounding three different genes by performing unbiased mutagenesis to identify functional elements relevant to cancer drug resistance.

Vemurafenib inhibits BRAF proteins carrying the V600E mutation, which are found in 50 to 70% of melanomas (13). Resistance to vemurafenib arises within months in almost all melanoma patients (14), and surviving tumor cells display increased malignancy that rapidly leads to lethality (15). A genome-scale CRISPR screen found that loss-of-function mutations in NF1, NF2, and CUL3 result in vemurafenib resistance (16). To explore whether mutations in the noncoding regions around these three genes could similarly affect drug resistance, we designed three sgRNA libraries tiling across 100-kb regions 5′ and 3′ of each gene’s major isoforms (Fig. 1A). For each library, we synthesized the sgRNAs as a pool (6682 for NF1, 6934 for NF2, and 4699 for CUL3; 18,315 sgRNAs in total) and cloned them into a lentiviral vector (fig. S1). We transduced A375 human melanoma cells, which carry the BRAF mutation, with the sgRNA libraries at a low multiplicity of infection and cultured them in 2 uM vemurafenib or control (dimethyl sulfoxide, DMSO) for 14 days. Using deep sequencing, we counted the representation of sgRNAs in both conditions (Fig. 1, B to D) and identified vemurafenib-enriched sgRNAs as those enriched by >4 standard deviations from the control distribution (fig. S2).

Fig. 1 CRISPR mutagenesis of noncoding regions flanking three genes involved in BRAF inhibitor resistance. (A) Design of sgRNA libraries targeting 100 kb 5′ and 100 kb 3′ of a gene. The sgRNAs are array-synthesized and cloned into a lentiviral vector. BRAF mutant cells are transduced with the pooled lentivirus and treated with vemurafenib (vemu) or DMSO (control). A deep sequencing readout identifies sgRNAs enriched after treatment with vemurafenib. (B) Scatterplot of normalized read counts (average of the two infection replicates) for NF1, (C) NF2, and (D) CUL3 sgRNAs at days 0 (x axes) and 14 (y axes). Read counts from control (gray) and vemurafenib-treated cells (red) are shown relative to 4 standard deviations of the control cell distribution (dotted line). The percentage of sgRNAs enriched after vemurafenib treatment (>4 standard deviations) is indicated. (E) Distribution of the log 2 ratio of the normalized read count for each sgRNA in vemurafenib to its normalized read count in control (minimum of the two infection replicates). (F) All CUL3 sgRNAs, plotted by human reference genome hg19 coordinates, and the percent expression of the two most highly expressed CUL3 isoforms [primary and alternate (alt)]. For vemurafenib-enriched sgRNAs, the log 2 enrichment relative to control sgRNAs (minimum value of two replicate screens) is plotted (red); nonenriched sgRNAs are indicated in blue. (G) Percent of enriched sgRNAs in each genomic category. CDS, coding sequence; UTR, untranslated region; prom., promoter.

Overall, most sgRNAs were depleted after treatment with vemurafenib, which is expected because vemurafenib targets the oncogene addiction that drives A375 growth (Fig. 1E). However, in all three libraries, we found a small group of sgRNAs that were enriched after vemurafenib treatment (log 2 ratio of vemurafenib to control > 0), with the CUL3 library having the largest percentage of enriched sgRNAs. We also included a small number of sgRNAs targeting the coding region of each gene, and most of these (70 to 80%) were enriched, as expected (fig. S3A). However, among the sgRNAs targeting noncoding regions, about fourfold more were enriched in the CUL3 library than in the NF1 or NF2 libraries (7.2% in CUL3, 1.7% in NF1, and 2.1% in NF2), suggesting the presence of more gene regulatory elements in the noncoding regions flanking the gene (fig. S3A). To determine whether this increase in putative gene regulatory elements in the 200-kb region surrounding CUL3 is also reflected in human gene expression and genotyping data, we queried the Genotype-Tissue Expression (GTEx) database (7051 tissue samples from 449 donors). We found that CUL3 had the largest number of cis-expression quantitative trait loci (eQTL) (n = 161 eQTLs, mean effect size = –0.21), and the region targeted by the sgRNA library overlapped with a large number of these eQTLs (fig. S3B) (17). We thus chose to focus our downstream analysis and validation efforts on CUL3.

We visualized the enriched sgRNAs in a genome browser–style view (Fig. 1F and fig. S4, A and B). We found that a higher percentage of sgRNAs targeting gene-proximal elements were enriched compared with other noncoding regions (Fig. 1G), and we found greater enrichment for sgRNAs targeting noncoding elements on the 5′ side of the gene than for those on the 3′ side (fig. S4C).

To test whether regions targeted by enriched sgRNAs from the screen physically interact with the CUL3 promoter through chromatin looping (18), we created three independent chromosome conformation capture (3C) libraries (Fig. 2A) (19). We quantified the interaction frequency for each site across the ~200-kb region (supplementary methods) and found that regions on the 5′ side of CUL3 tend to interact more strongly with the promoter. Regions with higher 3C interaction contained, on average, more vemurafenib-enriched sgRNAs (Fig. 2B).

Fig. 2 Characteristics of functional noncoding elements at the CUL3 locus. (A) Plot of the normalized 3C interactions with the CUL3 promoter in A375 cells. Data points represent independent libraries generated with BglII, EcoRI, and HindIII restriction enzymes. The gray curve shows a smoothed estimate of interaction frequency. Table S3 provides the genomic coordinates of baits and probes. (B) Average window enrichment of sgRNAs (log 2 ratio of vemurafenib to DMSO reads) near 3C sites with the specified minimum normalized 3C interaction with the CUL3 promoter (supplementary methods). The red dashed line indicates average window enrichment for all 3C sites in (A). (C) An example of enriched sgRNAs (red) that overlap with a melanoma-specific region of open chromatin. Read counts are shown from ATAC-seq in A375 melanoma (orange), MCF7 breast cancer (purple), and U87 glioblastoma (blue) cells and from melanoma DNase I hypersensitivity (HS) sequencing (green; ENCODE/Colo-829 cell line). Loci investigated with respect to CUL3 are shown at the top (yellow). Scale bar, 500 bases. (D) Fold enrichment of enriched sgRNAs near ATAC-seq open chromatin peaks in melanoma, breast cancer, and glioblastoma cell lines. (E) Same as (D), but for DNAse I hypersensitivity sequencing. (F) An example of enriched sgRNAs (red) that coincide with regions that show greater primate-specific conservation than placental mammal and vertebrate conservation. Loci investigated with respect to CUL3 are shown at the top (yellow). Scale bar, 200 bases. (G) Fold enrichment of enriched sgRNAs near phastCons peaks in primates, placental mammals, and vertebrates. **P < 0.01; *P < 0.05.

Because chromatin accessibility can identify regulatory regions (20, 21), we performed ATAC-seq (assay for transposase-accessible chromatin with sequencing) in A375 melanoma, MCF7 breast adenocarcinoma, and U87 glioblastoma cells (Fig. 2C). Overall, we found higher sgRNA enrichment near A375-specific ATAC peaks than near those from other cell types, and this finding was replicated with deoxyribonuclease (DNase) I hypersensitivity data (Fig. 2, D and E, and fig. S5). Enrichment in these regions suggests the presence of cell type–specific enhancers (22, 23). Even though the accessible peaks overlapped with enriched sgRNA sites, the chromatin accessibility data by themselves only predict a small fraction of the total number of enriched sgRNA sites (table S1).

Given that evolutionary conservation varies widely across the noncoding genome, we sought to test whether regions exhibiting higher levels of conservation harbor more enriched sgRNAs. We examined phastCons conservation scores over the CUL3 locus among primates, placental mammals, and vertebrates (Fig. 2F) (24). Overall, enriched sgRNAs were ~1.8-fold more likely to be found near peaks of primate conservation and ~1.7-fold less likely to be found near conservation peaks among mammals and vertebrates (Fig. 2G and fig. S5). In contrast, the genomic sites of sgRNAs targeting coding regions of CUL3 did not demonstrate differential conservation (phastCons probability of ~0.95 in primates, mammals, and vertebrates). This observation supports recent findings that enhancers evolve rapidly in a lineage- or species-specific manner and that conserved enhancers between mammals tend to be rare (25).

To confirm that mutations in these specific noncoding regions were mediated by CUL3 and lead to altered drug resistance, we transduced cells with individual sgRNAs that have at least one other enriched sgRNA within 500 bases (Fig. 3A). We validated that the sgRNAs created mutations at the intended target sites (fig. S6) and found that 24 of the 25 sgRNAs resulted in decreased CUL3 expression relative to nontargeting sgRNAs (Fig. 3B). As expected, there was a negative correlation between CUL3 gene expression and vemurafenib resistance (correlation coefficient r = –0.54, P = 0.005) (Fig. 3C), and increased vemurafenib resistance could be reversed by restoring CUL3 expression (fig. S7).

Fig. 3 Noncoding mutations affect CUL3 expression and histone modifications. (A) Criteria for selecting 25 sgRNAs targeting noncoding regions for validation. (B) CUL3 RNA expression (normalized to nontargeting sgRNAs) after transduction with lentivirus expressing nontargeting (triangles), noncoding region–targeting (colored circles), and coding region–targeting (squares) sgRNAs. (C) Relationship between CUL3 expression and cell survival after vemurafenib treatment. The linear fit is to noncoding sgRNAs only (r noncoding = –0.54, P = 0.005) and does not include coding region–targeting or nontargeting sgRNAs. (D) Percent change in (left) average H3K4me3 ChIP for all validation sgRNAs within 1 kb of the transcription start site of CUL3 and (right) average H3K27ac and average H3K4me2 ChIP for all validation sgRNAs >1 kb from the transcriptions start site of CUL3. ***P < 0.001; *P < 0.05.

Next, we surveyed changes in posttranslational histone modifications at each target site (fig. S8A). For target sites near the promoter, we found a 56% average decrease in H3K4me3 after editing (P = 7 × 10−4, n = 9 sites) (Fig. 3D), which is consistent with the reduced gene expression. At distal sites, we found a 41% average decrease in H3K27ac (P = 0.02, n = 7 sites) after editing and no significant change in H3K4me2 (P = 0.82, n = 7 sites) (Fig. 3D), although a subset of H3K4me2 levels decreased after editing (fig. S8B). We also found that mutagenesis of a ~22-kb distal histone acetyltransferase (p300) binding site that has a strong 3C promoter interaction resulted in a 75% decline of promoter H3K27ac and a 50% decrease in CUL3 expression (fig. S9 and supplementary text).

By examining regions targeted by enriched sgRNAs, we found individual loci containing the canonical transcription factor binding motifs for yin yang 1 (YY1), zinc finger protein 263 (ZNF263), CCCTC-binding factor (CTCF), and activation protein 1 (AP-1) complex, which were disrupted after editing (Fig. 4, A to D, and fig. S10). We found that mutations within these binding sites abrogated transcription factor recruitment, leading to loss of CUL3 expression (Fig. 4, E to H). For example, specific sgRNAs that target loci near a YY1 chromatin immunoprecipitation (ChIP) peak (Fig. 4A) disrupted the YY1 motif (fig. S11), and vemurafenib treatment selected for mutations that were more deleterious to the binding site (fig. S12). Although both of the sgRNAs targeting loci near the site decreased YY1 binding, the sgRNA whose cut site overlaps the motif disrupted YY1 binding more efficiently (67 versus 26%) (Fig. 4E). In addition, mutagenesis by either sgRNA significantly decreased CUL3 expression. Similarly, two sgRNAs in the first intron of CUL3, spaced 30 bases apart, overlap a ZNF263 ChIP sequencing peak (Fig. 4B). Both resulted in a significant decrease in ZNF263 occupancy and in CUL3 expression (Fig. 4F).

Fig. 4 Cas9 mutagenesis disrupts predicted transcription factors and DNA binding proteins at target sites of vemurafenib-enriched sgRNAs. (A to D) sgRNA target locations in relation to predicted binding sites. Table S4 gives sgRNA sequences and target locations. (E to H) Change in transcription factor or DNA binding protein occupancy around cleavage sites and change in CUL3 expression. Both measurements are normalized to cells transduced with nontargeting sgRNAs. ***P < 0.001; **P < 0.01; *P < 0.05.

Although we observed a bias in the presence of regulatory elements 5′ of the transcription start site, we did find several enriched sgRNAs downstream of CUL3 (Fig. 4, C and D, and supplementary text). One sgRNA cuts inside the core motif of CTCF (Fig. 4C). After editing, CTCF occupancy was decreased by 45%, with a concurrent 30% decrease in CUL3 expression (Fig. 4G). For AP-1, a heterodimer of FOS and JUN, editing at either of two nearby sites decreased FOS and JUN binding compared with binding in control cells and decreased CUL3 expression by ~25% (Fig. 4H). Overall, as in the pooled screen, mutagenesis at transcription factor binding sites located on the 3′ side exhibited weaker effects on gene expression than those located on the 5′ side.

Thus, we show how a Cas9-mediated systematic dissection of noncoding loci can identify functional elements involved in gene regulation and cancer drug resistance. In combination with other genome-wide assays, we demonstrate high-throughput identification of regions where changes in chromatin context and transcription factor binding are causally linked to loss of gene expression and a disease-relevant phenotype. This approach is generalizable, and we anticipate that the extension of pooled CRISPR screens into the noncoding genome will provide further insights and methods for unbiased interrogation of the genome.

Supplementary Materials www.sciencemag.org/content/353/6307/1545/suppl/DC1 Materials and Methods Supplementary Text Figs. S1 to S13 Tables S1 to S7 References (26–42)