Unanticipated gene expression outcomes with CRISPR editing

To service several ongoing research programs, we had assembled a panel of commercially available HAP1 cell lines harboring frameshift-inducing INDELs that presumably eliminate effective protein production from the targeted gene by promoting nonsense-mediated decay (NMD) of the encoded mRNA (Fig. 1a; Supplementary Table 1). HAP1 cells harbor a single copy of each chromosome, thus reducing the challenges frequently associated with achieving homozygosity in diploid cells for genetic studies10. To confirm the effects of the INDEL on target gene expression, we used two antibodies, each recognizing a different epitope within the targeted protein (Fig. 1b; Supplementary Table 2). In some cell lines we observed the anticipated loss of protein, presumably due to the introduced INDEL, but in others we observed the appearance of novel proteins detectable by western blot analysis with one or both antibodies (4/13 cell lines, or ~30%; Fig. 1b). For example, in the TOP1, SIRT1, CTNNB1, and LRP6 knockout cell lines, the canonical protein was replaced by a faster-migrating novel protein detected by western blot analysis.
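The frameshift logic assumed above can be made concrete with a short Python sketch: an INDEL shifts the reading frame when its net length is not a multiple of three, and the first in-frame stop codon read from the canonical start then marks the presumed PTC. The helper names (`apply_indel`, `is_frameshift`, `first_stop`) and the toy sequence are hypothetical illustrations, not part of the original analysis; judging NMD susceptibility would additionally require exon coordinates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_indel(cds: str, pos: int, deleted: int = 0, inserted: str = "") -> str:
    """Return the coding sequence after deleting `deleted` nt at 0-based `pos`
    and inserting `inserted` at that position."""
    return cds[:pos] + inserted + cds[pos + deleted:]


def is_frameshift(deleted: int, inserted: str = "") -> bool:
    """An INDEL shifts the reading frame when its net length is not a multiple of 3."""
    return (len(inserted) - deleted) % 3 != 0


def first_stop(cds: str, start: int = 0):
    """0-based position of the first in-frame stop codon read from `start`, or None."""
    for i in range(start, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy coding sequence: ATG AAA CTG ATA GCC TAA (native stop at nt 15)
wt = "ATGAAACTGATAGCCTAA"
mut = apply_indel(wt, 3, deleted=1)  # 1 bp deletion in codon 2

print(is_frameshift(1))   # True
print(first_stop(wt))     # 15
print(first_stop(mut))    # 6, i.e. a PTC at codon 3
```

In this toy case the 1 bp deletion converts a stop codon at codon 6 into a premature one at codon 3, which is the situation the knockout panel was designed to create.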

Fig. 1 Unanticipated gene expression outcomes following on-target CRISPR editing. a The effect of CRISPR-introduced frameshift alterations on mRNA and protein expression was analyzed using a panel of commercially available CRISPR-Cas9-edited HAP1 cells. The targeted exon, the anticipated PTC location following the insertion/deletion mutation, and the recognition sites of the antibodies used in panel b are indicated. b Appearance of novel proteins in cells edited with CRISPR-Cas9. HAP1 cells were subjected to western blot analysis using two distinct antibodies. Asterisks (*) indicate novel proteins. c CRISPR-Cas9 gene editing induces expression of novel mRNA species. RT-PCR analysis of edited cells was performed using primers recognizing flanking exons, and the amplicons generated were sequenced. Asterisks (*) indicate novel mRNA species. Source data are provided as a Source Data file.

Because we could not account for the emergence of these novel proteins based on the annotated genetic alteration introduced by CRISPR-Cas9, we next examined the effects of the INDELs on mRNA splicing, given that exonic sequences harbor splicing regulatory elements8,11,12 (Fig. 1c; Supplementary Table 3). In the TOP1 knockout cell line, where we had observed the appearance of a novel TOP1 protein, we also observed the emergence of a novel mRNA species. Sequencing a cDNA-derived amplicon from the novel splice variant revealed the absence of the INDEL-containing exon, suggesting that the mutant protein was generated by an INDEL-induced exon exclusion event (Supplementary Data 1). In addition to detecting the truncated TOP1 protein in the CRISPR-edited cell line with two different antibodies (Fig. 1b), we observed enrichment of both the wt and the truncated TOP1 protein in the nucleus, where the protein is predominantly localized13 (Fig. 2a). The truncated TOP1 protein retained catalytic activity, as measured by an enzymatic assay monitoring relaxation of supercoiled DNA (Fig. 2b). This retention of catalytic activity is consistent with the designation of TOP1 as an essential gene in HAP1 cells in a gene trap mutagenesis screen, which would preclude its elimination in viable cells10,14. In the VPS35 and TLE3 cell lines, we observed changes in the splice variants harboring the CRISPR-targeted exons, although no detectable novel proteins emerged (Fig. 1c).

Fig. 2 A TOP1 protein encoded by a gene harboring a frameshift-inducing deletion retains catalytic activity. a Exclusion of exon 6 (a symmetric exon) produces an internally truncated TOP1 protein (TOP1 ΔE6) with altered subcellular distribution. b The TOP1 ΔE6 protein can induce relaxation of supercoiled DNA. Camptothecin (a TOP1 inhibitor) prevents DNA relaxation. c Summary of novel mRNAs or proteins observed in 13 CRISPR-edited commercial HAP1 cell lines (Horizon Discovery). Source data are provided as a Source Data file.

In contrast to the TOP1 clones, the CTNNB1 and LRP6 cell lines exhibited no detectable change in splicing of the targeted exons, suggesting that the novel proteins are a consequence of alternative translation initiation (ATI) events, presumably induced by the introduced INDELs (Fig. 1c). Consistent with this hypothesis, the mutant LRP6 protein is not glycosylated, perhaps because, lacking its N-terminal signal sequence, it is expressed by default in the cytoplasm (Supplementary Fig. 1A, C). Similarly, the novel β-catenin protein co-migrates on SDS-PAGE with an engineered β-catenin protein initiating at Met88 (Supplementary Fig. 1B). Similar events have previously been reported in cancer cells for transcripts with PTCs located proximal to the native initiation site15. In summary, in ~50% of the CRISPR-edited cell lines acquired from a commercial source, we observed unexpected changes in protein expression or mRNA splicing that challenge the notion that these reagents report the cellular effects of complete genetic ablation (Fig. 2c). Although not investigated here, the mutant proteins could conceivably also contribute to neomorphic cellular phenotypes.

ATI and pseudo-mRNAs confound CRISPR-based gene knockout

We complemented our efforts to generate cells genetically null for various genes of interest with de novo CRISPR-Cas9-based gene targeting projects. As part of our focus on the tumor suppressor kinase LKB1, we observed the emergence of unexpected protein products, both smaller and larger than the canonical protein, that were not readily explained by the CRISPR-introduced INDELs (Fig. 3a–c). Given that the INDELs created in LKB1 are localized to the first protein-coding exon (Fig. 3d) and that the antibody recognizing the C-terminal, but not the N-terminal, epitope detected the shortened LKB1 protein on SDS-PAGE (Fig. 3b, c), we concluded that an ATI event induced by the CRISPR-Cas9-introduced INDELs likely produced an LKB1 protein lacking a portion of its N-terminal sequence (ATI LKB1 protein).
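A minimal sketch of how candidate ATI sites could be enumerated: after a frameshift INDEL near the 5′ end of the coding sequence, any ATG codon at or downstream of the INDEL that lies in the wild-type reading frame would, if used for initiation, yield an N-terminally truncated protein read in the native frame. The functions below are hypothetical, ignore initiation context (e.g., Kozak sequence) and RNA structure, and use the rough rule of thumb of ~110 Da per residue to estimate the expected size difference.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average mass of one amino-acid residue


def candidate_ati_sites(wt_cds: str, indel_pos: int) -> list[int]:
    """1-based residue numbers of ATG codons that lie in the wild-type reading frame
    at or downstream of the INDEL; initiating at any of them restores the native frame."""
    return [
        i // 3 + 1
        for i in range(0, len(wt_cds) - 2, 3)
        if i >= indel_pos and wt_cds[i:i + 3] == "ATG"
    ]


def approx_mass_lost_kda(ati_residue: int) -> float:
    """Approximate N-terminal mass (kDa) missing when translation starts at `ati_residue`."""
    return (ati_residue - 1) * AVG_RESIDUE_DA / 1000


# Toy CDS: ATG CCC ATG AAA ATG TAA, with an INDEL in codon 2 (nt 3)
toy = "ATGCCCATGAAAATGTAA"
print(candidate_ati_sites(toy, 3))  # [3, 5]
print(approx_mass_lost_kda(51))     # 5.5
```

For an ATI at Met51, `approx_mass_lost_kda(51)` gives roughly 5.5 kDa; this is only an estimate, since SDS-PAGE mobility does not track mass exactly.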

Fig. 3 ATI and pseudo-mRNAs contribute to foreign protein production in CRISPR-edited cell lines. a Genomic structure of the LKB1 gene and the exonic sequence targeted by the LKB1 exon 1 sgRNA. b Emergence of a small LKB1 protein (ATI LKB1) as a consequence of CRISPR-Cas9 gene editing. Lysates generated from CRISPR-edited HAP1 clones were subjected to western blot analysis using two distinct LKB1 antibodies recognizing either N- or C-terminal epitopes. c Western blot analysis of CRISPR-Cas9-edited MIA clones reveals the appearance of a large LKB1 protein (Super LKB1) in addition to the ATI LKB1 protein. d Genomic sequences of CRISPR-Cas9-edited HAP1 and MIA clones reveal on-target insertion/deletion mutations in the LKB1 gene. The predicted gene alteration for each clone is indicated. e CRISPR-Cas9-introduced INDELs are associated with the expression of an LKB1 pseudo-mRNA transcript. RT-PCR analysis was performed using primers mapping to the 5′ UTR and exon 4 of LKB1 to generate amplicons from the cDNA of CRISPR-Cas9-edited clones. MIA clones M2 and M3, which express Super LKB1 protein, harbor an mRNA species that includes an additional exon. The 131 bp additional exon contains canonical splice acceptor and donor sequences. f A cDNA expression strategy for understanding the effects of allele-specific CRISPR-introduced INDELs on protein expression provides evidence for ATI. LKB1 and Super LKB1 cDNA expression constructs harboring the genomic alterations found in LKB1 of MIA clone M2 were introduced into HeLa cells, which lack endogenous LKB1 expression. The 1 bp insertion or 2 bp deletion in the Super LKB1 cDNA results in proteins that co-migrate with the Super LKB1 protein observed in MIA clone M2. In contrast, the same mutations in the LKB1 cDNA give rise to proteins that co-migrate with the ATI LKB1 protein found in clone M2 and with the protein that initiates at Met51. Source data are provided as a Source Data file.

We also noted that a slower-migrating protein recognized by LKB1 antibodies emerged in CRISPR-Cas9-edited MIA clones, but not HAP1 clones, with frameshift-inducing INDELs (Fig. 3c; Super LKB1 protein). The appearance of Super LKB1 protein coincided with the appearance of a new mRNA splice variant containing a 131 bp exon that is absent from the transcript encoding the canonical LKB1 protein (Fig. 3e). Consistent with this exon belonging to an LKB1 pseudo-mRNA not previously annotated in MIA cells, treating parental MIA cells with cycloheximide (CHX) to disrupt NMD resulted in the emergence of an LKB1 splice variant that includes this exon (Supplementary Fig. 2A). Thus, the same INDELs that induced a frameshift in the canonical transcript removed a PTC from an LKB1 pseudo-mRNA and enabled it to support protein production (Fig. 3e). HAP1 cells did not transcribe an mRNA containing this exon; thus, introducing INDELs into exon 1 in these cells did not result in production of the Super LKB1 protein (Supplementary Fig. 2B). An understanding of both the transcriptome and the pseudo-transcriptome of a cell is therefore critical for anticipating the net effect of frameshift-inducing INDELs introduced by CRISPR-Cas9 (ref. 9).
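The frame arithmetic behind this rescue can be checked directly. Reading-frame offsets add modulo 3: the 131 bp pseudo-exon alone shifts the downstream frame by 2, and the 1 bp insertion and 2 bp deletion found in MIA clone M2 each bring the net offset back to 0. This sketch assumes the exon 1 INDEL lies upstream of the pseudo-exon and that no stop codon occurs in the segment read out of frame between them; the arithmetic alone does not establish the latter.

```python
PSEUDO_EXON_NT = 131  # length of the additional LKB1 exon described in the text


def net_frame_shift(*length_changes: int) -> int:
    """Net reading-frame offset (0, 1 or 2) from a series of length changes (nt)."""
    return sum(length_changes) % 3


print(net_frame_shift(PSEUDO_EXON_NT))      # 2: pseudo-exon alone is out of frame
print(net_frame_shift(PSEUDO_EXON_NT, +1))  # 0: with the 1 bp insertion
print(net_frame_shift(PSEUDO_EXON_NT, -2))  # 0: with the 2 bp deletion
print(net_frame_shift(+1), net_frame_shift(-2))  # 1 1: canonical transcript, frameshifted
```

The same two INDELs therefore leave the canonical transcript out of frame while returning the pseudo-mRNA to frame, one consistent reading of why Super LKB1 appears only in the edited MIA clones.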

To understand how CRISPR-Cas9-introduced INDELs may have produced the ATI LKB1 protein, we generated cDNAs harboring each of the two INDELs found in the edited cells expressing these proteins (MIA cells, clone M2), thereby removing any potential contribution of altered mRNA splicing to the production of the mutant proteins (Fig. 3f). When either INDEL was introduced into the canonical LKB1 cDNA sequence, we observed expression of a protein that co-migrated with the ATI LKB1 protein. This unexpected protein product also co-migrated with an engineered protein that initiates at methionine 51 (Fig. 3f). The cDNA harboring the 1 bp insertion that provoked the ATI LKB1 protein likely did not induce leaky scanning (Supplementary Fig. 3) and, because the PTC is located downstream (3′) of the predicted ATI site, was unlikely to induce translational re-initiation16. We also considered whether alternative secondary structures of the mutant mRNAs might induce ATI at methionine 51, using an algorithm for modeling conserved RNA structures (Supplementary Figs. 4–6). At least with this approach, we predicted changes in RNA folding that may influence the location of ribosomal initiation. In parallel, we evaluated the effects of these mutations on cDNAs encoding the predicted pseudo-mRNA sequence (with the additional 131 bp exon). As anticipated, we observed the emergence of proteins that co-migrated with the Super LKB1 protein, given that either CRISPR-introduced mutation in a transcript containing the additional 131 bp sequence would eliminate the naturally occurring PTC present in the pseudo-mRNA sequence (Fig. 3f).

ATI suppresses NMD

Despite the introduction of a frameshift-promoting INDEL in LKB1, we presumed that an ATI event, which restores translation to the native reading frame, would fail to elicit NMD during the pioneer round of translation. Having escaped destruction, the mutant mRNA would then be able to support repeated rounds of translation, presumably including translation of short polypeptides that initiate at the canonical start site and end at the PTC. Because our initial western blot analysis of the LKB1 CRISPR-edited clones did not capture low-molecular-weight proteins (Fig. 3b, c), we re-examined LKB1 proteins in these clones and indeed detected a small LKB1 polypeptide. This protein (Short LKB1) co-migrates with an engineered protein that initiates at the canonical start site but terminates at the presumed PTC introduced by the INDEL (Supplementary Fig. 7A).

To determine whether ATI suppresses NMD as a potential mechanism for producing C-terminally truncated proteins, we compared the effects on mRNA stability of an INDEL associated with ATI and an INDEL that yielded no detectable LKB1 polypeptides (Supplementary Fig. 7B, C). Comparing the levels of the two LKB1 mRNAs, we observed greater loss of mRNA in the CRISPR-edited clone lacking any detectable ATI events (Supplementary Fig. 7D). The CHX-induced change in LKB1 mRNA abundance in the ATI-associated cells differed little from that in parental cells, suggesting that NMD is not acting on the mRNA with the ATI-provoking mutation (Supplementary Fig. 7D). In contrast, in the CRISPR-edited cell line that expresses no LKB1 polypeptides, we observed a 10-fold change in LKB1 mRNA in the presence of CHX, suggesting that the mutant mRNA in this case is subject to robust NMD (Supplementary Fig. 7D). In total, we observed the production of three polypeptides in lieu of the canonical LKB1 protein following the introduction of a frameshift-inducing INDEL: Super LKB1, ATI LKB1, and Short LKB1 (Supplementary Fig. 7E). More generally, our observations suggest that introducing INDELs early in the transcript increases the potential for an ATI event in which the ribosome clears all of the exon junction complexes during the pioneer round, thus enabling the synthesis of polypeptides with C-terminal truncations.
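The positional logic of NMD invoked here is often summarized by the "50–55 nt rule": a termination codon located more than ~50–55 nt upstream of the last exon-exon junction leaves at least one exon junction complex in place after the pioneer round and marks the mRNA for NMD. The Python sketch below encodes that heuristic, together with the hypothesis in the text that a ribosome initiating by ATI and reading the native frame to a stop beyond that window would clear every junction. The exon lengths, coordinates, and 50 nt threshold are illustrative assumptions, and the rule has well-known exceptions.

```python
def exon_junctions(exon_lengths: list[int]) -> list[int]:
    """mRNA coordinates (0-based) of each exon-exon junction (first nt of each downstream exon)."""
    junctions, total = [], 0
    for length in exon_lengths[:-1]:
        total += length
        junctions.append(total)
    return junctions


def predicts_nmd(exon_lengths: list[int], stop_pos: int, threshold: int = 50) -> bool:
    """Heuristic 50-55 nt rule: a stop codon starting at `stop_pos` triggers NMD when it
    lies more than `threshold` nt upstream of the last exon-exon junction."""
    junctions = exon_junctions(exon_lengths)
    if not junctions:  # single-exon transcript: no downstream exon junction complex
        return False
    return junctions[-1] - stop_pos > threshold


def nmd_expected(exon_lengths: list[int], ptc_pos: int, ati_stop_pos=None, threshold: int = 50) -> bool:
    """NMD is expected for a PTC-containing mRNA unless a ribosome initiating by ATI,
    reading the native frame to `ati_stop_pos`, terminates past the NMD-triggering window."""
    if ati_stop_pos is not None and not predicts_nmd(exon_lengths, ati_stop_pos, threshold):
        return False
    return predicts_nmd(exon_lengths, ptc_pos, threshold)


exons = [100, 200, 300]  # hypothetical transcript; last junction at nt 300
print(nmd_expected(exons, ptc_pos=150))                    # True
print(nmd_expected(exons, ptc_pos=150, ati_stop_pos=450))  # False
```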

Exon symmetry influences CRISPR outcomes

In the analysis of our assembled HAP1 cell line panel, we also observed that ~30% of the clones exhibited exclusion of the targeted exon from the mRNA. Exons are replete with splicing regulatory motifs, including exon splicing enhancers and silencers (ESEs and ESSs, respectively). These degenerate hexameric sequences dictate the extent to which exons are included within a transcript12,17. We suspected that exon exclusion was at least in part due to disruption of ESEs by an INDEL. As part of our efforts to study the SUFU tumor suppressor protein, we had generated a collection of cells that were presumably null for SUFU based on western blot analysis (Fig. 4a, b). Yet many of these clones exhibited exclusion of the targeted exon (Fig. 4c). The extent of exon exclusion differed notably among clones, suggesting that other factors contributing to exon splicing regulation, perhaps RNA structure, may also be compromised by the introduction of an INDEL at this position within the SUFU mRNA. We identified a cluster of potential ESEs in the targeted SUFU exon that was likely affected by the INDELs in these clones (Fig. 4d); no ESSs were identified in this case. To determine how reliably we could induce exon exclusion by disrupting a predicted ESE, we introduced INDELs at putative ESEs in other SUFU exons and performed a similar analysis of protein and mRNA in the RMS13 cell line (Fig. 4e–l). In every instance, targeted disruption of a putative ESE resulted in exon exclusion.
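In its simplest form, identifying which putative ESEs an INDEL disrupts reduces to comparing the hexamers present in the wild-type and edited exon sequences. The sketch below assumes a user-supplied set of ESE hexamers; the three motifs in the example are placeholders, not the RESCUE-ESE list, and a real analysis would load a published motif set and consider ESS motifs as well.

```python
def hexamers(seq: str) -> set[str]:
    """All overlapping 6-mers in a sequence."""
    return {seq[i:i + 6] for i in range(len(seq) - 5)}


def lost_ese_hexamers(wt_exon: str, mut_exon: str, ese_motifs: set[str]) -> set[str]:
    """Putative ESE hexamers present in the wild-type exon but absent from the edited exon."""
    return (hexamers(wt_exon) & ese_motifs) - hexamers(mut_exon)


# Placeholder motif set (hypothetical; substitute a published ESE hexamer list)
motifs = {"GAAGAA", "AAGAAG", "GAAGCT"}
wt_exon = "CTGAAGAAGCT"
mut_exon = "CTGAGAAGCT"  # 1 bp deletion of the A at index 4
print(sorted(lost_ese_hexamers(wt_exon, mut_exon, motifs)))  # ['AAGAAG', 'GAAGAA']
```

Here the single-base deletion destroys two overlapping purine-rich hexamers while leaving a third motif intact, the kind of clustered disruption described for the targeted SUFU exon.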

Fig. 4 Compromised ESEs account for INDEL-induced exon skipping. a Genomic structure of SUFU and the exonic sequence targeted by the SUFU exon 8 sgRNA. The recognition sites of the antibodies used in panel b are indicated. b Western blot analysis of HAP1 cells edited with SUFU exon 8 sgRNA shows no detectable expression of SUFU. c Exon skipping is prevalent in CRISPR-Cas9-edited SUFU clones. RT-PCR analysis of CRISPR-Cas9-edited SUFU clones using primers flanking exons 6 and 10 of SUFU. Sequencing of the amplicons reveals exon skipping in all clones except H9 and H10. d Disruption of exon splicing enhancers (ESEs) by CRISPR-introduced INDELs triggers skipping of the edited exons. The genetic mutation and the presence/absence of exon skipping for each clone are indicated. Putative ESEs were identified using the RESCUE-ESE web server. e Multiple sgRNA sequences located in symmetric or asymmetric exons of the SUFU gene used for targeted disruption of ESEs. f sgRNAs described in “e” were used to edit the SUFU gene in RMS13 cells. Western blot analysis of lysates derived from the CRISPR-Cas9-edited RMS13 clones shows no detectable SUFU protein. g Genomic sequences of RMS13 clones edited with SUFU exon 3 sgRNAs. CRISPR-introduced mutations and putative exon splicing enhancer (ESE) and exon splicing silencer (ESS) sequences are indicated. h RT-PCR analysis and cDNA sequencing results of clones R2 and R3 using primers flanking exons 1 and 5. i Genomic sequences of RMS13 clones edited with SUFU exon 2 sgRNA. CRISPR-introduced mutations and putative ESE/ESS sequences are indicated. j RT-PCR analysis and cDNA sequencing results of clones R1 and R4 using primers flanking the 5′ UTR and exon 4. k Genomic sequences of RMS13 clones edited with SUFU exon 8 sgRNA. CRISPR-introduced mutations and putative ESE and ESS sequences are indicated. l RT-PCR analysis and cDNA sequencing results of clones R1 and R4 using primers flanking exons 6 and 10. m Disruption of the ESE code reliably anticipates CRISPR-Cas9-induced exon skipping. Twenty-four CRISPR-Cas9-edited cell lines with different mutations were analyzed for the presence/absence of exon skipping events and for changes in ESE sequences due to CRISPR-introduced INDELs. Source data are provided as a Source Data file.

When all of the clones presented so far, from both commercial and de novo engineered sources, were considered with respect to predicted impact on an ESE and exon exclusion, we observed a strong correlation between these two events (Fig. 4m; Supplementary Fig. 8). A subset of the clones exhibiting alternative splicing also expressed novel polypeptides (see TOP1 and SIRT1; Fig. 1b). In both of these cases, the exons were symmetric: the exon length is a multiple of three nucleotides, so exclusion of the exon yields a transcript that retains the original reading frame. In the SUFU clones, the majority of skipped exons were asymmetric, which likely accounts for the lack of protein expression. However, one targeted and skipped exon (exon 2) was symmetric, yet the resulting transcript failed to generate a detectable protein, perhaps due to misfolding of the mutant protein (Fig. 4e, f). Indeed, the skipped exon encodes part of an intrinsically disordered region of the protein that is essential for interaction with pro-survival BCL2 family members18. From these SUFU clones, we infer that the decreased SUFU mRNA seen in CRISPR-edited cells was due either to NMD provoked by a frameshift-inducing INDEL or to exclusion of the targeted asymmetric exon, which introduces a PTC in an NMD-enabling position within the gene.
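Exon symmetry, as defined here, is a simple length test, and it predicts whether skipping an exon yields an in-frame deletion or a downstream frameshift. The following is a minimal sketch, assuming internal exons that are fully coding (terminal exons containing UTR sequence would need only their coding portion measured); the helper names and example lengths are hypothetical.

```python
def is_symmetric(exon_length: int) -> bool:
    """A symmetric exon's length is a multiple of three nucleotides."""
    return exon_length % 3 == 0


def asymmetric_exons(exon_lengths: list[int]) -> list[int]:
    """1-based numbers of the asymmetric exons in a list of (coding) exon lengths."""
    return [n for n, length in enumerate(exon_lengths, start=1) if not is_symmetric(length)]


def skip_outcome(exon_length: int) -> str:
    """Predicted consequence of skipping one internal, fully coding exon."""
    if is_symmetric(exon_length):
        return f"in-frame deletion of {exon_length // 3} residues"
    return "frameshift downstream of the skipped exon (possible PTC and NMD)"


print(asymmetric_exons([120, 131, 99, 100]))  # [2, 4]
print(skip_outcome(99))   # in-frame deletion of 33 residues
print(skip_outcome(100))  # frameshift downstream of the skipped exon (possible PTC and NMD)
```

If the exon boundaries fall inside codons, the residue count is approximate and the codon created at the new junction could itself be a stop codon. An "in-frame" label is therefore necessary but not sufficient for protein production, as the SUFU exon 2 case also illustrates.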

CRISPinatoR

Purposeful disruption of ESEs in asymmetric exons could improve gene knockout efficiency, because even INDELs that fail to alter the coding frame would have a second opportunity to introduce a PTC by causing the exon to be skipped altogether. In addition to the evidence provided here, the ability of mutations in ESEs to alter mRNA splicing has been documented elsewhere19,20. To systematize this strategy, we developed the CRISPinatoR, a website that identifies the asymmetric exons in a given gene and CRISPR-Cas9 guide sequences that deliver double-strand breaks in proximity to a putative ESE (Fig. 5a, Supplementary Fig. 9). Alternatively, the portal could be used to induce skipping of an exon harboring a deleterious mutation in order to generate a novel protein that may retain function. When analyzing genome-wide CRISPR libraries, we found that the ratio of guides targeting symmetric and asymmetric exons was fairly consistent, suggesting that the algorithms used to design these libraries do not factor exon symmetry into the predicted efficiency of gene elimination (Supplementary Fig. 10A, B). Similarly, the CRISPinatoR could be used to re-evaluate phenotypes previously reported using CRISPR-Cas9 based on each sgRNA's potential to induce exon skipping.
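The following is a rough sketch of the kind of search described for the CRISPinatoR. It is an illustration, not its actual implementation: find SpCas9 protospacers (20 nt followed by an NGG PAM) whose predicted cut site, about 3 nt upstream of the PAM, falls within a small window of a putative ESE hexamer in an asymmetric exon. The window size, the forward-strand-only scan, and the motif set are assumptions. Off-target scoring and splice-variant counting, which the CRISPinatoR also integrates, are omitted.

```python
import re

# SpCas9 protospacer: 20 nt immediately 5' of an NGG PAM (forward strand only here)
PROTOSPACER_PAM = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def cut_sites(seq: str) -> list[tuple[str, int]]:
    """(protospacer, cut index) pairs; SpCas9 cuts ~3 nt upstream of the PAM, so the
    break falls between protospacer positions 17 and 18 (cut index = first nt after it)."""
    return [(m.group(1), m.start() + 17) for m in PROTOSPACER_PAM.finditer(seq)]


def guides_near_ese(exon: str, ese_motifs: set[str], window: int = 5) -> list[tuple[str, int]]:
    """Forward-strand guides whose cut site lies within `window` nt of a putative ESE hexamer."""
    ese_spans = [(i, i + 6) for i in range(len(exon) - 5) if exon[i:i + 6] in ese_motifs]
    return [
        (spacer, cut)
        for spacer, cut in cut_sites(exon)
        if any(start - window <= cut <= end + window for start, end in ese_spans)
    ]


# Toy exon: a 20 nt protospacer containing GAAGAA, then a TGG PAM
toy_exon = "CCCCCCCCCCCCGAAGAACC" + "TGG" + "TT"
print(guides_near_ese(toy_exon, {"GAAGAA"}))  # [('CCCCCCCCCCCCGAAGAACC', 17)]
```

The zero-width lookahead lets overlapping protospacers be reported. Scanning the reverse strand would require repeating the search on the reverse complement and mapping coordinates back.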

Fig. 5 Targeting RNA-regulatory elements for gene knockout agendas. a CRISPinatoR: a web-based guide RNA design tool that utilizes targeted ESE disruption for achieving gene elimination. CRISPinatoR identifies sgRNA sequences that target ESEs in asymmetric exons and calculates their off-target potential and the number of splice variants impacted by the sgRNAs. A scoring system that integrates all three parameters is used to provide sgRNAs with high gene knockout potential. b Genomic structure of the LRP5 gene and sgRNA sequences targeting the asymmetric exon 2 and the symmetric exon 16. c Genomic sequencing results of HAP1 clones edited using LRP5 exon 2 and exon 16 sgRNAs. CRISPR-introduced mutations and the putative ESE sequences are indicated. d Exclusion of an asymmetric or a symmetric exon with INDEL-induced changes to the putative ESE sequences. RT-PCR analysis and cDNA sequencing results of HAP1 cells edited with LRP5 exon 2 and exon 16 sgRNAs. e Targeted ESE disruption in an asymmetric exon increases gene knockout potential. Western blots of HAP1 clones edited with LRP5 exon 2 sgRNA (Clone 21) and exon 16 sgRNA (Clone 3) were probed with the two distinct antibodies indicated in “b”. ESE disruption in the symmetric exon 16 produces an internally truncated, in-frame LRP5 protein. f The internally truncated LRP5 protein is glycosylated. Lysates derived from WT or LRP5 ΔE16 HAP1 cells were incubated with the deglycosidase PNGase F and then subjected to western blot analysis. g Exclusion of LRP5 exon 16 would delete a sequence adjacent to the WNT3A binding domains. h The LRP5 ΔE16 protein formed following skipping of a symmetric exon is functionally active. WNT/β-catenin pathway activity in response to WNT3A conditioned medium (WNT3A CM) was measured for HAP1 WT, LRP5 ΔE2, and LRP5 ΔE16 cells. WNT pathway inhibitors WNT974 (PORCNi) and IWR1 (TNKSi) serve as negative and positive control, respectively. Error bars represent the mean of triplicates ± s.d. The experiment was repeated three times with similar results. Statistical testing was performed using Student’s t-test, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Source data are provided as a Source Data file.

Targeting RNA-regulatory elements for gene knockout agendas

We tested the ability of the CRISPinatoR to design guides that induce exon skipping for either degradation of the mRNA or production of novel protein-encoding mRNAs by targeting asymmetric or symmetric exons, respectively. Using the WNT receptor LRP5 as a case study, we asked the CRISPinatoR to identify sgRNAs predicted to induce exon skipping in each exon class (Fig. 5b). We identified clones harboring INDELs at the anticipated LRP5 exonic sequence by targeted sequencing of isolated genomic DNA (Fig. 5c). Using RT-PCR analysis coupled with targeted sequencing, we observed exon skipping in clones associated with both guides (Fig. 5d; Supplementary Fig. 11). LRP5 protein was absent in the clone exhibiting exclusion of an asymmetric exon (Fig. 5e). However, in the clone exhibiting exclusion of a symmetric exon, we observed the appearance of a faster-migrating protein (Fig. 5e). We confirmed that this new protein retains glycosyl moieties, suggesting that its N-terminal signal sequence is intact, unlike in the LRP6-edited HAP1 clone (Fig. 5f; Supplementary Fig. 1A, C). The presence of a protein that enters the secretory pathway, together with skipping of the CRISPR-targeted exon, suggests that the novel LRP5 protein harbors a compromised β-propeller domain, one of two that contribute to WNT3A binding (Fig. 5g). Nevertheless, a clone expressing the truncated LRP5 protein still responded to exogenously supplied WNT3A conditioned medium, as measured with a WNT pathway reporter (Fig. 5h). The weakened response relative to WT HAP1 cells likely reflects reduced total LRP5 protein levels and/or reduced WNT-binding affinity following deletion of the exon 16 sequence. In contrast, cells expressing the LRP5 mRNA lacking the CRISPR-edited asymmetric exon lost WNT pathway responsiveness, consistent with the absence of LRP5 protein production from an mRNA lacking an asymmetric exon (Fig. 5h).