Asymmetric deletions using single sgRNAs

CRISPR/Cas9 gene editing has been used successfully to target the mouse genome (Supplementary Table 1). To gain in-depth knowledge of the molecular consequences at target sites, we analysed deletions introduced at 17 sites in the mouse genome. By injecting the Cas9 machinery directly into mouse zygotes, we targeted five enhancers bound by the cytokine-sensing TF STAT5 in three genomic loci, Stat5a (A)20, Socs2 (B)26 and Wap (C)19 (Fig. 1). We also targeted 11 sites bound by CCCTC-binding factor (CTCF), a DNA-binding protein proposed to participate in generating chromatin loops27, in the Wap (D) and Csn (E) loci (DOI: 10.1093/nar/gkx185; Supplementary Fig. 1). Lastly, we targeted one enhancer in the Csn (F) locus bound by STAT5 and NFIB, a TF involved in epithelial cell differentiation28,29,30 (Supplementary Fig. 1). These genomic sites were targeted individually or in combination. Juxtaposed sites within a given locus were targeted either simultaneously or successively, that is, in a one-step or two-step procedure (Fig. 1, Supplementary Notes 1–3 and Supplementary Table 2). Four distinct targeting strategies were pursued (detailed diagrams in Figs 2–4). We targeted individual TF-binding sites with a single corresponding sgRNA (Type 1), individual TF-binding sites with more than one sgRNA (Type 2) and more than one TF-binding site with several sgRNAs (Type 3). In addition to targeting several sites simultaneously (one-step, Type 3), we also targeted them successively (two-step, Type 4). To accomplish this, we first targeted specific TF-binding sites and generated homozygous mutant mice, which subsequently served as hosts for targeting additional sites in the same gene locus. Molecular consequences of targeting events at the 17 sites were investigated in more than 630 founder mice and 54 established lines, using both polymerase chain reaction (PCR) and DNA sequencing.
The number of founders from each targeted site is shown in Supplementary Table 2. Deleted sequences from each individual founder are shown in Supplementary Notes 1–3. Each founder represents an independent targeting event.

Figure 1: Targeting 17 sites in the mouse genome with CRISPR/Cas9. STAT5 TF-binding sites (GAS motif), CTCF-binding regions and an NFIB-binding site were targeted for deletion. STAT5-binding sites in three gene loci (A, Stat5 (ref. 20); B, Socs2 (ref. 26); C, Wap (ref. 19)) were targeted individually (A, B and C-1). The three STAT5-binding sites in the Wap super-enhancer19 were deleted in four different combinations (C-1/2, C-1/3, C-2/3 and C-1/2/3). Combined deletion of two or more STAT5-binding sites within the same gene locus was accomplished through successive (two-step) or simultaneous (one-step) targeting. A total of 11 CTCF-binding sites in two gene loci (D, Wap; E, Csn) were targeted individually and in different combinations. An NFIB-binding site in the Csn locus (F) was also targeted (Supplementary Fig. 1). Target mutations were identified in 632 founder mice, and 54 mutant lines were established. The positions of reference genes in the respective loci are indicated as coloured boxes. A, Stat5a; B, Socs2; C and D, Wap; E, Csn1s1; F, Csn3.

Figure 2: Targeting individual genomic sites with corresponding single sgRNAs. Two genomic sites (B and C-1) were each targeted in independent experiments with two individual sgRNAs (1 and 2). Four individual genomic sites (D-2, D-3, D-4(1) and F) were targeted with one sgRNA each. The deletion of two juxtaposed genomic sites (C-2/3) was generated in two steps: a single sgRNA targeting site C-2 was injected into zygotes from mice already carrying a deletion in site C-3, which had been generated using TALENs19. B, Socs2; C and D, Wap; E, Csn1s1; F, Csn3. The red numbers refer to the sites being targeted in the respective experiments.

Figure 3: Targeting individual genomic sites with more than one sgRNA. Each of seven individual genomic sites (A, D-4(2), E-1, E-2, E-3, E-4 and E-5) was simultaneously targeted with two or three sgRNAs. Combined deletions of more than one genomic site (C-1/3 and D-1/2/3/4) were accomplished in two steps. Two sgRNAs targeting site C-1 were simultaneously injected into zygotes from mice already carrying a deletion in site C-3 (ref. 19). Two sgRNAs targeting site D-2 were simultaneously injected into zygotes from mice already carrying a deletion in sites D-1/3/4. A, Stat5; C and D, Wap; E, Csn1s1. The red numbers refer to the sites being targeted in the respective experiments.

Figure 4: Targeting more than one genomic site with several sgRNAs. Three juxtaposed genomic sites (D-1/3/4) were targeted simultaneously with four sgRNAs. Combined deletions of more than one genomic site (C-1/2, C-1/2/3 and D-1/2/3/4/5) were accomplished in two steps. Four sgRNAs targeting sites C-2 and C-3 were simultaneously injected into zygotes from mice already carrying a deletion in site C-1 (ref. 19). Four sgRNAs targeting sites D-2 and D-5, which are 23 kb apart, were simultaneously injected into zygotes from mice already carrying deletions in sites D-1/3/4. C and D, Wap. The red numbers refer to the sites being targeted in the respective experiments.

Cas9 nuclease recognizes the protospacer-adjacent motif (PAM), typically NGG, located immediately 3′ of the sequence targeted by the sgRNA, and induces a double-strand break between the third and fourth nucleotides upstream of the PAM31, but the orientation of the resulting deletions has not been reported. We examined whether deletions have preferential orientations and distinguished between symmetric and asymmetric deletions (Fig. 5a). Only deletions obtained from injections of single sgRNAs were analysed, thereby avoiding the confounding effects of multiple variables (Fig. 2 and Supplementary Note 1). A deletion was defined as symmetric if the number of bases removed on one side of the Cas9-cutting site was at most 1.5-fold the number removed on the other side; deletions exceeding 1.5-fold in either direction were considered asymmetric. More than 80% of the deletions detected in 139 founders obtained from targeting nine different sites were asymmetric, extending in either direction from the Cas9-cutting site (Fig. 5b), and more than 70% of the deletions exceeded a two-fold difference. Asymmetric deletions were prevalent in founders from all nine genomic sites (50–100% of deletions), suggesting that this result was not linked to specific sgRNA sequences (Fig. 5c). Symmetric deletions were found preferentially among small deletions of less than 10 bp (Fig. 5d). Of all asymmetric deletions, 59% extended towards the 5′ end and 41% towards the 3′ end of the sgRNA (Fig. 5e,f); however, this difference was not statistically significant (P=0.6). The 18 previously published studies that targeted individual loci in the mouse genome with single sgRNAs did not specifically address the symmetry of deletions10,11,17,18,31,32,33,34,35,36,37,38,39,40,41,42,43,44. Deleted sequences were available from seven studies11,18,31,34,39,43,44 (Supplementary Data 1), but only one18 provided a data set large enough to permit a direct comparison with ours.
Kim and colleagues18 targeted two genomic sites and analysed 84 founders; with a two-fold cutoff, 82% of their deletions were asymmetric, compared with 73% in our study of 139 mice representing nine genomic sites (Supplementary Fig. 2a). With a 1.5-fold cutoff, 89% of their deletions were asymmetric, compared with 82% in our study. While the frequency of asymmetric deletions was similar, the maximum deletion size was larger in our study (585 bp versus 269 bp), and deletions in the upper half of the size distribution were larger in our study (Supplementary Fig. 2b).
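The symmetry classification above can be made concrete with a short sketch. It assumes an SpCas9 protospacer on the + strand followed by an NGG PAM, a blunt cut 3 bp upstream of the PAM, and deletions given as half-open reference intervals; the helpers `cut_site`, `flank_lengths`, `classify` and `percent_asymmetric` are hypothetical and are not the authors' analysis code. Ties at exactly 1.5-fold are treated as symmetric, following the definition in the text.

```python
from typing import Iterable, Tuple

PAM_TO_CUT = 3  # SpCas9 blunt cut lies 3 bp 5' of the PAM (assumption: + strand only)


def cut_site(ref: str, protospacer: str) -> int:
    """0-based index of the first base 3' of the Cas9 cut (hypothetical helper).

    Assumes the protospacer occurs once on the + strand and is followed by NGG.
    """
    start = ref.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found on + strand")
    pam_start = start + len(protospacer)
    if ref[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("no NGG PAM adjacent to protospacer")
    return pam_start - PAM_TO_CUT


def flank_lengths(cut: int, del_start: int, del_end: int) -> Tuple[int, int]:
    """Bases removed 5' (upstream) and 3' (downstream) of the cut for ref[del_start:del_end]."""
    upstream = max(0, min(cut, del_end) - del_start)
    downstream = max(0, del_end - max(cut, del_start))
    return upstream, downstream


def classify(upstream: int, downstream: int, cutoff: float = 1.5) -> str:
    """'symmetric' if the longer side is at most `cutoff`-fold the shorter side;
    otherwise "5'" (longer toward the sgRNA 5' end) or "3'"."""
    if upstream + downstream == 0:
        raise ValueError("empty deletion")
    # Multiplying instead of dividing avoids a zero division when one side is 0.
    if max(upstream, downstream) <= cutoff * min(upstream, downstream):
        return "symmetric"
    return "5'" if upstream > downstream else "3'"


def percent_asymmetric(flanks: Iterable[Tuple[int, int]], cutoff: float) -> float:
    """Percentage of deletions called asymmetric at the given fold cutoff."""
    calls = [classify(up, down, cutoff) for up, down in flanks]
    return 100 * sum(call != "symmetric" for call in calls) / len(calls)
```

With hypothetical flank pairs (7, 3), (2, 2), (1, 4) and (5, 3), raising the cutoff from 1.5 to 2 reclassifies (5, 3) as symmetric, which illustrates why the asymmetric fraction is lower at the two-fold cutoff than at 1.5-fold.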

Figure 5: Asymmetric deletions. (a) Schematic diagram of symmetric and asymmetric deletions detected in mice targeted with a single sgRNA. Red triangle, Cas9-cutting site three base pairs upstream of the PAM sequence. Symmetric deletions were defined as those in which the deletion lengths upstream and downstream of the Cas9-cutting site differed by at most 1.5-fold. In asymmetric deletions, the deletion on one side exceeded that on the other side by more than 1.5-fold. (b) Percentage of symmetric and asymmetric deletions identified in CRISPR/Cas9-targeted mice (n=139). More than 80% of deletions were asymmetric and more than 70% of the deletions exceeded a two-fold difference. Only deletions obtained from a single sgRNA injection were analysed to avoid the effect of multiple variables; these comprise deletions targeting TF-binding sites B, C-1, C-2/3, D-2, D-3, D-4(1) and F. (c) Ratio of symmetric and asymmetric deletions obtained with each sgRNA. B #1, n=8; B #2, n=18; C-1 #1, n=10; C-1 #2, n=8; C-2/3, n=12; D-2, n=40; D-3, n=12; D-4(1), n=21; F, n=10. Asterisk (*), sgRNAs with identical deletions identified in more than half of the founders. (d) Frequency of symmetric deletions by deletion size. (e) Representative examples of deletions towards the 5′ end and 3′ end of the sgRNA. If the deletion upstream of the Cas9-cutting site was more than 1.5-fold longer than that downstream, it was defined as a 5′ deletion, and vice versa. (f) Percentage of deletions towards the 5′ end and 3′ end of the sgRNA.

Deletions preferentially occur at repeat sequences

We noticed that, with any given sgRNA, 45% of the mutant founders carried apparently identical deletions, although mutations are expected to arise independently of each other. On detailed examination of such prevalent deletions, we found that they frequently occurred at repeat sequences in the targeted regions (Fig. 6a). Notably, single or duplicated units of the repeat sequences were retained at the deletion site. More than 60% of mutant founders from one specific genomic site (D-4), a CTCF-binding site in the Wap locus, carried exactly the same deletion, with the repeat sequences aligned at one end (Fig. 6b, upper panel). Over 30% of mutant founders from another genomic site (C-2/3), which spans two STAT5-binding sites within the Wap super-enhancer19, carried the same deletion with the repeat sequence aligned at both ends (Fig. 6b, bottom panel). Notably, 80% of deletions derived from one sgRNA (C-2/3) occurred at repeat sequences (Supplementary Fig. 3). Of the 56 founders carrying deletions at repeat sequences, 65% retained a single copy of the repeat sequence and 35% retained a duplication of the repeat sequence (Fig. 6c). For this analysis, we considered only deletions obtained by targeting the genome with single sgRNAs (Fig. 2 and Supplementary Note 1). Deletions with repeat sequences aligned at one end were probably due to microhomology-mediated repair, as previously reported in vitro45,46,47,48,49,50 and in vivo40,51,52,53,54,55,56 (Supplementary Table 1). However, the frequency of such deletion patterns had not yet been examined systematically in a large cohort, and deletions with repeat sequences aligned at both ends had not been reported. Although the molecular mechanism that generates deletions with repeat sequences at both ends is unclear, microhomology-mediated end joining may account for deletions with the repeat sequences aligned at one end57 (Supplementary Fig. 4).
These results indicate that CRISPR-induced deletions do not occur simply at random but follow preferential patterns.
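The "repeat aligned at one end" pattern corresponds to a deletion flanked by microhomology: the same sequence sits at both breakpoints, so one copy survives and the deletion can be written at several equivalent positions. The sketch below detects that repeat for a deletion given as a half-open reference interval. It is an illustrative assumption, not the definition used in ref. 45, and the hypothetical helpers `junction_repeat` and `delete` do not model the "both ends" (duplicated repeat) pattern.

```python
def delete(ref: str, start: int, end: int) -> str:
    """Return the allele produced by removing ref[start:end] (hypothetical helper)."""
    return ref[:start] + ref[end:]


def junction_repeat(ref: str, start: int, end: int) -> str:
    """Return the repeat (microhomology) flanking deletion ref[start:end], or ''.

    A repeat here is sequence present at both breakpoints, so the deletion can
    be shifted left or right without changing the resulting allele; one copy of
    the repeat is retained at the junction.
    """
    size = end - start
    # Extend rightwards: deleted bases that match the bases just after the deletion.
    right = 0
    while (right < size and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1
    # Extend leftwards: bases just before the deletion that match its last bases.
    left = 0
    while (left < size - right and start - left > 0
           and ref[start - left - 1] == ref[end - left - 1]):
        left += 1
    return ref[start - left:start + right]
```

In a toy reference "AAAA" + "CTG" + "TTTTT" + "CTG" + "GGGG", deleting positions 4 to 12 or 7 to 15 yields the same allele, and both calls report the shared repeat "CTG". This ambiguity is one reason why independent founders can appear to carry exactly the same deletion.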

Figure 6: Preferential deletions at repeat sequences. (a) Average frequency of deletions found at repeat sequences after single sgRNA injections. In ∼45% of deletions, repeat sequences were aligned at one end or at both ends (total n=122; deletions at repeat sequences, n=56). Only deletions obtained from injections with single sgRNAs were analysed to avoid effects of multiple variables; these comprise deletions targeting TF-binding sites B, C-1, C-2/3, D-2, D-3, D-4(1) and F. (b) Representative examples of repeat sequences aligned at one end (upper panel) and at both ends (bottom panel). More than 60% of mutant founder mice targeted at genomic site D-4 carried exactly the same deletion, which retained only a single copy of the repeat sequence. More than 30% of mutant founder mice targeted at genomic site C-2/3 carried exactly the same deletion, with repeat sequences remaining at both ends. (c) Percentage of repeat sequences aligned at one end and at both ends in founder mice carrying deletions at repeat sequences.

Large deletions created by single sgRNAs in zygotes

Zhou et al.11 reported that injecting dual adjacent sgRNAs at a given site not only improved the deletion efficiency but also increased the deletion size in nine founder mice. To assess whether large deletions also arise when single sites are targeted, we investigated the extent of deletions obtained on injection of one or more sgRNAs corresponding to a single genomic site (Fig. 7a). Although the median deletion size obtained with single sgRNAs (9 bp) (Fig. 2 and Supplementary Note 1) was smaller than that obtained with more than one adjacent sgRNA (84 bp) (Fig. 3 and Supplementary Note 2), we also observed large deletions of up to 600 bp with single sgRNAs (Fig. 7b). Notably, in one experiment the majority of founders (83%) exceeded the average deletion size (49 bp), and ∼40% harboured deletions of over 200 bp (Supplementary Fig. 5). The sizes of deletions generated by individual or multiple sgRNAs were independent of the guanine-cytosine (GC) content of the sgRNA and of the distances between sgRNAs (Supplementary Fig. 6).

Figure 7: Large deletions obtained with single sgRNAs. (a) Diagram of single and multiple sgRNAs targeting specific sites. Targeted sites are shown in purple and sgRNAs are indicated as cyan arrows. (b) Comparison of deletion sizes generated by single sgRNAs and multiple sgRNAs injected into mouse zygotes (total number of founder mice, n=243; single sgRNA injection at a single site, n=122; multiple sgRNA injections at a single site, n=121). Single sgRNA injection at a single site: deletions obtained by targeting TF-binding sites B, C-1, C-2/3, D-2, D-3, D-4(1) and F. Multiple sgRNA injection at a single site: deletions obtained by targeting TF-binding sites A, C-1/3, D-4(2), D-1/2/3/4, E-1, E-2, E-3, E-4 and E-5. The median deletion size obtained with single sgRNAs (9 bp) was smaller than that with multiple sgRNAs (84 bp). Deletions generated with a single sgRNA reached up to 600 bp. Middle bar inside the box, median; box, interquartile range (IQR) containing the middle 50% of the data; whiskers, 1.5 times the IQR.
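The box-plot conventions in the Figure 7 caption can be reproduced from a list of deletion sizes. The sketch below assumes Tukey-style whiskers (extending to the most extreme observation within 1.5 × IQR of the box) and the "inclusive" quartile method of Python's `statistics.quantiles`; the plotting software used for the figure may compute quartiles differently, and `box_summary` is a hypothetical helper.

```python
from statistics import quantiles
from typing import Dict, List


def box_summary(sizes: List[float]) -> Dict[str, object]:
    """Median, quartiles, 1.5 x IQR whiskers and outliers for a list of deletion sizes."""
    q1, q2, q3 = quantiles(sizes, n=4, method="inclusive")  # needs >= 2 values
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [s for s in sizes if lo_fence <= s <= hi_fence]
    return {
        "median": q2,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme value still within the fence
        "whisker_high": max(inside),
        "outliers": sorted(s for s in sizes if s < lo_fence or s > hi_fence),
    }
```

For a hypothetical set of nine small deletions (1 to 9 bp) plus one 600-bp deletion, the median stays at 5.5 bp and the 600-bp event is plotted as an outlier beyond the upper whisker, which is how rare large deletions can coexist with a small median.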

Sequential versus simultaneous deletion of adjacent sequences

Large deletions have been reported on simultaneously targeting more than one adjacent site7,9,13,14 (Supplementary Table 1). However, it is not clear whether juxtaposed sites can be deleted efficiently by co-injecting the respective sgRNAs or whether a sequential deletion approach would be more robust. We addressed this question by comparing the deletion patterns in mutant mice generated by simultaneous (one-step) or sequential (two-step) injection of sgRNAs covering seven sites (Fig. 8a, Fig. 4 and Supplementary Note 3). While the range of short deletions (<400 bp) obtained with the two strategies was not significantly different (average deletion sizes of ∼8 bp per site and ∼39 bp per site), large deletions (>400 bp) of up to 24 kb were only obtained by co-targeting sites (Fig. 8b). Strikingly, among 45 founders obtained on simultaneously injecting sgRNAs for two sites, more than 50% of the deletions were classified as large (Fig. 8c). Importantly, we initially failed to identify these large deletions because of the PCR screening strategy, which typically amplifies short fragments (∼400 bp) spanning individual sites. These large deletions were only detected using serial PCR primers spanning the entire target region.
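The short/large split used above is a simple size threshold. The sketch below applies it to a list of deletion sizes and reports the percentage classified as large. The text defines short as <400 bp and large as >400 bp without assigning exactly 400 bp; counting 400 bp as short is an assumption here, and the names `size_class` and `percent_large` are hypothetical.

```python
from typing import List

SHORT_MAX_BP = 400  # roughly the span of a typical genotyping amplicon


def size_class(deletion_bp: int) -> str:
    """Classify one deletion; exactly 400 bp is counted as 'short' (assumption)."""
    return "large" if deletion_bp > SHORT_MAX_BP else "short"


def percent_large(deletions_bp: List[int]) -> float:
    """Percentage of deletions classified as large."""
    return 100 * sum(size_class(d) == "large" for d in deletions_bp) / len(deletions_bp)
```

Because the threshold matches the size of a standard amplicon, every deletion in the "large" class is also a candidate for escaping detection by that amplicon.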

Figure 8: Deletion sizes obtained on sequentially (two-step) and simultaneously (one-step) targeting the mouse genome. (a) Schematic diagram of deleting several sites in a single gene locus using sequential or simultaneous sgRNA targeting. The targeted site is shown in purple and sgRNAs are indicated as cyan arrows. (b) Comparison of deletion sizes obtained from sequential and simultaneous sgRNA-targeting approaches. Sequential sgRNA injections: deletions obtained by targeting TF-binding sites C-2/3. Simultaneous sgRNA injections: deletions obtained by targeting TF-binding sites C-1/2, C-1/2/3, D-1/3/4 and D-1/2/3/4/5. Deletions smaller than 400 bp, which are identifiable by the typical PCR genotyping method, were called ‘short deletions’; those over 400 bp were considered ‘large deletions’. Results are shown as the mean (total number of founder mice, n=65; sequential deletion, n=20; simultaneous deletion, n=45). (c) Percentage of short and large deletions obtained from mutant mice generated by simultaneous injection of more than one sgRNA.

Through in-depth sequence analysis, we identified two distinct deletion patterns, the ‘stitched large deletion’ and the ‘continuous large deletion’ (Supplementary Fig. 7). In one experiment, we observed a combination of short and large deletions (>2 kb) within a 7 kb region, which together formed a stitched large deletion (Supplementary Fig. 7a). In another experiment, we observed continuous large deletions of over 20 kb that removed the entire sequence between the sgRNAs. Strikingly, all nine founders from one particular experiment harboured such large deletions (Supplementary Fig. 7b and Supplementary Note 3). Based on the definition of microhomology-based deletions45, we did not observe any microhomology-based large deletions with any sgRNA injection method. Collectively, our results indicate that although simultaneous targeting can rapidly generate deletions at multiple sites, it frequently creates large deletions that may remove potential regulatory elements. Although time-consuming, two-step targeting appears to be the more reliable approach for precisely deleting individual sites within a given locus.

Insertions

In addition to deletions, we also observed insertions (Supplementary Notes 1–3 and Supplementary Table 3). The frequency of insertions was 4% on targeting individual genomic sites with single sgRNAs, 10% on targeting individual genomic sites with more than one sgRNA and 6% on simultaneously targeting more than one genomic site with several sgRNAs (Supplementary Table 3). We observed two types of insertions: insertions combined with deletions (Type A) and insertions alone (Type B) (Supplementary Notes 1–3). Although most insertions consisted of only a few nucleotides, we also observed a large insertion of 800 nucleotides composed of repetitive sequences (Supplementary Note 3a).
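The two insertion types can be expressed as a rule over a simple allele record. The sketch below assumes each sequenced allele is summarized by the number of deleted reference bases and the inserted sequence (a hypothetical `Indel` record, not a format from the Supplementary Notes), and counts the insertion frequency per record, which may differ from the per-founder denominator used in Supplementary Table 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Indel:
    """One CRISPR-induced allele change relative to the reference (hypothetical record)."""
    deleted_bp: int = 0
    inserted_seq: str = ""


def indel_type(change: Indel) -> str:
    """'A' = insertion with deletion, 'B' = insertion only, else 'deletion'."""
    if change.inserted_seq and change.deleted_bp > 0:
        return "A"
    if change.inserted_seq:
        return "B"
    if change.deleted_bp > 0:
        return "deletion"
    raise ValueError("allele identical to reference")


def insertion_frequency(changes: List[Indel]) -> float:
    """Percentage of records carrying an insertion of either type."""
    return 100 * sum(indel_type(c) in ("A", "B") for c in changes) / len(changes)
```

For example, `Indel(12, "AT")` is a Type A event and `Indel(0, "G")` a Type B event, while `Indel(5)` is a plain deletion.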

Avoiding genotyping pitfalls linked to large deletions

Standard PCR genotyping methods are usually employed to screen for desired mutations in founder mice (F0). Initially, we designed PCR primers to examine short (400–500 bp) genomic regions surrounding the targeted sites. However, a large deletion on one allele combined with a wild-type second allele can produce misleading PCR results, because only the wild-type allele is amplified. Such mice would be incorrectly categorized as wild-type, because the PCR fails to detect the mutant allele (Fig. 9a). Similarly, mice carrying deletions on both alleles could be misidentified. A standard PCR strategy would detect a small deletion on one allele but would miss a larger deletion on the second allele that extends into the binding site of one or both primers. These mice would misleadingly appear to be homozygous, and the ‘hidden deletion’ would go unnoticed. Indeed, most of the ‘homozygous’ founders identified in our study were not genuine homozygous mutants but rather compound heterozygotes (Supplementary Note 3). Only PCR spanning the entire targeted loci of up to 30 kb revealed the biallelic complexity of CRISPR/Cas-induced deletions. Biallelic deletions of different sizes at multiple target sites were even more difficult to decipher (Fig. 9b). In our hands, 7 out of 30 founders were initially miscategorized because of large deletions (Supplementary Table 4). Lastly, large deletions in intervening regions between two target sites were frequently missed if only the target sites were sequenced (Fig. 9c). Such large deletions could compromise the validity of biological studies; to avoid such problems, alternative genotyping methods, such as genomic qPCR58 or even whole-genome sequencing, might be necessary.
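The genotyping pitfalls described above (Fig. 9a) can be reproduced with a toy model. This is a sketch under stated assumptions: each allele is a list of half-open deletion intervals in reference coordinates, a deletion overlapping either primer-binding site abolishes that allele's product, and every remaining product is visible as a band. The helpers `product_size` and `apparent_genotype` are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in reference coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def product_size(allele: List[Interval], fwd: Interval, rev: Interval) -> Optional[int]:
    """PCR product length from one allele, or None if a primer site is deleted."""
    if any(overlaps(d, fwd) or overlaps(d, rev) for d in allele):
        return None  # primer cannot anneal, so this allele is invisible
    size = rev[1] - fwd[0]
    for start, end in allele:
        # Subtract only the part of each deletion lying between the primers.
        size -= max(0, min(end, rev[0]) - max(start, fwd[1]))
    return size


def apparent_genotype(allele1: List[Interval], allele2: List[Interval],
                      fwd: Interval, rev: Interval) -> str:
    """Genotype call suggested by the visible bands alone."""
    wt = rev[1] - fwd[0]
    sizes = (product_size(allele1, fwd, rev), product_size(allele2, fwd, rev))
    bands = {s for s in sizes if s is not None}
    if not bands:
        return "no product"
    if bands == {wt}:
        return "WT"
    if wt in bands:
        return "het"
    return "homo" if len(bands) == 1 else "compound het"
```

With a 400-bp amplicon (primers at 0–20 and 380–400), a wild-type allele paired with a 10-kb deletion spanning both primers is called "WT", and a 10-bp deletion paired with a deletion running through the reverse primer is called "homo". Both calls are wrong, mirroring the misleading genotypes in Fig. 9a.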

Figure 9: Screening for compound heterozygous deletions disguised as homozygous deletions. (a) Diagrams depict misleading genotyping results caused by large deletions on one of the two alleles. Misleading results are shown in red with quotation marks. All possible genotypes generated from single-site mutagenesis are shown. (b) Diagrams depict examples of potentially misleading genotyping results caused by stitched large deletions and continuous large deletions. Blue and red arrows indicate PCR primers. Misleading results are shown in red with quotation marks. WT, wild-type; het, heterozygous; homo, homozygous; M1, mutated site 1; M2, mutated site 2. (c) Schematic of a mutant derived from simultaneous injection of two sgRNAs, which carries a 1 kb deletion between the two target sites in addition to the desired short deletions at the target sites.

To decode complex genotypes, especially large deletions obtained with two or more sgRNAs targeting juxtaposed sites, we used serial PCRs spanning sequences within the loci as well as outside primers spanning the entire loci (Supplementary Fig. 7c). Using this strategy, we readily identified deletions of more than 22 kb. In summary, simultaneous sgRNA injections resulted in complex deletions, and F1 mice must be generated to decode their exact genomic architecture. Moreover, simultaneous targeting of sites separated by up to 23 kb resulted in deletion of the entire intervening region. Thus, to restrict deletions to the desired sites, we propose a sequential, two-step targeting approach.
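The serial-PCR strategy can be sketched as a set of overlapping amplicons tiled across the locus plus one long-range product from primers outside it. The sketch assumes 400-bp amplicons stepped every 300 bp, 20-bp primers at each tile end, and deletions lying inside the locus; these parameters and the helpers `tile_amplicons`, `tile_status` and `long_range_size` are illustrative assumptions, not the primer design in Supplementary Fig. 7c.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in locus coordinates


def tile_amplicons(locus_len: int, amp_len: int = 400, step: int = 300) -> List[Interval]:
    """Overlapping amplicon windows covering [0, locus_len); requires 0 < step < amp_len."""
    if not 0 < step < amp_len:
        raise ValueError("step must be positive and shorter than the amplicon")
    tiles, start = [], 0
    while True:
        end = min(start + amp_len, locus_len)
        tiles.append((start, end))
        if end == locus_len:
            return tiles
        start += step


def tile_status(tile: Interval, deletions: List[Interval], primer_len: int = 20) -> str:
    """'no product' if a deletion hits a primer site, 'shortened' if it lies
    between the primers, otherwise 'intact' (primers assumed at tile ends)."""
    fwd = (tile[0], tile[0] + primer_len)
    rev = (tile[1] - primer_len, tile[1])
    removed = 0
    for s, e in deletions:
        if (s < fwd[1] and fwd[0] < e) or (s < rev[1] and rev[0] < e):
            return "no product"
        removed += max(0, min(e, rev[0]) - max(s, fwd[1]))
    return "shortened" if removed else "intact"


def long_range_size(locus_len: int, deletions: List[Interval]) -> int:
    """Product from outside primers flanking the whole locus (deletions assumed inside it)."""
    return locus_len - sum(max(0, min(e, locus_len) - max(s, 0)) for s, e in deletions)
```

For a hypothetical 600-bp deletion in a 2-kb locus, the run of tiles that give no product marks the deleted segment, while the long-range product (1,400 bp instead of 2,000 bp) confirms its size; neither readout alone would resolve the allele as clearly.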