Significance The clustered regularly interspaced short palindromic repeat (CRISPR)-associated proteins (Cas) have been widely used for genome engineering. However, their off-target activities limit broad application. The small Cas9 ortholog from Staphylococcus aureus (SaCas9) can be packaged in the payload-limited adeno-associated viral (AAV) vector that is commonly used for in vivo gene editing. Nevertheless, SaCas9 variants conferring high genome-wide specificity are still lacking. Here, we report a rationally engineered SaCas9 variant (SaCas9-HF) with highly specific genome-wide activity in human cells without compromising on-target efficiency. SaCas9-HF can be delivered by AAV and shows higher genome-wide specificity than wild-type SaCas9. Our findings provide an alternative to wild-type SaCas9 for genome-editing applications requiring exceptional genome-wide precision.

Abstract RNA-guided CRISPR-Cas9 proteins have been widely used for genome editing, but their off-target activities limit broad application. The minimal Cas9 ortholog from Staphylococcus aureus (SaCas9) is commonly used for in vivo genome editing; however, no variant conferring high genome-wide specificity is available. Here, we report rationally engineered SaCas9 variants with highly specific genome-wide activity in human cells without compromising on-target efficiency. One engineered variant, referred to as SaCas9-HF, dramatically improved genome-wide targeting accuracy, as assessed by the genome-wide unbiased identification of double-stranded breaks enabled by sequencing (GUIDE-seq) method and by targeted deep sequencing. Among 15 tested human endogenous sites with the canonical NNGRRT protospacer adjacent motif (PAM), SaCas9-HF showed no detectable off-target activity at 9 sites and minimal off-target activity at 6 sites, with on-target efficiencies comparable to those of wild-type SaCas9. Furthermore, at 4 known promiscuous targeting sites, SaCas9-HF profoundly reduced off-target activities compared with wild type. When delivered by an adeno-associated virus vector, SaCas9-HF also showed reduced off-target effects compared with wild type when targeting VEGFA in a human retinal pigmented epithelium cell line. We then applied the same modifications to a previously described variant, KKH-SaCas9, which has a wider PAM recognition range. Similarly, the resulting KKH-HF markedly reduced off-target activities and increased on- to off-target editing ratios. Our findings provide an alternative to wild-type SaCas9 for genome-editing applications requiring exceptional genome-wide precision.

Genome engineering technologies have enabled systematic interrogation of genome function and hold great potential for gene therapy (1–4). The clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein (Cas) system allows efficient DNA modification when guided by a CRISPR RNA (crRNA) and in the presence of a protospacer adjacent motif (PAM). However, imperfect guide RNA–target DNA matching can also trigger Cas nuclease activity, resulting in modifications at genomic loci other than the intended one (5). Such off-target activity could confound research results and constrain clinical utility. The two widely used Cas9 orthologs from Streptococcus pyogenes (SpCas9) and Staphylococcus aureus (SaCas9) show different levels of off-target activity (5–8). SaCas9 is compact enough to be packaged in the payload-limited adeno-associated viral (AAV) vector that is commonly used for in vivo gene editing (6). While a handful of high-fidelity SpCas9 variants exist (9–14), no SaCas9 variant with high genome-wide specificity is available.
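
To make the idea of imperfect guide RNA–target DNA matching concrete, the Python sketch below scans a DNA string for protospacers that lie within a chosen number of mismatches of a spacer and are immediately followed by an SaCas9-style NNGRRT PAM. It is a toy illustration under stated assumptions, not a validated off-target predictor: the function names, the 3-mismatch cutoff, and the example sequences are all hypothetical, only the forward strand is searched, and DNA or RNA bulges are ignored.

```python
# Toy off-target candidate scan (illustrative sketch; all names hypothetical).
# Only the forward strand is searched, and bulges are ignored.

# IUPAC codes needed for SaCas9 PAMs: N = any base, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def matches_pam(site: str, pattern: str) -> bool:
    """True if every base in `site` is allowed by the IUPAC `pattern`."""
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site, pattern)
    )


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def scan(genome: str, spacer: str, pam: str = "NNGRRT", max_mm: int = 3):
    """Yield (start, protospacer, n_mismatches) for every forward-strand window
    that is within `max_mm` mismatches of `spacer` and is immediately
    followed (3') by a sequence matching `pam`."""
    k, p = len(spacer), len(pam)
    for i in range(len(genome) - k - p + 1):
        protospacer = genome[i : i + k]
        if matches_pam(genome[i + k : i + k + p], pam):
            mm = mismatches(protospacer, spacer)
            if mm <= max_mm:
                yield i, protospacer, mm


# Hypothetical 21-nt spacer, plus a copy with 2 PAM-distal mismatches.
SPACER = "GATTCCAGTCAGCTTGACAGC"
OFF_TARGET = "CT" + SPACER[2:]
GENOME = "TT" + SPACER + "GGGAGT" + "CC" + OFF_TARGET + "AAGGAT" + "TT"

for hit in scan(GENOME, SPACER):
    print(hit)
# (2, 'GATTCCAGTCAGCTTGACAGC', 0)
# (31, 'CTTTCCAGTCAGCTTGACAGC', 2)
```

Both the intended site and the 2-mismatch copy are reported because each is followed by a valid NNGRRT PAM; a nuclease that tolerates such mismatches could cut either one.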

Two main strategies have been exploited to generate Cas9 variants with improved specificity. One is structure-guided protein engineering to modify amino acid residues in close contact with the target DNA strand or those interacting with the nontarget DNA strand (9–11). The other is random mutagenesis followed by end-point selection or directed evolution (12–14). Studies employing these strategies have focused mainly on SpCas9. The exception is 1 study on SaCas9, which generated eSaCas9 variants by modifying amino acid residues interacting with the nontarget DNA strand; these variants showed reduced activity at 3 predefined off-target sites, but their genome-wide specificity remains unknown (10).

Cas9 recognition and binding of its target DNA sequence constitute a dynamic process that involves sequential conformational changes of functional domains between inactive and active states before concerted cleavage of both DNA strands (11, 15–17). Single-molecule Förster resonance energy transfer experiments on SpCas9 showed that the number of mismatched bases in the PAM-distal region of the guide RNA–target DNA heteroduplex was inversely correlated with the proportion of SpCas9 in the activated state (17). Wild-type SpCas9 residues proximal to the guide RNA–target DNA interface may lower the threshold for activating the nuclease domain (11). Modifying these residues can raise the activation threshold, leading to better discrimination between on- and off-target sites (enhanced proofreading) and thus improved specificity (11).
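
The enhanced-proofreading argument can be illustrated with a deliberately simple two-state (logistic) model, sketched below in Python. Every quantity here is a hypothetical assumption chosen for illustration, not a model fitted to the cited single-molecule data: "binding energy" is in arbitrary units, each PAM-distal mismatch subtracts a fixed penalty, and the activated fraction is a logistic function of the gap between that energy and an activation threshold.

```python
import math

# Toy two-state model of Cas9 activation (arbitrary units; all values hypothetical).


def site_energy(n_mismatches: int, perfect: float = 10.0, penalty: float = 2.0) -> float:
    """Hypothetical 'binding energy': each PAM-distal mismatch subtracts `penalty`."""
    return perfect - penalty * n_mismatches


def active_fraction(energy: float, threshold: float, kT: float = 1.0) -> float:
    """Logistic fraction of enzyme in the activated state for a given energy
    and activation threshold."""
    return 1.0 / (1.0 + math.exp((threshold - energy) / kT))


for threshold in (5.0, 8.0):  # lower ("wild-type-like") vs raised threshold
    on = active_fraction(site_energy(0), threshold)
    off = active_fraction(site_energy(2), threshold)
    print(f"threshold={threshold}: on={on:.3f} off={off:.3f} on/off={on / off:.1f}")
# threshold=5.0: on=0.993 off=0.731 on/off=1.4
# threshold=8.0: on=0.881 off=0.119 on/off=7.4
```

In this toy setting, raising the threshold lowers the perfectly matched site only modestly but sharply suppresses the 2-mismatch site, so the on/off ratio rises about fivefold. This is a qualitative picture of why weakening residue contacts can sharpen discrimination, not a quantitative prediction for any Cas9 variant.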

SaCas9 is much smaller than SpCas9 (1,053 vs. 1,368 amino acids), yet it retains robust nuclease activity (6). Despite sharing only 17% sequence identity with SpCas9, SaCas9 recognizes the PAM-distal region of the guide RNA–target DNA heteroduplex in a manner similar to SpCas9 (18). Based on the enhanced proofreading mechanism of SpCas9 and the molecular dynamic similarity between SpCas9 and SaCas9, we sought to improve the targeting accuracy of SaCas9 by modifying residues in close polar contact with the backbone of the target DNA strand in the PAM-distal region. Using genome-wide unbiased identification of double-stranded breaks enabled by sequencing (GUIDE-seq) (7) and targeted deep sequencing, we show that 1 engineered variant dramatically reduced off-target cleavage without compromising on-target activity.

Discussion We have engineered a CRISPR Cas9 variant from Staphylococcus aureus (SaCas9-HF) that shows high genome-wide targeting accuracy without compromising on-target efficiency, as validated by rigorous evaluation of its on- and off-target activities across 24 endogenous sites. Targeted deep sequencing, combined with Integrative Genomics Viewer (IGV) inspection of InDels at a number of sites with frequencies down to and below 0.1%, provides compelling evidence that the off-target sites we identified using GUIDE-seq are bona fide target sites of WT-SaCas9. Theoretically, 10 ng of human genomic DNA contains roughly 3,000 haploid genome copies, so at a mutation rate of 0.1% it carries only about 3 copies of mutant fragments. Our failure to confirm 1 of 9 sites for FANCF_13 and 1 of 4 sites for EMX1_6 by targeted deep sequencing might therefore be due to undersampling of DNA fragments when the absolute copy number of InDel fragments in the input DNA approaches zero. When the InDel frequency is near the detection limit of a method, some off-target sites may not be detected in every experimental replicate. As GUIDE-seq has a detection limit of around 0.1% (7), a more sensitive method capable of detecting large numbers of InDels below 0.1%, such as CIRCLE-seq (24), would be helpful for ultrasensitive detection of Cas9 off-targets. Because Cas9 activity is cell-type-specific, owing in part to genomic locus accessibility and the integrity of double-stranded break repair pathways in a particular cell type, the results from ARPE-19 cells support the hypothesis that SaCas9-HF is highly precise across different cell lines. However, future studies on additional target sites and in additional cell types are needed to confirm comparable on-target efficiency and reduced off-target activity, including via AAV delivery. SaCas9-HF shares the R654A mutation with the enhanced-specificity variant S-HF (10).
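
The copy-number arithmetic behind the undersampling argument can be checked with a short Python sketch. It assumes roughly 3.3 pg per haploid human genome and treats the number of mutant template molecules in an aliquot as Poisson-distributed; both are idealizations (PCR efficiency, DNA quality, and local copy-number variation are ignored), and the function names are hypothetical.

```python
import math

# Idealized sampling sketch; the genome mass below is an approximation (assumption).
HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome, in pg


def mutant_copies(input_ng: float, indel_fraction: float) -> float:
    """Expected number of InDel-bearing template copies of one locus in the input."""
    genome_copies = input_ng * 1000.0 / HAPLOID_GENOME_PG  # convert ng to pg
    return genome_copies * indel_fraction


def p_no_mutant(expected: float) -> float:
    """Poisson probability that an aliquot contains zero mutant copies."""
    return math.exp(-expected)


for frac in (0.001, 0.0005):
    lam = mutant_copies(10.0, frac)
    print(f"InDel {frac:.2%}: ~{lam:.1f} mutant copies, P(none) = {p_no_mutant(lam):.2f}")
# InDel 0.10%: ~3.0 mutant copies, P(none) = 0.05
# InDel 0.05%: ~1.5 mutant copies, P(none) = 0.22
```

Under these assumptions, roughly 1 in 20 aliquots of 10 ng would contain no mutant copy at all for an InDel present at 0.1%, and about 1 in 5 at 0.05%, which illustrates how a genuine low-frequency off-target site can escape confirmation in some replicates.
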
S-HF contains 4 engineered residues that could weaken nonspecific SaCas9–DNA interactions, and S-HF has shown dramatically reduced activity at 3 off-target sites known a priori. Because simply combining the SpCas9-HF and eSpCas9 mutations was previously shown to greatly impair SpCas9 activity (11), we did not combine the S-HF mutations with those in SaCas9-HF. Nonetheless, the R654 residue initially reported in S-HF is located in the RuvC-III domain. Interestingly, we found that both the R654A single mutant and S-HF produced some off-target activity at sites containing noncanonical PAMs. However, these off-target sites were not observed with SaCas9-HF, which also carries R654A; this might be due to improved specificity conferred by the 3 other mutations specific to SaCas9-HF. As in HypaCas9 (11), all engineered residues of the best-performing triple mutant (R245A/N413A/N417A) are located in the recognition lobe of the Cas9 protein. Reporter assays based on fluorescent protein gene disruption revealed that WT-SaCas9 recognizes an NNGRRT PAM but also shows low-level activity when the third PAM position is T (6) or A (nearly 20% of the activity with G) (8), whereas another reporter assay showed a strict requirement for G at the third position and no SaCas9 activity on non-G nucleotides (25). In contrast, at the human endogenous site with an NNARRT PAM that we tested, SaCas9 induced a fair level of cleavage (12–22% InDels). The improved specificity conferred by the SaCas9-HF mutations extends to KKH-HF relative to KKH-SaCas9, which has a broader PAM recognition range (NNNRRT) (8). However, simply combining high-fidelity mutations with PAM-broadening mutations might lead to "overengineering," as we occasionally observed reduced on-target activity with KKH-HF. Our results indicate that SaCas9-HF has the same tolerance for spacer length as WT-SaCas9 and similar restrictiveness toward sgRNAs with a mismatched 5′ G.
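
The PAM patterns discussed here use IUPAC nucleotide codes (N = any base, R = A or G). The hypothetical helper below expands a pattern into the concrete sequences it admits, which makes the difference between the canonical SaCas9 PAM NNGRRT and the KKH-SaCas9 range NNNRRT explicit. It encodes only the strict pattern definitions; it does not model the low-level activity at non-G third positions reported by some reporter assays.

```python
from itertools import product

# IUPAC codes needed for SaCas9 PAMs: N = any base, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def expand(pattern: str) -> set[str]:
    """Return every concrete DNA sequence allowed by an IUPAC pattern."""
    return {"".join(bases) for bases in product(*(IUPAC[code] for code in pattern))}


wt = expand("NNGRRT")       # canonical SaCas9 PAM
kkh = expand("NNNRRT")      # KKH-SaCas9 PAM range (8)
third_a = expand("NNARRT")  # third position A, as at the endogenous site tested

print(len(wt), len(kkh))                  # 64 256
print(wt <= kkh)                          # True: every NNGRRT PAM is also NNNRRT
print(len(third_a & wt), third_a <= kkh)  # 0 True
```

Relaxing the third position enlarges the set of admissible PAMs fourfold, from 64 to 256 hexamers, and every NNARRT sequence falls outside the strict NNGRRT set while remaining inside NNNRRT.
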
Future studies employing combinatorial approaches to screen large numbers of protein mutations en masse, such as CombiSEAL (26), would facilitate the development of SaCas9 variants with desired features.

Materials and Methods Genome-wide off-targets of Cas9 editing were identified using GUIDE-seq (7) with minor modifications, including a redesign of the original half-functional adaptors (27) and placement of the sample index (index 2) at the head of read 1, following the unique molecular index (UMI) (SI Appendix, Methods). ARPE-19 cells were transduced with AAV8 vectors to express WT-SaCas9 or SaCas9-HF together with the VEGFA_8 sgRNA. Data. Sequencing data have been deposited in the European Nucleotide Archive (PRJEB31487).
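
For readers adapting the modified read structure, here is a minimal Python sketch of splitting read 1 and assigning it to a sample. It assumes the layout [UMI][sample index][genomic insert] at the start of read 1; the 8-nt UMI and index lengths, the sample names, the index sequences, and the 1-mismatch tolerance are all hypothetical placeholders, not values taken from the SI Appendix.

```python
from typing import Dict, Optional, Tuple

# Hypothetical read-1 layout: [UMI][sample index][genomic insert ...].
UMI_LEN = 8    # hypothetical UMI length
INDEX_LEN = 8  # hypothetical sample-index (index 2) length

# Hypothetical sample sheet: sample name -> index sequence.
SAMPLE_SHEET = {"WT_VEGFA": "ACGTACGT", "HF_VEGFA": "TGCATGCA"}


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_read1(seq: str) -> Tuple[str, str, str]:
    """Split read 1 into (UMI, sample index, remaining insert sequence)."""
    umi = seq[:UMI_LEN]
    index = seq[UMI_LEN : UMI_LEN + INDEX_LEN]
    return umi, index, seq[UMI_LEN + INDEX_LEN :]


def assign_sample(index: str, sheet: Dict[str, str], max_mm: int = 1) -> Optional[str]:
    """Return the sample whose index is the only one within `max_mm`
    mismatches of `index`; return None if no sample or several match."""
    hits = [
        name
        for name, ref in sheet.items()
        if len(ref) == len(index) and hamming(index, ref) <= max_mm
    ]
    return hits[0] if len(hits) == 1 else None


umi, index, insert = split_read1("GATCGATC" + "ACGTACGA" + "TTGCAGGT")
print(umi, index, insert, assign_sample(index, SAMPLE_SHEET))
# GATCGATC ACGTACGA TTGCAGGT WT_VEGFA
```

Requiring a unique match within the mismatch tolerance avoids assigning a read when two indexes are equally close, and the UMI is returned separately so that PCR duplicates could be collapsed downstream.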

Acknowledgments We acknowledge financial support from the Ming Wai Lau Centre of Reparative Medicine of Karolinska Institutet (Lau grant), City University of Hong Kong (internal grant), the National Natural Science Foundation of China (grants 81672098 to Z.Z. and 81770099 to J.S.), the Swedish Research Council (2016-02830 to Z.Z.), the Innovation and Technology Fund of the Hong Kong Government (9440153 to Z.Z.), the Hong Kong Health and Medical Research Fund (05160296 to J.S.), the Hong Kong Research Grants Council (21101218 to J.S.), the Shenzhen Science and Technology Innovation Fund (JCYJ20170413115637100 and JCYJ20170412152916724 to J.S.), and the Sanming Project of Medicine in Shenzhen (SZSM201811092 to J.S.).

Footnotes Author contributions: Y.T., D.A.H., W.X., M.J., J.S., and Z.Z. designed research; Y.T., A.H.Y.C., S.B., D.A.H., F.T.K., W.X., J.S., and Z.Z. performed research; Y.T., A.H.Y.C., S.B., D.A.H., F.T.K., W.X., M.J., J.S., and Z.Z. contributed new reagents/analytic tools; Y.T., A.H.Y.C., J.S., and Z.Z. analyzed data; and Y.T., A.H.Y.C., and Z.Z. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

Data deposition: Sequencing data reported in this article have been deposited in the European Nucleotide Archive (accession no. PRJEB31487).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1906843116/-/DCSupplemental.