The microbial adaptive immune system CRISPR mediates defense against foreign genetic elements through two classes of RNA-guided nuclease effectors. Class 1 effectors utilize multi-protein complexes, whereas class 2 effectors rely on single-component effector proteins such as the well-characterized Cas9. Here, we report the characterization of Cpf1, a putative class 2 CRISPR effector. We demonstrate that Cpf1 mediates robust DNA interference with features distinct from Cas9. Cpf1 is a single RNA-guided endonuclease that does not require tracrRNA, and it utilizes a T-rich protospacer-adjacent motif. Moreover, Cpf1 cleaves DNA via a staggered DNA double-stranded break. Out of 16 Cpf1-family proteins, we identified two candidate enzymes, from Acidaminococcus and Lachnospiraceae, with efficient genome-editing activity in human cells. Identifying this mechanism of interference broadens our understanding of CRISPR-Cas systems and advances their genome-editing applications.

To explore the suitability of Cpf1 for genome-editing applications, we characterized the RNA-guided DNA-targeting requirements for 16 Cpf1-family proteins from diverse bacteria, and we identified two Cpf1 enzymes, from Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006, that are capable of mediating robust genome editing in human cells. Collectively, these results establish Cpf1-containing loci as a class 2 CRISPR-Cas system whose effector, Cpf1, is a single RNA-guided endonuclease with distinct properties and the potential to substantially advance our ability to manipulate eukaryotic genomes.

Here, we show that Cpf1-containing CRISPR-Cas loci of Francisella novicida U112 encode functional defense systems capable of mediating plasmid interference in bacterial cells guided by the CRISPR spacers. Unlike Cas9 systems, Cpf1-containing CRISPR systems have three distinctive features. First, Cpf1-associated CRISPR arrays are processed into mature crRNAs without the requirement of an additional trans-activating crRNA (tracrRNA) (). Second, Cpf1-crRNA complexes efficiently cleave target DNA preceded by a short T-rich protospacer-adjacent motif (PAM), in contrast to the G-rich PAM that follows the target DNA in Cas9 systems. Third, Cpf1 introduces a staggered DNA double-stranded break with a 4- or 5-nt 5′ overhang.

Multiple class 1 CRISPR-Cas systems, which include the type I and type III systems, have been identified and functionally characterized in detail, revealing the complex architecture and dynamics of the effector complexes (). Several class 2 CRISPR-Cas systems have also been identified and experimentally characterized, but they are all type II and employ homologous RNA-guided endonucleases of the Cas9 family as effectors (). A second, putative class 2 CRISPR system, tentatively assigned to type V, has been recently identified in several bacterial genomes ( http://www.jcvi.org/cgi-bin/tigrfams/HmmReportPage.cgi?acc=TIGR04330 ) (). The putative type V CRISPR-Cas systems contain a large, ∼1,300 amino acid protein called Cpf1 (CRISPR from Prevotella and Francisella 1). It remains unknown, however, whether Cpf1-containing CRISPR loci indeed represent functional CRISPR systems. Given the broad applications of Cas9 as a genome-engineering tool (), we sought to explore the function of Cpf1-based putative CRISPR systems.

Almost all archaea and many bacteria achieve adaptive immunity through a diverse set of CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) systems, each of which consists of a combination of Cas effector proteins and CRISPR RNAs (crRNAs) (). The defense activity of the CRISPR-Cas systems includes three stages: (1) adaptation, when a complex of Cas proteins excises a segment of the target DNA (known as a protospacer) and inserts it into the CRISPR array (where this sequence becomes a spacer); (2) expression and processing of the precursor CRISPR RNA (pre-crRNA), resulting in the formation of mature crRNAs; and (3) interference, when the effector module (either another Cas protein complex or a single large protein) is guided by a crRNA to recognize and cleave target DNA (or, in some cases, RNA) (). The adaptation stage is mediated by the complex of the Cas1 and Cas2 proteins, which are shared by all known CRISPR-Cas systems, and sometimes involves additional Cas proteins. Diversity is observed at the level of processing of the pre-crRNA into mature crRNA guides, which proceeds via either a Cas6-related ribonuclease or the housekeeping RNase III, the latter specifically cleaving double-stranded RNA hybrids of pre-crRNA and tracrRNA. Moreover, the effector modules differ substantially among the CRISPR-Cas systems (). In the latest classification, the diverse CRISPR-Cas systems are divided into two classes according to the configuration of their effector modules: class 1 CRISPR systems utilize several Cas proteins and the crRNA to form an effector complex, whereas class 2 CRISPR systems employ a large single-component Cas protein in conjunction with crRNAs to mediate interference ().

We further tested each Cpf1-family protein with additional genomic targets and found that AsCpf1 and LbCpf1 consistently mediated robust genome editing in HEK293FT cells, whereas the remaining Cpf1 proteins showed either no detectable activity or only sporadic activity (Figures 7E and S7), even though all but one of them were robustly expressed (Figure S6D); the only Cpf1 candidate that expressed poorly was PdCpf1 (Figure S6D). AsCpf1 and LbCpf1 mediated levels of indel formation comparable to those of Cas9 (Figure 7E). Additionally, in vitro cleavage followed by Sanger sequencing of the cleaved DNA ends showed that candidates 7 (AsCpf1) and 13 (LbCpf1) also generated staggered cleavage sites (Figures S6E and S6F, respectively).

(F) Indel distributions for AsCpf1 and LbCpf1 at DNMT1 target sites 2, 3, and 4. Cyan bars represent total indel coverage; blue bars represent the distribution of the 3′ ends of indels. For each target, the PAM sequence is in red and the target sequence is in light blue.

For each Cpf1-family protein whose PAM we had identified, we tested nuclease activity in mammalian cells. We codon optimized each of these genes and attached a C-terminal nuclear localization signal (NLS) for optimal expression and nuclear targeting in human cells (Figure 7A). To test the activity of each Cpf1-family protein, we selected a guide RNA target site within the DNMT1 gene (Figure 7B). We first found that each of the Cpf1-family proteins, together with its respective crRNA designed to target DNMT1, was able to cleave a PCR amplicon of the DNMT1 genomic region in vitro (Figure 7C). However, when tested in human embryonic kidney 293FT (HEK293FT) cells, only two of the eight Cpf1-family proteins (candidate 7, AsCpf1, and candidate 13, LbCpf1) exhibited detectable levels of nuclease-induced indels (Figures 7C and 7D). This result is consistent with previous experiments with Cas9, in which only a small number of Cas9 orthologs were successfully harnessed for genome editing in mammalian cells ().

(C) Comparison of in vitro and in vivo cleavage activity. The DNMT1 target region was PCR amplified, and the genomic fragment was used to test Cpf1-mediated cleavage. All eight Cpf1-family proteins showed DNA cleavage in vitro (top), but only candidates 7 (AsCpf1) and 13 (LbCpf1) facilitated robust indel formation in human cells.

(A) Eight Cpf1-family proteins were individually expressed in HEK293FT cells using CMV-driven expression vectors. The corresponding crRNA was expressed from a PCR fragment containing a U6 promoter fused to the crRNA sequence. Transfected cells were analyzed using either the Surveyor nuclease assay or targeted deep sequencing.
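The caption does not state how Surveyor band intensities were converted into indel frequencies. A widely used estimate, assumed here rather than taken from this study, treats every reannealed duplex other than wild-type/wild-type as cleavable, so the cleaved fraction of the PCR product approximates 1 − (1 − f)², where f is the indel fraction. The function name and arguments below are hypothetical.

```python
import math


def surveyor_indel_percent(parent: float, cleaved_1: float, cleaved_2: float) -> float:
    """Estimate indel frequency (%) from Surveyor gel band intensities.

    parent: intensity of the uncleaved PCR product
    cleaved_1, cleaved_2: intensities of the two cleavage products
    """
    total = parent + cleaved_1 + cleaved_2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cleaved_1 + cleaved_2) / total
    # Assumes random reannealing in which every duplex other than
    # wild-type/wild-type is cleaved: fraction_cleaved = 1 - (1 - f) ** 2.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Example: 75% of the signal uncleaved, 25% split between the two products
print(round(surveyor_indel_percent(75, 12.5, 12.5), 1))  # 13.4
```

Targeted deep sequencing avoids this model entirely, since the indel fraction can be counted directly from aligned reads.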

Next, we applied the in vitro PAM identification assay (Figure S6A) to determine the PAM sequence for each Cpf1-family protein. We identified the PAM sequences of seven new Cpf1-family proteins (Figures 6E, S6B, and S6C), and the screen confirmed the PAM for FnCpf1 as 5′-TTN. The remaining eight tested Cpf1 proteins did not show efficient cleavage during in vitro reconstitution. The PAM sequences for the Cpf1-family proteins were predominantly T rich, varying only in the number of Ts constituting each PAM (Figures 6E, S6B, and S6C).

(E and F) Sanger sequencing traces from the candidate 7 (AsCpf1)-digested target (E) and the candidate 13 (LbCpf1)-digested target (F) show staggered overhangs. The non-templated addition of an extra adenine, denoted as N, is an artifact of the polymerase used in sequencing (). Cleavage sites are indicated by red triangles. Smaller triangles indicate putative alternative cleavage sites.

(A) Schematic of the in vitro PAM screen using Cpf1-family proteins. A library of plasmids bearing randomized 5′ PAM sequences was cleaved by individual Cpf1-family proteins and their corresponding crRNAs. Uncleaved plasmid DNA was purified and sequenced to identify the PAM motifs that were depleted.

Given the strong structural conservation of the direct repeats that are associated with many of the Cpf1-family proteins, we first tested whether the orthologous direct repeat sequences are able to support FnCpf1 nuclease activity in vitro. As expected, the direct repeats that contained conserved stem sequences were able to function interchangeably with FnCpf1. By contrast, the direct repeats from candidates 2 (Lb3Cpf1) and 6 (SsCpf1) were unable to support FnCpf1 cleavage activity ( Figure 6 D). The direct repeat from candidate 3 (BpCpf1) supported only a low level of FnCpf1 nuclease activity ( Figure 6 D), possibly due to the conservation of the 3′-most U.

The direct repeat sequences for each of these Cpf1-family proteins show strong conservation in the 19 nt at the 3′ end of the direct repeat, the portion of the repeat that is included in the processed crRNA (Figure 6B). The 5′ sequence of the direct repeat is much more diverse. Of the 16 Cpf1-family proteins chosen for analysis, three (2, Lachnospiraceae bacterium MC2017, Lb3Cpf1; 3, Butyrivibrio proteoclasticus, BpCpf1; and 6, Smithella sp. SC_K08D17, SsCpf1) were associated with direct repeat sequences that are notably divergent from the FnCpf1 direct repeat (Figure 6B). However, even these direct repeat sequences preserved stem-loop structures that were identical or nearly identical to that of the FnCpf1 direct repeat (Figure 6C).

Our previous experience in harnessing Cas9 for genome editing in mammalian cells indicates that only a small fraction of bacterial nucleases can function efficiently when heterologously expressed in mammalian cells (). Therefore, to assess the feasibility of harnessing Cpf1 as a genome-editing tool, we exploited the diversity of Cpf1-family proteins available in public sequence databases. A BLAST search of the WGS database at the NCBI revealed 46 non-redundant Cpf1-family proteins (Figure S5A), from which we chose 16 candidates that, based on our phylogenetic reconstruction (Figure S5A), represent the entire Cpf1 diversity (Figures 6A and S5). These Cpf1-family proteins range in length from ∼1,200 to ∼1,500 amino acids.

Next, we studied the effect of direct repeat mutations on RNA-guided DNA cleavage activity. The direct repeat portion of the mature crRNA is 19 nt long (Figure 2A). Truncation of the direct repeat revealed that at least 16 nt, and optimally more than 17 nt, of the direct repeat are required for cleavage. Mutations in the stem loop that preserved the RNA duplex did not affect cleavage activity, whereas mutations that disrupted the stem-loop duplex completely abolished cleavage (Figure 5D). Finally, base substitutions in the loop region did not affect nuclease activity, whereas the uracil immediately preceding the spacer sequence could not be substituted (Figure 5E). Collectively, these results suggest that FnCpf1 recognizes the crRNA through a combination of sequence-specific and structural features of the stem loop.

We first examined the length requirement for the guide sequence and found that FnCpf1 requires at least 16 nt of guide sequence to achieve detectable DNA cleavage and a minimum of 18 nt of guide sequence to achieve efficient DNA cleavage in vitro (Figure 5A). These requirements are similar to those demonstrated for SpCas9, in which a minimum of 16–17 nt of spacer sequence is required for DNA cleavage (). We also found that the seed region of the FnCpf1 guide RNA lies approximately within the first 5 nt at the 5′ end of the spacer sequence (Figures 5B and S3E).
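The in vitro thresholds reported above can be encoded as a simple screening helper. This is a hypothetical sketch: the cutoffs (16 nt for detectable cleavage, 18 nt for efficient cleavage, a seed of roughly 5 nt) come from the text, but the function names and the idea of using them as a design filter are illustrative assumptions, not a validated predictor of cleavage.

```python
SEED_LENGTH = 5        # seed lies roughly within the first 5 nt (5' end) of the spacer
MIN_DETECTABLE = 16    # shortest guide giving detectable in vitro cleavage
MIN_EFFICIENT = 18     # shortest guide giving efficient in vitro cleavage


def guide_length_class(guide: str) -> str:
    """Bin a guide by length using the in vitro thresholds reported for FnCpf1."""
    n = len(guide)
    if n >= MIN_EFFICIENT:
        return "efficient"
    if n >= MIN_DETECTABLE:
        return "detectable"
    return "below threshold"


def seed_mismatch_positions(guide: str, protospacer: str, seed_length: int = SEED_LENGTH) -> list:
    """Return 1-based seed positions where the guide and protospacer differ.

    Both sequences are written 5'->3' in the sense of the non-target strand,
    so a perfectly matched guide equals the protospacer with U in place of T.
    """
    g = guide.upper().replace("U", "T")
    p = protospacer.upper()
    span = min(seed_length, len(g), len(p))
    return [i + 1 for i in range(span) if g[i] != p[i]]


print(guide_length_class("ACGU" * 5))                       # efficient
print(seed_mismatch_positions("UACGUAAAA", "TACCTAAAA"))    # [4]
```

Mismatches outside the seed are ignored here only because the text characterizes the seed; it does not imply that distal mismatches are tolerated.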

Compared with the guide RNA for Cas9, which has elaborate RNA secondary structure features that interact with Cas9 (), the guide RNA for FnCpf1 is notably simpler and only consists of a single stem loop in the direct repeat sequence ( Figure 3 A). We explored the sequence and structural requirements of crRNA for mediating DNA cleavage with FnCpf1.

The RuvC-like domain of Cpf1 retains all of the catalytic residues of this family of endonucleases (Figures 4A and S4) and is thus predicted to be an active nuclease. We therefore generated three mutants, FnCpf1(D917A), FnCpf1(E1006A), and FnCpf1(D1255A) (Figure 4A), to test whether the conserved catalytic residues are essential for the nuclease activity of FnCpf1. We found that the D917A and E1006A mutations completely inactivated the DNA cleavage activity of FnCpf1, and D1255A significantly reduced nucleolytic activity (Figure 4B). These results contrast with the mutagenesis results for Streptococcus pyogenes Cas9 (SpCas9), where mutation of the RuvC (D10A) or HNH (N863A) nuclease domain converts SpCas9 into a DNA nickase (i.e., inactivation of either of the two nuclease domains abolishes cleavage of one of the DNA strands) () (Figure 4B). These findings suggest that the RuvC-like domain of FnCpf1 cleaves both strands of the target DNA, perhaps in a dimeric configuration. Interestingly, size-exclusion gel filtration shows that FnCpf1 elutes at ∼300 kD, twice the molecular weight of an FnCpf1 monomer (Figure S2B).
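As a rough consistency check, assuming that the ∼147 kD size of free FnCpf1 reported in the purification caption applies to the protein used for gel filtration, the dimer hypothesis predicts:

$$
M_{\text{dimer}} \approx 2 \times 147\ \text{kD} = 294\ \text{kD} \approx 300\ \text{kD}
$$

Elution in size-exclusion chromatography also depends on molecular shape, so this agreement supports, but does not by itself establish, a dimeric state.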

Multiple sequence alignment of the amino acid sequences of FnCpf1, AsCpf1, and LbCpf1 shows many highly conserved residues. Conserved residues are highlighted with a red background, and conservative substitutions are highlighted with an outline and red font. Secondary structure predictions are shown above (FnCpf1) and below (LbCpf1) the alignment. Alpha helices are shown as coils and beta strands as dashes. Putative catalytic residues are highlighted in yellow. Protein domains identified in Figure 1A are also highlighted.

(B) Native TBE PAGE gel showing that mutation of the RuvC catalytic residues of FnCpf1 (D917A and E1006A) and mutation of the RuvC (D10A) catalytic residue of SpCas9 prevents double-stranded DNA cleavage. Denaturing TBE-Urea PAGE gel showing that mutation of the RuvC catalytic residues of FnCpf1 (D917A and E1006A) prevents DNA-nicking activity, whereas mutation of the RuvC (D10A) catalytic residue of SpCas9 results in nicking of the target site.

We also mapped the cleavage site of FnCpf1 using Sanger sequencing of the cleaved DNA ends. We found that FnCpf1-mediated cleavage results in a 5-nt 5′ overhang (Figures 3A, 3D, and S3A–S3D), which differs from the blunt cleavage product generated by Cas9 (). The staggered cleavage site of FnCpf1 is distant from the PAM: cleavage occurs after the 18th base on the non-targeted (+) strand and after the 23rd base on the targeted (–) strand (Figures 3A, 3D, and S3A–S3D). Using double-stranded oligo substrates with different PAM sequences, we also found that FnCpf1 requires the 5′-TTN PAM to be in duplex form in order to cleave the target DNA (Figure 3E).
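To make the cut geometry concrete, the sketch below models the staggered break on a PAM-plus-protospacer sequence written on the non-target (+) strand, using the positions reported above (after base 18 on the (+) strand and after base 23 on the (–) strand, both counted from the PAM-proximal end). The helper names and the example sequence are hypothetical, and the model ignores the alternative cleavage sites seen in some traces.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def staggered_cut(pam: str, protospacer: str, plus_cut: int = 18, minus_cut: int = 23) -> dict:
    """Model a staggered break with the cut positions reported for FnCpf1.

    `pam` and `protospacer` are written 5'->3' on the non-target (+) strand,
    PAM first. The (+) strand is cut after protospacer base `plus_cut` and the
    (-) strand after protospacer base `minus_cut`, both counted from the
    PAM-proximal end.
    """
    if not 0 < plus_cut < minus_cut <= len(protospacer):
        raise ValueError("need 0 < plus_cut < minus_cut <= len(protospacer)")
    top = (pam + protospacer).upper()
    top_cut = len(pam) + plus_cut        # (+) strand break, in top-strand coordinates
    bottom_cut = len(pam) + minus_cut    # (-) strand break, same coordinates
    tail = top[top_cut:bottom_cut]       # single-stranded on the PAM-distal (+) fragment
    return {
        "overhang_length": bottom_cut - top_cut,
        "proximal_plus_strand": top[:top_cut],
        # 5' tail of the PAM-distal fragment, read on the (+) strand
        "distal_5prime_tail": tail,
        # 5' tail of the PAM-proximal fragment, read on the (-) strand
        "proximal_5prime_tail": revcomp(tail),
    }


cut = staggered_cut("TTA", "ACGTACGTACGTACGTAC" + "GGCCA" + "TT")
print(cut["overhang_length"], cut["distal_5prime_tail"], cut["proximal_5prime_tail"])  # 5 GGCCA TGGCC
```

The two tails are reverse complements of each other, which is what makes the 23 − 18 = 5-nt stagger a sticky end.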

(A–D) Sanger sequencing traces from FnCpf1-digested DNA targets show staggered overhangs. The non-templated addition of an extra adenine, denoted as N, is an artifact of the polymerase used in sequencing (). Sanger traces are shown for different TTN PAMs with protospacer 1 (A), protospacer 2 (B), protospacer 3 (C), and targets DNMT1 and EMX1 (D). The (–) strand sequence is reverse-complemented to show the top-strand sequence. Cleavage sites are indicated by red triangles. Smaller triangles indicate putative alternative cleavage sites.

The finding that FnCpf1 can mediate DNA interference with crRNA alone is highly surprising, given that Cas9 recognizes crRNA through the duplex structure between crRNA and tracrRNA (), as well as through the 3′ secondary structure of the tracrRNA (). To confirm that crRNA is indeed sufficient for forming an active complex with FnCpf1 and mediating RNA-guided DNA cleavage, we investigated whether FnCpf1 supplied only with crRNA can cleave target DNA in vitro. We purified FnCpf1 (Figure S2) and assayed its ability to cleave the same protospacer-1-containing plasmid used in the bacterial DNA interference experiments (Figure 3A). We found that FnCpf1, together with an in vitro-transcribed mature crRNA targeting protospacer 1, efficiently cleaved the target plasmid in a Mg2+- and crRNA-dependent manner (Figure 3B). Moreover, FnCpf1 was able to cleave both supercoiled and linear target DNA (Figure 3C). These results demonstrate that FnCpf1 and crRNA are sufficient for RNA-guided DNA cleavage.

(D) Sanger-sequencing traces from FnCpf1-digested target show staggered overhangs. The non-templated addition of an additional adenine, denoted as N, is an artifact of the polymerase used in sequencing (). Reverse primer read represented as reverse complement to aid visualization. See also Figure S3

(A) Coomassie blue-stained acrylamide gel of stepwise FnCpf1 purification. A band just above 160 kD eluted from the Ni-NTA column, consistent with the size of an MBP-FnCpf1 fusion (189.7 kD). Upon addition of TEV protease, a lower-molecular-weight band appeared, consistent with the size of free FnCpf1 (147 kD).

To confirm that no additional RNAs are required for crRNA maturation and DNA interference, we constructed an expression plasmid using synthetic promoters to drive the expression of Francisella cpf1 (FnCpf1) and the CRISPR array (pFnCpf1_min). Small RNA-seq of E. coli expressing this plasmid still showed robust processing of the CRISPR array into mature crRNA (Figure 2B), indicating that FnCpf1 and its CRISPR array are the only elements of the FnCpf1 locus required for crRNA processing. Furthermore, E. coli expressing pFnCpf1_min, as well as pFnCpf1_ΔCas (a plasmid with all of the other cas genes removed but retaining the native promoters driving the expression of FnCpf1 and the CRISPR array), also exhibited robust DNA interference, demonstrating that FnCpf1 and crRNA are sufficient for mediating DNA targeting (Figure 2C). By contrast, Cas9 requires both crRNA and tracrRNA to mediate targeted DNA interference ().

After showing that cpf1-based CRISPR loci are able to mediate robust DNA interference, we performed small RNA sequencing to determine the exact identity of the crRNA produced by these loci. By sequencing small RNAs extracted from a Francisella novicida U112 culture, we found that the CRISPR array is processed into short mature crRNAs of 42–44 nt in length. Each mature crRNA begins with 19 nt of the direct repeat followed by 23–25 nt of the spacer sequence ( Figure 2 A). This crRNA arrangement contrasts with that of type II CRISPR-Cas systems in which the mature crRNA starts with 20–24 nt of spacer sequence followed by ∼22 nt of direct repeat (). Unexpectedly, apart from the crRNAs, we did not observe any robustly expressed small transcripts near the Francisella cpf1 locus that might correspond to tracrRNAs, which are associated with Cas9-based systems.
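A mature FnCpf1 crRNA, as described above, consists of the 3′-terminal 19 nt of the direct repeat followed by 23 to 25 nt of spacer, the reverse of the type II layout. The helper below assembles such a guide from a full-length repeat and a spacer; the function name is hypothetical, and the example sequences are placeholders rather than the actual Francisella repeat or spacers.

```python
MATURE_REPEAT_NT = 19          # 3'-terminal direct-repeat nucleotides kept in the mature crRNA
SPACER_RANGE = range(23, 26)   # observed spacer portion: 23-25 nt


def assemble_crrna(direct_repeat: str, spacer: str) -> str:
    """Return a mature Cpf1-style crRNA (5' repeat handle + 3' spacer) as RNA."""
    repeat = direct_repeat.upper().replace("T", "U")
    guide = spacer.upper().replace("T", "U")
    if len(repeat) < MATURE_REPEAT_NT:
        raise ValueError("direct repeat is shorter than the mature 19-nt handle")
    if len(guide) not in SPACER_RANGE:
        raise ValueError("spacer length outside the observed 23-25 nt range")
    # Repeat first, spacer second: type II crRNAs place the spacer first
    return repeat[-MATURE_REPEAT_NT:] + guide


# Placeholder sequences for illustration only (not the Francisella repeat)
repeat = "GGGGG" + "AUCGAUCGAUCGAUCGAUC"   # hypothetical 24-nt repeat
crrna = assemble_crrna(repeat, "ACGT" * 6)    # 24-nt spacer
print(len(crrna))  # 43
```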

To further characterize the PAM requirements, we analyzed plasmid interference activity by transforming cpf1-locus-expressing cells with plasmids carrying protospacer 1 flanked by 5′-TTN PAMs. We found that all 5′-TTN PAMs were efficiently targeted ( Figure 1 E). In addition, 5′-CTA, but not 5′-TCA, was also efficiently targeted ( Figure 1 E), suggesting that the middle T is more critical for PAM recognition than the first T and that, in agreement with the sequence motifs depleted in the PAM discovery assay ( Figure S1 D), the PAM might be more relaxed than 5′-TTN.

To simplify experimentation, we cloned the Francisella novicida U112 Cpf1 (FnCpf1) locus (Figure 1A) into low-copy plasmids (pFnCpf1) to allow heterologous reconstitution in Escherichia coli. In currently characterized CRISPR-Cas systems, there are typically two requirements for DNA interference: (1) the target sequence has to match one of the spacers present in the respective CRISPR array, and (2) the target sequence complementary to the spacer (hereinafter, protospacer) has to be flanked by the appropriate protospacer-adjacent motif (PAM). Given that the functionality of the FnCpf1 CRISPR locus was completely uncharacterized, we adapted a previously described plasmid depletion assay () to ascertain the activity of Cpf1 and to identify the required PAM sequence and its location relative to the protospacer (5′ or 3′) (Figure 1B). We constructed two libraries of plasmids carrying a protospacer matching the first spacer in the FnCpf1 CRISPR array, with 7 bp randomized on either the 5′ or the 3′ side of the protospacer. Each plasmid library was transformed into E. coli that heterologously expressed the FnCpf1 locus or into a control E. coli strain carrying the empty vector. Using this assay, we determined the PAM sequence and location by identifying nucleotide motifs that were preferentially depleted in cells heterologously expressing the FnCpf1 locus. We found that the PAM for FnCpf1 is located upstream of the 5′ end of the displaced strand of the protospacer and has the sequence 5′-TTN (Figures 1C, 1D, and S1). The 5′ location of the PAM is also observed in type I CRISPR systems, but not in type II systems, where Cas9 employs PAM sequences located on the 3′ end of the protospacer (). Beyond identifying the PAM, the results of the depletion assay clearly indicate that heterologously expressed Cpf1 loci are capable of efficient interference with plasmid DNA.
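Because the FnCpf1 PAM sits 5′ of the protospacer on the displaced (non-target) strand, candidate target sites can be enumerated by scanning both strands for 5′-TTN and taking the bases that follow it. This is a hypothetical sketch: it uses the strict 5′-TTN motif (ignoring relaxed PAMs such as 5′-CTA), and the default protospacer length of 24 nt is an assumption chosen from the 23 to 25 nt spacer range observed in mature crRNAs.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ttn_sites(seq: str, protospacer_len: int = 24) -> list:
    """List (strand, pam_index, pam, protospacer) for every 5'-TTN PAM that is
    followed by at least `protospacer_len` bases.

    pam_index is 0-based within the strand as read 5'->3'; the protospacer is
    reported in the sense of that (non-target) strand, i.e. the DNA version
    of the guide sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Lookahead regex reports overlapping PAMs, e.g. both TTT and TTG in TTTG
        for m in re.finditer(r"(?=TT[ACGT])", s):
            i = m.start()
            protospacer = s[i + 3:i + 3 + protospacer_len]
            if len(protospacer) == protospacer_len:
                hits.append((strand, i, s[i:i + 3], protospacer))
    return hits


print(find_ttn_sites("TTA" + "GC" * 5, protospacer_len=10))  # [('+', 0, 'TTA', 'GCGCGCGCGC')]
```

Contrast this with a Cas9 search, where the PAM would be read from the bases immediately 3′ of the protospacer instead.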

(C) Input library of plasmids carrying randomized 5′ PAM sequences. Plot shows depletion levels in ranked order. Depletion is measured as the negative log2-fold ratio of normalized abundance compared with pACYC184 E. coli controls, and PAMs above a threshold of 3.5 are used to generate sequence logos.

(B) Transformation of E. coli harboring pFnCpf1 with a library of plasmids carrying randomized 3′ PAM sequences. A subset of plasmids was depleted. Plot shows depletion levels in ranked order. Depletion is measured as the negative log2-fold ratio of normalized abundance compared with pACYC184 E. coli controls, and PAMs above a threshold of 3.5 are used to generate sequence logos.

(A) Transformation of E. coli harboring pFnCpf1 with a library of plasmids carrying randomized 5′ PAM sequences. A subset of plasmids was depleted. Plot shows depletion levels in ranked order. Depletion is measured as the negative log2-fold ratio of normalized abundance compared with pACYC184 E. coli controls, and PAMs above a threshold of 3.5 are used to generate sequence logos.
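The depletion metric in these captions (the negative log2 fold ratio of normalized abundance relative to the pACYC184 control, with a 3.5 cutoff) can be computed from per-PAM read counts as sketched below. Normalizing by total reads and adding a pseudocount to avoid taking the logarithm of zero are assumptions; the captions specify neither detail, and the function names are hypothetical.

```python
import math


def depletion_scores(treated: dict, control: dict, pseudocount: float = 1.0) -> dict:
    """Negative log2 fold change of each PAM's normalized abundance.

    treated/control map PAM sequence -> read count. Abundances are normalized
    to total reads in each library; the pseudocount avoids log2(0) for PAMs
    that are fully depleted (an assumption, not stated in the captions).
    """
    treated_total = sum(treated.values())
    control_total = sum(control.values())
    scores = {}
    for pam, control_count in control.items():
        treated_frac = (treated.get(pam, 0) + pseudocount) / treated_total
        control_frac = (control_count + pseudocount) / control_total
        scores[pam] = -math.log2(treated_frac / control_frac)
    return scores


def depleted_pams(scores: dict, threshold: float = 3.5) -> list:
    """PAMs whose depletion exceeds the threshold, most depleted first."""
    return sorted((pam for pam, s in scores.items() if s > threshold), key=lambda pam: -scores[pam])


# Toy counts: TTA is eliminated by interference, GGG is not
control = {"TTA": 100, "GGG": 100}
treated = {"TTA": 0, "GGG": 200}
scores = depletion_scores(treated, control)
print(depleted_pams(scores))  # ['TTA']
```

Positive scores mean a PAM became rarer in cells carrying pFnCpf1, which is the signature of successful targeting in this assay.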

Cpf1 was first annotated as a CRISPR-associated gene in TIGRFAM (http://www.jcvi.org/cgi-bin/tigrfams/HmmReportPage.cgi?acc=TIGR04330) and has been hypothesized to be the effector of a CRISPR locus that is distinct from the Cas9-containing type II CRISPR-Cas loci that are also present in the genomes of some of the same bacteria, such as multiple strains of Francisella and Prevotella () (Figure 1A). The Cpf1 protein contains a predicted RuvC-like endonuclease domain that is distantly related to the respective nuclease domain of Cas9. However, Cpf1 differs from Cas9 in that it lacks a second, HNH endonuclease domain, which is inserted within the RuvC-like domain of Cas9. Furthermore, the N-terminal portion of Cpf1 is predicted to adopt a mixed α/β structure and appears to be unrelated to the N-terminal, α-helical recognition lobe of Cas9 (Figure 1A). It has been shown that the nuclease moieties of Cas9 and Cpf1 are homologous to distinct groups of transposon-encoded TnpB proteins, the former containing both RuvC and HNH nuclease domains and the latter containing only the RuvC-like domain (). Apart from these distinctions between the effector proteins, the Cpf1-carrying loci encode Cas1, Cas2, and Cas4 proteins that are more closely related to orthologs from types I and III than to those from type II CRISPR systems (). Taken together, these differences from type II have prompted the classification of Cpf1-encoding CRISPR-Cas loci as the putative type V within class 2 (). The features of the putative type V loci, especially the domain architecture of Cpf1, suggest not only that type II and type V systems evolved independently, through the association of different adaptation modules (cas1, cas2, and cas4 genes) with different TnpB genes, but also that type V systems are functionally unique.

The notion that Cpf1-carrying loci are bona fide CRISPR systems is further buttressed by a search of microbial genome sequences for similarity to type V spacers, which produced several significant hits to prophage genes, in particular those from Francisella (). Given these observations and the prevalence of Cpf1-family proteins in diverse bacterial species, we sought to test the hypothesis that Cpf1-encoding CRISPR-Cas loci are biologically active and can mediate targeted DNA interference, one of the primary functions of CRISPR systems.

(C and D) Sequence logo for the FnCpf1 PAM as determined by the plasmid depletion assay. Letter height at each position is measured by information content (C) or frequency (D); error bars show 95% Bayesian confidence interval.
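In an information-content logo, letter heights are conventionally each letter's frequency multiplied by the per-position information content (log2 4 minus the Shannon entropy of that column). The sketch below computes the per-position value from a set of aligned PAM sequences; it omits the small-sample correction that logo tools often apply and does not reproduce the Bayesian confidence intervals shown in the figure. The function name is hypothetical.

```python
import math
from collections import Counter


def information_content(seqs: list, alphabet: str = "ACGT") -> list:
    """Per-position information content (bits) of aligned, equal-length sequences."""
    seqs = [s.upper() for s in seqs]
    if not seqs or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need one or more sequences of equal length")
    max_bits = math.log2(len(alphabet))
    ic = []
    for column in zip(*seqs):
        counts = Counter(base for base in column if base in alphabet)
        n = sum(counts.values())
        if n == 0:
            raise ValueError("column contains no valid bases")
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(max_bits - entropy)
    return ic


# A perfectly conserved TT followed by a fully degenerate N
print(information_content(["TTA", "TTC", "TTG", "TTT"]))  # [2.0, 2.0, 0.0]
```

Multiplying each column's value by the letter frequencies gives the stacked heights; a 5′-TTN PAM therefore appears as two full-height Ts and an empty third position.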

(B) Schematic illustrating the plasmid depletion assay for discovering the PAM position and identity. Competent E. coli harboring either the heterologous FnCpf1 locus plasmid (pFnCpf1) or the empty vector control were transformed with a library of plasmids containing the matching protospacer flanked by randomized 5′ or 3′ PAM sequences and selected with antibiotic to deplete plasmids carrying successfully targeted PAMs. Plasmids from surviving colonies were extracted and sequenced to determine the depleted PAM sequences.

Discussion

In this work, we characterize Cpf1-containing class 2 CRISPR systems, classified as type V, and show that their effector protein, Cpf1, is a single RNA-guided endonuclease. Cpf1 differs substantially in structure and function from Cas9 (to date, the only other experimentally characterized class 2 effector) and might provide important advantages for genome-editing applications. Specifically, Cpf1 contains a single identified nuclease domain, in contrast to the two nuclease domains present in Cas9. The results presented here show that, in FnCpf1, inactivation of the RuvC-like domain abolishes cleavage of both DNA strands. Conceivably, FnCpf1 forms a homodimer (Figure S2B), with the RuvC-like domain of each of the two subunits cleaving one DNA strand. However, we cannot rule out that FnCpf1 contains a second, yet-to-be-identified nuclease domain. Structural characterization of Cpf1-RNA-DNA complexes will allow testing of these hypotheses and elucidation of the cleavage mechanism.

Perhaps the most notable feature of Cpf1 is that it is a single crRNA-guided endonuclease. Unlike Cas9, which requires tracrRNA to process crRNA arrays and both crRNA and tracrRNA to mediate interference (Deltcheva et al., 2011), Cpf1 processes crRNA arrays independently of tracrRNA, and Cpf1-crRNA complexes alone cleave target DNA molecules, without the requirement for any additional RNA species. This feature could simplify the design and delivery of genome-editing tools. For example, the shorter (∼42 nt) crRNA employed by Cpf1 has practical advantages over the long (∼100 nt) guide RNA in Cas9-based systems, because shorter RNA oligos are significantly easier and cheaper to synthesize. In addition, these findings raise more fundamental questions regarding the guide-processing mechanism of type V CRISPR-Cas systems. In type II systems, processing of the pre-crRNA is catalyzed by the bacterial RNase III, which recognizes the long duplex formed by the tracrRNA and the complementary portion of the direct repeat (Deltcheva et al., 2011). Such long duplexes are not present in the pre-crRNA of type V systems, making it unlikely that RNase III is responsible for processing. Further experiments aimed at elucidating the processing mechanism of type V systems will shed light on the functional diversity of different CRISPR-Cas systems.

Cpf1 generates a staggered cut with a 5′ overhang, in contrast to the blunt ends generated by Cas9 (Garneau et al., 2010; Jinek et al., 2012; Gasiunas et al., 2012). This structure of the cleavage product could be particularly advantageous for facilitating non-homologous end joining (NHEJ)-based gene insertion into the mammalian genome (Maresca et al., 2013). Being able to program the exact sequence of a sticky end would allow researchers to design the DNA insert so that it integrates into the genome in the proper orientation. Specifically, in non-dividing cells, in which genome editing via homology-directed repair (HDR) mechanisms is especially challenging (Chan et al., 2011), Cpf1 could provide an effective way to precisely introduce DNA into the genome via non-HDR mechanisms.
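The orientation argument can be illustrated with a sequence-only model: two 5′ overhangs, each written 5′→3′, can anneal when one is the reverse complement of the other, so a donor whose ends match a non-palindromic genomic overhang can ligate in only one orientation. This is a simplified, hypothetical sketch (the function names are illustrative) that ignores ligase fidelity, mismatch tolerance, and competing repair outcomes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tails_anneal(tail_a: str, tail_b: str) -> bool:
    """Two 5' overhangs (each written 5'->3') pair if one is the reverse complement of the other."""
    return len(tail_a) == len(tail_b) and tail_a.upper() == revcomp(tail_b)


def ligation_orientations(genome_left: str, genome_right: str,
                          donor_left: str, donor_right: str) -> list:
    """Orientations in which a donor's 5' tails match both genomic tails."""
    orientations = []
    if tails_anneal(genome_left, donor_left) and tails_anneal(genome_right, donor_right):
        orientations.append("forward")
    # Flipping the donor swaps its ends; each tail keeps its own 5'->3' sequence
    if tails_anneal(genome_left, donor_right) and tails_anneal(genome_right, donor_left):
        orientations.append("reverse")
    return orientations


# A single staggered cut leaves complementary tails on the two genomic ends
right_tail = "GGCCA"
left_tail = revcomp(right_tail)   # "TGGCC"
print(ligation_orientations(left_tail, right_tail, "GGCCA", "TGGCC"))  # ['forward']
print(ligation_orientations("GATC", "GATC", "GATC", "GATC"))           # ['forward', 'reverse']
```

The second call shows the failure case: a palindromic overhang such as GATC pairs with itself, so orientation is no longer constrained.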

Another potentially useful feature of Cpf1 that might aid the introduction of new DNA sequences is that Cpf1 cleaves target DNA at the distal end of the protospacer, far away from the seed region. Cpf1-induced indels are therefore expected to lie far from the PAM and seed region, which would remain intact for subsequent rounds of Cpf1 cleavage. With Cas9, any indel resulting from the dominant NHEJ repair pathway is likely to disrupt the target site, effectively eliminating the possibility of inserting new DNA at that site in that particular cell. With Cpf1, it appears possible that, if the first round of targeting results in an indel, a subsequent round of targeting could still be repaired via HDR. Future exploration of these and other strategies using Cpf1 and other class 2 effectors is expected to bring solutions to some of the biggest challenges facing genome editing.

The T-rich PAMs of the Cpf1 family also allow for genome-editing applications in organisms with particularly AT-rich genomes, such as Plasmodium falciparum (Gardner et al., 2002), or in AT-rich regions of interest, such as scaffold/matrix attachment regions. To date, all characterized mammalian genome-editing proteins require the presence of at least one G in their PAM (Hsu et al., 2014; Jiang et al., 2015), so the T- and T/C-dependent PAMs of Cpf1-family proteins expand the targeting range of RNA-guided genome-editing nucleases.
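A quick way to see the targeting-range argument is to count candidate PAMs on both strands of a sequence. The sketch below compares the 5′-TTN motif reported for FnCpf1 with the NGG motif of SpCas9; the function name and the example fragment are hypothetical, and PAM counts are only a rough proxy for usable target sites.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_pam_sites(seq: str, pam_regex: str) -> int:
    """Count (possibly overlapping) PAM matches on both strands of `seq`."""
    seq = seq.upper()
    pattern = re.compile(f"(?={pam_regex})")   # zero-width lookahead keeps overlaps
    return sum(len(pattern.findall(strand)) for strand in (seq, revcomp(seq)))


TTN = "TT[ACGT]"     # FnCpf1 PAM, 5' of the protospacer
NGG = "[ACGT]GG"     # SpCas9 PAM, 3' of the protospacer

at_rich = "ATTTATAATTAAATATTTTAAATTA"    # made-up AT-rich fragment
print(count_pam_sites(at_rich, TTN), count_pam_sites(at_rich, NGG))
```

In a fragment lacking G and C entirely, NGG cannot occur on either strand, whereas TTN sites remain plentiful.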

The natural diversity of CRISPR systems provides a wealth of opportunities for understanding the origin and evolution of prokaryotic adaptive immunity, as well as for harnessing potentially transformative biotechnological tools. There is little doubt that, beyond the already classified and characterized diversity of the CRISPR-Cas types, there are additional systems with distinctive characteristics that await exploration and could further enhance genome editing and other areas of biotechnology as well as shed further light on the evolution of these defense systems.