Identification of conserved target sites in the HBV genome

Genetic diversity is a hallmark of hepatitis B virus (HBV) strains, which exist as eight distinct genotypes (A–H) distributed around the globe6,7,35. To target the majority of HBV isolates with Cas9n, we first identified conserved sequences in the viral genome. A publicly available database of HBV isolate sequences was used to analyze sequence conservation36. First, using a total of 1,931 genotype A sequences, the HBV genome was searched for regions of at least 50 nucleotides in which, according to the alignment dataset, every individual position has a conservation value above 99.9%.
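The search for conserved regions described above can be sketched as follows. This is a minimal illustration rather than the procedure used by the database: it assumes conservation at a position is the fraction of aligned sequences carrying the most common character, and the function names and toy alignment are hypothetical.

```python
from collections import Counter

def column_conservation(aligned_seqs):
    """Fraction of sequences carrying the most common character at each column.

    Assumption: all sequences are aligned to equal length; gaps count as characters.
    """
    n = len(aligned_seqs)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*aligned_seqs)]

def conserved_regions(conservation, threshold=0.999, min_len=50):
    """Return half-open (start, end) runs where every position exceeds threshold."""
    regions, start = [], None
    # A trailing 0.0 sentinel closes a run that reaches the end of the alignment.
    for i, value in enumerate(list(conservation) + [0.0]):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions

# Toy alignment (hypothetical data) with relaxed parameters for illustration.
toy = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]
cons = column_conservation(toy)
print(conserved_regions(cons, threshold=0.9, min_len=3))
```

Under this definition, a threshold above 99.9% applied to 1,931 sequences tolerates at most one deviating sequence per position, since 1,930/1,931 is about 99.95% whereas 1,929/1,931 is about 99.90% and falls just below the cut-off.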

Two conserved regions were identified within the open reading frames (ORFs) S and X (each overlapping with ORF P) of the HBV genome (indicated by scissor symbols in Fig. 1A). In each of these conserved regions, a single pair of highly conserved single guide RNA (sgRNA) Cas9n target sequences plus proto-spacer adjacent motifs (PAM) was identified (referred to as S1 and S2, or X1 and X2) that fulfilled all filtering criteria, i.e. displayed >95% conservation over the full sequence of 23 nucleotides (nt) among the 1,931 genotype A isolates (for details see Methods section and Fig. 1B).

Figure 1 HBV-specific gRNA target sites for Cas9n recruitment. (A) Schematic representation of the hepatitis B virus genome. Relaxed circular DNA (rcDNA) of the HB virion (thin continuous line), which is converted to cccDNA (thin continuous and dotted line) following hepatocyte infection, is indicated in the centre of the map. The four viral transcripts encoding the core (C), polymerase (P), surface (S) and X proteins are indicated around the outside. Regions targeted by Cas9n via guide RNA (gRNA) specific to S and X sequences are indicated by arrows and scissors (scissors were drawn by Niklas Beschorner). (B) DNA sequence and sequence conservation of the regions targeted by Cas9n within the S and X genes of HBV. The sequence shown is based on the genotype A consensus. Target sequences in ORF S and X are depicted (S1 and S2, or X1 and X2), each encompassing proto-spacer adjacent motifs (PAM, bold and boxed), 2 × 20 nucleotides complementary to gRNA (boxed) and the offset distance between the two sequences complementary to gRNA (underlined).

The possibility of using these conserved regions to inactivate other genotypes was subsequently evaluated by calculating the conservation of the respective 23 nt target sequences among isolates belonging to genotypes B, C, D, E, F and G. The sgRNA target sequences S1, S2, X1 and X2 exhibit more than 93% conservation over the full 23 nt sequence among the analyzed isolates of genotypes B, C and E and more than 89% conservation among the genotype D sequences, indicating high sequence conservation between HBV genotypes that differ in their geographic distribution (Supplementary Table 1).
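As an illustration of the per-genotype evaluation, the sketch below scores a target site by the fraction of isolates that match it exactly over its full length. Whether the original analysis used exact full-length matching is an assumption here, and the function name, sequences and coordinates are hypothetical toy values, not HBV data.

```python
def site_conservation(aligned_seqs, start, target):
    """Fraction of isolates identical to `target` over the whole site.

    Assumption: "conservation over the full sequence" means an exact match
    at every position of the site (23 nt for a 20 nt protospacer plus PAM).
    """
    end = start + len(target)
    hits = sum(1 for seq in aligned_seqs if seq[start:end].upper() == target.upper())
    return hits / len(aligned_seqs)

# Hypothetical toy isolates with a 6 nt stand-in for a 23 nt target site.
isolates = ["AAACCGGTT", "AAACCGGTT", "AAACTGGTT", "AAACCGGTT"]
fraction = site_conservation(isolates, start=3, target="CCGGTT")
print(f"{fraction:.0%} of isolates carry the intact site")
```

Applied per genotype, a function of this kind would yield one value per target site and genotype, which could then be compared against a chosen cut-off.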

Functional validation of potential target sites located in the open reading frame S and X

For functional analyses, the selected 20 bp gRNA sequences were individually cloned into plasmids expressing human codon-optimized Cas9n (D10A), with a pol III (U6) promoter driving sgRNA expression as a fusion of the 20 nt gRNA and trans-activating CRISPR RNA (tracrRNA)32,33.

After identifying potential gRNA target sequences, pRG-HBV double fluorescent reporter constructs were generated to detect Cas9n activity. The vector design was based on comparable constructs previously reported by Kim and coworkers37. The pRG-HBV reporter plasmid constitutively expresses RFP and also contains the respective HBV S or X gene target sequence (73 bp or 57 bp in length; Fig. 1B) as well as the gene encoding eGFP, which is positioned out of frame relative to the RFP sequence (depicted in Fig. 2A). As indicated, the S- and X-specific sequences contain two 20 bp regions with adjacent PAMs necessary for gRNA binding and subsequent Cas9n-mediated double nicking. Thus, recruitment of Cas9n activity to the respective HBV-specific target sequence by two properly spaced sgRNAs results in a double-strand break (indicated by an arrow in Fig. 2A), which, upon NHEJ repair, leads to indel formation. Because codons are read as triplets, statistically 1 out of 3 repairs then results in an “in frame” fusion of the eGFP gene to the mRFP gene located upstream of the cleavage site. Therefore, the presence of the reporter is detected by an RFP signal alone, while nuclease activity is visualized by simultaneous GFP and RFP fluorescence.
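The "1 out of 3" expectation can be made explicit with a small sketch. It assumes that eGFP starts out shifted by one nucleotide relative to RFP (the actual shift in pRG-HBV is not stated here, so `initial_offset` is a hypothetical parameter) and that all net length changes are equally likely, which real NHEJ outcomes need not satisfy.

```python
def gfp_in_frame(initial_offset, net_length_change):
    """True if eGFP is read in frame with RFP after an indel.

    initial_offset: nucleotides (1 or 2) by which eGFP is out of frame
        before editing (assumed value, for illustration only).
    net_length_change: inserted minus deleted nucleotides at the target site.
    """
    return (initial_offset + net_length_change) % 3 == 0

# Enumerate 60 consecutive net length changes as a uniform toy distribution.
changes = range(-30, 30)
fraction = sum(gfp_in_frame(1, c) for c in changes) / len(changes)
print(f"fraction of repairs restoring the frame: {fraction:.3f}")
```

Under these assumptions exactly one third of the enumerated indels places eGFP in frame, so the GFP+ fraction is expected to underestimate the total editing rate.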

Figure 2 Target site validation by transient transfection of HeLa cells. (A) The HBV reporter plasmid for detecting Cas9n nuclease activity on HBV S or X target sequences is depicted at the top. The reporter construct contains a constitutive CMV promoter and sequences encoding RFP (red fluorescent protein) and eGFP (enhanced green fluorescent protein), the latter lacking a start codon and positioned out of frame. RFP and GFP sequences are separated by the HBV S or X target site. Each target sequence contains two 20 bp regions necessary for gRNA binding and PAM motifs (expanded region). Cas9n nuclease activity aided by the pair of sgRNAs leads to two individual single-strand breaks within the target sequence (indicated by the large arrow; Cas9n cleaves the target sequence 3 nt upstream of the PAM region) which, upon non-homologous end joining repair, can lead to subtle sequence deletions and thereby frame shifts of the downstream GFP-specific sequence. (B) Cas9n activity on HBV sequences in HeLa cells at 48 h post transfection. Cultures were transfected with HBV S- or X-specific reporter and two Cas9n/sgRNA expression plasmids; HBV reporter plasmids alone (control); or with vectors expressing RFP, GFP and BFP (transfection control). Arrows indicate GFP expressing cells due to Cas9n activity. Scale bar = 400 μm. (C) Percentage of the total GFP+ population in cultures containing the indicated vectors. Transfection control measured the presence of the positive control plasmid (GFP+) by flow cytometry.

A set of three plasmids comprising the pRG-HBV-S or pRG-HBV-X reporter construct together with two appropriate Cas9n/sgRNA expression plasmids was used to monitor Cas9n activity on the respective HBV sequences in human cells. In addition, transfection with the HBV reporter plasmids alone (i.e. omitting Cas9n/sgRNA expression constructs) or cotransfection with individual plasmids for RFP, GFP or BFP expression served as controls. Cas9n activity was detected in HeLa cells at 48 hours post transfection (Fig. 2B, green panels). As expected, cell cultures containing just the HBV S or X reporter construct did not display any GFP signals but were positive for RFP expression. In contrast, GFP signals were detected in cells cotransfected with vectors expressing Cas9n and sgRNAs along with the matching HBV reporter (arrows in Fig. 2B), indicating that in the respective target sequence Cas9n/sgRNA-mediated cleavage followed by NHEJ repair had occurred, positioning the sequences encoding GFP and RFP in frame with each other.

Another transfection of HeLa cells was used to quantify Cas9n activity via FACS analysis. Transfected cultures containing the pRG-HBV-S or pRG-HBV-X reporter construct alone displayed no or only very few GFP-positive cells, i.e. 0% and 1.5% GFP+ cells (Fig. 2C), which is consistent with the fluorescence microscopy data (Fig. 2B). However, when vectors for Cas9n and sgRNAs were cotransfected along with the reporter plasmids, clearly elevated levels of fluorescence were detected at 48 hours post transfection, with 8% GFP+ cells in the culture where HBV S sequences were the target and 7.8% GFP+ cells in the culture where X sequences were recognized (Fig. 2C).

To analyze Cas9n activity on HBV sequences in another cell line, we next transfected HEK293 cells and quantified Cas9n activity as above. As a specificity control, we now included transfections with mismatched reporter and Cas9n/sgRNA expression plasmids. Similar to the previous results, at 24 hours post transfection GFP was induced only in the cell cultures cotransfected with HBV reporters and the vectors expressing Cas9n and their matching sgRNAs (Fig. 3A; arrows). The control cultures, which constitutively expressed GFP, contained reporter vector only, or contained mismatched pairs of reporter and sgRNAs, revealed the expected phenotypes (Fig. 3A). Furthermore, FACS analyses demonstrated that the number of RFP+ (i.e. transfected) cells was comparable in each HEK293 culture (Fig. 3B), while the number of RFP+/GFP+ cells substantially increased in the cultures transfected with matching pairs of reporter and Cas9n/sgRNA expression plasmids (Fig. 3C,D).

Figure 3 Analysis of Cas9n activity in HEK293 cells. (A) HEK293 cells were transfected with the respective HBV reporter (pRG-HBV-S or pRG-HBV-X) and two plasmids for Cas9n/sgRNA expression (Cas9n-sgRNA-S1 and Cas9n-sgRNA-S2; Cas9n-sgRNA-X1 and Cas9n-sgRNA-X2). For transfection control, cells were transfected with a constitutively GFP-expressing plasmid or mock-transfected. Cells were imaged at 24 h post transfection. Arrows indicate GFP expressing cells due to Cas9n activity. Scale bar = 400 μm. (B) Cells were analyzed by FACS for RFP fluorescence to determine the presence of the HBV reporter in the transfected cells. (C,D) Target-specific Cas9n/gRNA nuclease activity was quantified by FACS analysis of GFP+ cells within the populations of RFP+ (reporter containing) cells.

To more directly detect Cas9n-mediated mutagenesis of HBV-derived target sequences, we performed an assay with T7 endonuclease I (T7EI), which detects Cas9-induced mutations by cutting the DNA at mismatched nucleotides37,38. We transfected HEK293 cells as before and obtained total (chromosomal and episomal) cellular DNA from sorted GFP+ cells, which served as template for PCR-mediated enrichment of the HBV S or X target sequences (Fig. 4A). Following the T7EI assay, smaller sized products of approximately 380 and 190 bp, indicating the presence of mismatched DNA in the target sequence, were only present in cell samples transfected with the correct combination of reporter and Cas9n/sgRNA expression plasmids (Fig. 4B,C).

Figure 4 Analysis of Cas9n activity by T7 endonuclease I assay. (A) T7EI assays were performed using PCR primers (indicated by arrows) flanking the HBV S or X sequence in the respective reporter plasmid. (B) Detection of Cas9n-specific activity was visualized by gel electrophoresis. HEK293 cells were transfected as before and total genomic DNA was isolated at 72 h post transfection for subsequent T7EI cleavage. Arrows depict the sizes of wild-type and Cas9n-mutagenized DNA fragments. (C) T7EI assay using genomic DNA from GFP+ HEK293 cell cultures at 24 h post transfection. (D) Sequence analysis of corresponding DNA samples. Alignment to the wild-type ORF S reporter sequence is shown. gRNA sequences (boxed), PAM (boxed and bold) and Cas9n-mediated deletions are indicated.

To verify indel formation in the HBV reporter we performed a straightforward sequence analysis. HEK293 cells were cotransfected as before with pRG-HBV-S reporter and matching Cas9n/sgRNA expression plasmids. Total DNA was isolated from RFP+/GFP+ cells and transformed into E. coli, and individual HBV sgRNA target sites were analyzed in selected recovered reporter plasmids. As an example, of the four sequences shown in Fig. 4D, three represent Cas9n/sgRNA-induced deletions (individual clones #1, #2 and #4; Fig. 4D). Of note, Cas9n cleaves the target sequence 3 nt upstream of the PAM region (Fig. 4D; PAM indicated in bold letters). Since the HBV S-specific sgRNAs were designed to anneal to opposite strands with an offset of 27 nt, subsequent cleavage creates single-stranded 5′ overhangs, which are prone to deletion by NHEJ. Accordingly, the observed deletions correspond almost perfectly to the 5′ overhang sequences (Fig. 4D).

Taken together, these data suggest that sequences in the HBV S and X gene can be recognized in human cells by sgRNAs and serve as substrates for subsequent Cas9n-mediated indel formation.

Cas9n activity on integrated HBV reporter constructs

The persistence of episomal nuclear cccDNA is considered a main obstacle to curative HBV therapies. However, random integration of HBV DNA into the host cell genome is common and may also contribute to disease outcome8. Therefore, the accessibility of stably integrated HBV S and X target sequences for Cas9n was tested.

Stable HeLa and HEK293 cell lines containing integrated HBV-X or HBV-S reporter sequences (see Supplementary Figure 1) were generated using PiggyBac targeting vectors (System Biosciences Inc.). These cell lines were transfected with matching or mismatching (control) Cas9n/sgRNA expression vectors. Additional control experiments were performed by transfecting negative control sgRNA vectors (targeting an unrelated genomic locus), or a constitutively GFP-expressing plasmid (positive control). At 72 hours post transfection, cultures were analyzed by fluorescence microscopy and FACS as above. In both the HEK293 cell cultures (Fig. 5) and HeLa cell cultures (Fig. 6), the number of GFP+ cells only increased in the cultures transfected with plasmids expressing sgRNAs matching the integrated HBV reporter sequences (Figs 5A–D and 6A–D), demonstrating that Cas9n/sgRNAs accurately cleave their targets in a genomic context.

Figure 5 Targeting chromosomally integrated HBV-sequences using the CRISPR/Cas9n system in HEK293 cell cultures. (A) Fluorescence microscopy images of the stable HBV S-specific reporter cell line cotransfected with plasmids expressing Cas9n and S or X (control) sequence-specific sgRNAs, or sgRNA targeted to an unrelated locus (negative control; Cas9n-sgRNA-neg). A constitutively GFP-expressing vector served as a transfection control. GFP expressing cells indicate Cas9n activity. Scale bar = 400 μm. (B) Analysis of a stable HBV X-specific reporter cell line as above. (C) HBV S sequence-specific Cas9n/sgRNA-mediated nuclease activity was quantified by GFP-specific FACS analysis. (D) As in C to quantify HBV X sequence-specific Cas9n/sgRNA-mediated nuclease activity.

Figure 6 Analysis of integrated HBV reporter constructs in HeLa cells. (A) Targeting of chromosomally integrated HBV ORF S sequences by the CRISPR/Cas9n system was analyzed at 72 h post transfection, as described in Fig. 5A. GFP expressing cells indicate Cas9n activity. Scale bar = 400 μm. (B) Analysis of chromosomally integrated HBV ORF X sequences as described in Fig. 5B. (C,D) Quantification of the experiments shown in panel A and B, respectively, by FACS analysis of GFP+ cells. (E) The stable HBV S and X target site-specific HeLa cultures, transfected with the indicated combinations of Cas9n and sgRNA-expressing vectors, analyzed by the T7EI cleavage assay. Targeting of an arbitrary genomic locus by sgRNA specific to this region was used as a positive control (ctrl).

Stable HeLa cells containing either HBV-S or HBV-X reporter sequences were next analyzed by the T7EI assay (Fig. 6E). An arbitrary genomic locus previously tested for efficient Cas9n targeting, together with its matching sgRNA, was used as a positive control (ctrl). The smaller bands, representing T7 endonuclease-cleaved DNA strands where a mismatch was introduced by the Cas9n/sgRNA system, were only present in cell samples transfected with a pair of Cas9n/sgRNA expression plasmids matching the integrated HBV reporter sequence.

These data demonstrate that the Cas9n/sgRNA system can successfully target HBV-derived sequences that are stably integrated into the host cell genome.

HBV inactivation in chronically and de novo infected hepatoma cell lines

At this point, the combined data obtained by using reporter constructs indicated that episomal or integrated sequences encoding the HBs or HBx antigen can be substrates for Cas9n-mediated inactivation. However, future application in curative HBV therapies requires targeting the HBV genome in chronically infected hepatocytes. We therefore investigated the established hepatocyte cell lines HepG2.2.15 and HepG2-H1.3. Both of these cell lines carry HBV genomes, including chromosomally integrated sequences and cccDNA, and, importantly, release virus particles into the culture supernatant39,40.

To transduce the respective cell cultures with expression cassettes for Cas9n and the two target site-specific sgRNAs, we constructed a self-inactivating (SIN) lentiviral vector (LV). Gene sequences encoding eGFP and Cas9n were placed under the control of the human elongation factor-1 α (EF1α) promoter and separated by the equine rhinitis A virus (ERAV) 2A-like sequence41, which allows the enrichment of transduced cells (i.e. GFP+ cells) by FACS. In the opposite transcriptional direction, the pair of sgRNAs required to recruit Cas9n to a specific HBV target site are expressed independently by a U6 and an H1 pol III promoter (Fig. 7A).

Figure 7 Inactivation of HBV in chronically and de novo infected hepatocytes. (A) The backbone of the HIV-derived lentiviral vector (LV) for delivering Cas9n and a pair of sgRNAs contains self-inactivating (SIN) long terminal repeats (LTR:ΔU3, R, U5), a Rev response element (RRE), a central polypurine tract (cPPT), a woodchuck hepatitis virus post-transcriptional regulatory element (PRE), SV40 polyadenylation enhancer elements (USE), splice donor (SD), splice acceptor (SA) and packaging signal (Ψ) sites. Expression of an eGFP-2A peptide-Cas9n fusion protein is regulated by the internal human elongation factor 1α (EF1α) promoter. Transcription of the two sgRNAs, sgRNA1 and sgRNA2, is regulated by a U6 and an H1 pol III promoter. (B,C) HBsAg in the filtered supernatants of HepG2.2.15 and HepG2-H1.3 cells was quantified by ELISA at day 5 and/or at day 10 post transduction with LV-HBS, LV-HBX or LV-GFP. Experiments were performed in duplicate. The lower limit of detection (background + 3 × S.D.) was 0.9 ng/ml for HBsAg. All values are given as mean concentration in ng/ml ± S.D. (D) HBsAg in the filtered supernatant of LV-HBS, LV-HBX or mock transduced HepG2hNTCP cells was quantified as described before at day 8 post HBV infection. Experiments were performed in quadruplicate.
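The detection limit quoted in the legend (background + 3 × S.D.) corresponds to a simple calculation, sketched here with hypothetical blank readings. Whether the sample or population standard deviation was used is not stated; the sample standard deviation is assumed, and the toy values are not intended to reproduce the reported 0.9 ng/ml.

```python
from statistics import mean, stdev

def limit_of_detection(blank_readings, k=3):
    """Lower limit of detection as mean background plus k standard deviations.

    Assumption: sample standard deviation (n - 1 denominator) of blank wells.
    """
    return mean(blank_readings) + k * stdev(blank_readings)

# Hypothetical blank-well HBsAg readings in ng/ml, for illustration only.
blanks = [0.4, 0.6, 0.5]
print(f"LOD = {limit_of_detection(blanks):.2f} ng/ml")
```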

HepG2.2.15 cells were transduced with VSV-G pseudotyped Cas9n/sgRNA-expressing LV particles, or a corresponding negative control vector expressing GFP alone (LV-GFP). Transduced hepatocytes were enriched by FACS and release of HBV particles was monitored over time by HBsAg ELISA. While accumulation of HBV in the culture supernatants increased with time in untreated and LV-GFP-transduced cultures, no HBV particle release was detected by HBsAg ELISA from HepG2.2.15 cells transduced with LV expressing Cas9n together with sgRNA specific for either ORF S (HBS) or ORF X (HBX) (Fig. 7B). Similarly, transduction of HepG2-H1.3 cells with LV-HBS or LV-HBX resulted in significant inhibition of HBV progeny formation at day 5 post transduction (Fig. 7C).

Finally, we investigated Cas9n-mediated HBV inactivation in de novo infected hepatocytes. HepG2hNTCP cells42,43 were transduced as before with Cas9n/sgRNA-expressing LV particles and infected with HBV, purified from cell culture supernatant of HepG2.2.15 cells39. Again, HBsAg release was clearly impaired at day 8 post infection in cells expressing Cas9n together with ORF S- or X-specific sgRNAs (Fig. 7D), indicating that the viral cccDNA was targeted.

The combined data from these three models of HBV infection suggest that vector-mediated delivery of the Cas9n system, targeted to virus-specific genomic sites, inactivated persistent virus. To directly confirm this observation, we subjected total cellular DNA from the HepG2.2.15 and HepG2-H1.3 cultures to target-site-specific next generation sequencing. After generating amplicons containing the target regions of S- or X-specific sgRNAs from HepG2.2.15 or HepG2-H1.3 cells, we analyzed approximately 350,000 amplicons from each sample using an Illumina MiSeq instrument. Amplicons from mock-transduced control cultures were sequenced in parallel to establish the wild-type sequences of resident HBV genomes (see Methods section for details). The results confirmed that efficient editing of HBV genomes had occurred in Cas9n/sgRNA expressing cells. On average, between 44% and 89% of all reads derived from sgRNA-S- or sgRNA-X-expressing cells, respectively, exhibited indel signatures (Table 1). Consistent with the expected pattern of Cas9n/sgRNA-induced mutations, the spatial distribution of nucleotides affected by the various indels showed a marked peak at sites complementary to S- and X-specific sgRNAs (Fig. 8A,B). Also as expected, no deletions or insertions were detected in amplicons from the mock-transduced controls (dashed gray lines in Fig. 8B).
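A per-position indel frequency of the kind plotted in Fig. 8B could be computed along the following lines. This is a hedged sketch, not the analysis pipeline described in the Methods: it assumes each read has already been aligned and reduced to a list of half-open genome intervals affected by indels, and the function name, coordinates and reads are hypothetical.

```python
def indel_frequency(read_indels, amplicon_start, amplicon_end):
    """Fraction of reads in which each amplicon position is affected by an indel.

    read_indels: one list per read of half-open (start, end) intervals in
        genome coordinates; an empty list denotes a wild-type read.
    """
    counts = [0] * (amplicon_end - amplicon_start)
    for intervals in read_indels:
        affected = set()  # count each position at most once per read
        for start, end in intervals:
            affected.update(range(max(start, amplicon_start), min(end, amplicon_end)))
        for pos in affected:
            counts[pos - amplicon_start] += 1
    total = len(read_indels)
    return [c / total for c in counts]

# Four hypothetical reads over a toy amplicon spanning positions 100 to 107.
reads = [[(102, 105)], [], [(104, 106)], []]
print(indel_frequency(reads, 100, 108))
edited = sum(1 for r in reads if r) / len(reads)
print(f"reads with indel signatures: {edited:.0%}")
```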

Table 1 Indel statistics.

Figure 8 Analysis of indels in sgRNA-treated HepG2-H1.3 and HepG2.2.15 cells. (A) Scheme of the HBV genome depicting the location of amplicons spanning the S- and X-specific sgRNA target regions. The two amplicons are shown as dark gray boxes labelled S-ampl. and X-ampl. Arrows shown at the top indicate the genomic location of gRNA target sequences. (B) Indel frequency as detected in S-amplicons (left) or X-amplicons (right) from HepG2-H1.3 (top panels) or HepG2.2.15 (bottom panels) cells. The graphs represent the frequency with which each individual amplicon’s nucleotide is affected by indels in sgRNA-expressing (solid black line) or mock transduced cells (gray dashed line). Nucleotide positions given on the x-axis indicate coordinates on the full-length HBV genome (accession JN664938). The locations of gRNA target sequences are indicated by arrows underneath the bottom panels. (C) Example alignments of indel amplicon reads from Cas9n/sgRNA-X-expressing HepG2-H1.3 (top sequence in each alignment) aligned to wild-type HBV amplicon sequences (bottom sequence in each alignment). Indel sites are indicated above each read. The sgRNA target sequences and PAM motifs appear in bold or underlined, respectively.

Although indels were generally located at or near gRNA target sites, individual indel events varied with regard to the lengths of deletions, as well as the lengths and nucleotide sequences of insertions. The average length of deleted sequences was between 37 and 51 nucleotides in sgRNA-S treated cells and between 30 and 35 nucleotides in sgRNA-X treated cells, while the average length of insertions was approximately 2 to 3 nucleotides in all cases (Table 1). The identified deletions were similar to the nickase-mediated indel lengths observed using the reporter plasmid approach (Fig. 4D). Using the HBV reporter plasmid, Cas9n induced “offset” deletions by cutting 3 nt upstream of the PAM. The predicted “offset” deletions for S (61 nt) and X (45 nt) were observed to some degree in our next generation sequencing samples. However, substantial fractions of reads in each of the samples also showed deletions or insertions with lengths significantly longer or shorter than these average values (for a more detailed representation of length distributions see Supplementary Figures 2 and 3).
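The predicted “offset” deletion lengths follow from simple arithmetic on the target-site layout, sketched below. The sketch assumes that each target consists of two 23 nt sites (20 nt protospacer plus 3 nt PAM) flanking the offset, with the PAMs at the outer ends, and that each nick lies 3 nt upstream of its PAM; the X offset of 11 nt is derived here from the 57 bp target length rather than quoted from the text.

```python
PROTOSPACER = 20          # nt complementary to each gRNA
PAM = 3                   # nt per proto-spacer adjacent motif
CUT_UPSTREAM_OF_PAM = 3   # Cas9n nicks 3 nt upstream of the PAM

def offset_from_target_length(target_len):
    """Offset between the two protospacers, assuming target = 2 x (20 + 3) + offset."""
    return target_len - 2 * (PROTOSPACER + PAM)

def nick_distance(offset):
    """Distance between the two nicks, assuming PAMs face outward.

    Each nick sits 17 nt into its protospacer when counted from the offset side.
    """
    return offset + 2 * (PROTOSPACER - CUT_UPSTREAM_OF_PAM)

for name, target_len in (("S", 73), ("X", 57)):
    offset = offset_from_target_length(target_len)
    print(name, "offset:", offset, "nt; predicted deletion:", nick_distance(offset), "nt")
```

With target lengths of 73 bp and 57 bp, this gives offsets of 27 nt and 11 nt and nick-to-nick distances of 61 nt and 45 nt, matching the values stated for S and X.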

Several examples of indel read alignments from Cas9n/sgRNA-X transduced HepG2-H1.3 cells are shown in Fig. 8C. The first and second alignments represent contiguous deletions of 46 and 32 nucleotides, respectively, with no additional insertions being present. The third alignment shows a 21 nt deletion together with an insertion of 13 nucleotides, with the latter likely being the result of non-templated addition of nucleotides during NHEJ repair events. Finally, the fourth alignment shows an indel where the number of inserted nucleotides (56 nt) exceeds the length of the primary deletion (48 nt), resulting in a net gain of 8 nucleotides.
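The net length changes of the four example reads can be tallied directly from the deletion and insertion lengths given above. The frame check is an added illustration that considers only length, not the sequence content of the insertions.

```python
# (deleted nt, inserted nt) for the four alignments described for Fig. 8C.
events = [(46, 0), (32, 0), (21, 13), (48, 56)]

for number, (deleted, inserted) in enumerate(events, start=1):
    net = inserted - deleted
    # A net change that is not a multiple of 3 shifts the downstream reading frame.
    frameshift = net % 3 != 0
    print(f"alignment {number}: net change {net:+d} nt, frameshift: {frameshift}")

net_changes = [inserted - deleted for deleted, inserted in events]
```

None of the four net changes (-46, -32, -8 and +8 nt) is a multiple of three, so by length alone each of these events would shift the reading frame downstream of the target site.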