AAV vector sequences are detected at CRISPR-induced DSBs

In analyzing next-generation sequencing (NGS) data from an in vivo CRISPR gene therapy approach targeting the Tmc1 Beethoven mutation in the inner ear, we observed AAV inverted terminal repeat (ITR) sequences within CRISPR indels [22]. To confirm this observation and extend these findings to other genes, we first analyzed AAV vector integration in cultured cortical neurons derived from wild type (WT) C57BL/6 mice. Cells were treated with AAV1 carrying S. pyogenes Cas9 (SpCas9) and separate AAV1 vectors carrying gRNAs against the WT coding sequence of Mecp2, Dnmt3b or Tmc1, at 10^5 or 10^6 vector genomes (vg) per cell. Cells were incubated with AAV vectors for 1 day and then kept in culture for another 20 days before harvesting genomic DNA to assess AAV integration (Fig. 1a). Genomic DNA was amplified by a high-fidelity polymerase using a very long extension time (see Methods) to allow for inclusion of potentially large integration events. Deep sequencing of PCR products from the region flanking the cut site revealed characteristic indel formation in all targeted genes (Fig. 1a and Supplementary Data 1), with the majority of indels being single-nucleotide changes. Analysis of the percentage of reads that aligned to AAV sequences revealed that, for all target genes, fusion reads between the AAV vector genome and the host genome were detectable at variable frequencies (0.06–12.5%; Fig. 1a). To quantify AAV integration efficiency as a fraction of the total nuclease-induced events, reads that contained AAV sequences were normalized against all reads that harbored insertions or deletions compared with the reference sequence (including AAV integration). This measure is termed the AAV capture ratio. We found that the AAV capture ratio varied between 13.8% and 36.5% among different targets (Fig. 1a) and was not significantly different between the two vg/cell conditions (paired t-test, p = 0.224).
As expected, the higher dose of AAV-Cas9 and AAV-gRNA led to a higher percentage of reads with indels. For example, with the Tmc1-specific gRNA, 18.0% of reads contained indels at 10^5 gc/cell, rising to 49.8% at 10^6 gc/cell. We also reanalyzed genomic DNA from our previous study [8] using mouse primary cortical neurons, which overexpress a mutated form of the human APP gene (amyloid precursor protein with the Swedish mutation, APPSW from the Tg2576 strain). In that study [8], the experimental conditions were the same as in the present study. Similarly, we observed AAV integration into APPSW, with capture ratios of 20.6% and 27.1% for 10^5 and 10^6 vg/cell, respectively. These results suggest that a substantial number of gene editing outcomes in non-dividing cultured neurons are a result of AAV vector integration at the on-target site.
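The AAV capture ratio defined above can be illustrated with a minimal sketch. The function name and the read counts are hypothetical and chosen only for illustration; they are not data from this study. The sketch assumes that the indel read count already includes the AAV-containing reads, as stated in the definition.

```python
def aav_capture_ratio(aav_reads: int, indel_reads: int) -> float:
    """Fraction of edited reads that carry AAV sequence.

    indel_reads is assumed to already include AAV-containing reads,
    matching the normalization described in the text.
    """
    if indel_reads <= 0:
        raise ValueError("no edited reads to normalize against")
    if aav_reads > indel_reads:
        raise ValueError("AAV reads cannot exceed total edited reads")
    return aav_reads / indel_reads


# Hypothetical counts, for illustration only (not data from the study).
total_reads = 10_000
indel_reads = 2_000  # reads with any insertion or deletion, incl. AAV insertions
aav_reads = 500      # subset of indel_reads that align to the AAV vector

# Percentage of all reads that contain AAV sequence.
percent_aav_of_all_reads = 100 * aav_reads / total_reads
# AAV capture ratio, expressed as a percentage of edited reads.
capture_ratio_percent = 100 * aav_capture_ratio(aav_reads, indel_reads)
```

The two quantities differ only in their denominator: the first uses all sequenced reads, the second uses only reads that differ from the reference by an insertion or deletion.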

Fig. 1 AAV vectors integrate into CRISPR/Cas9 cut sites in vitro and in vivo. a Primary murine cortical neurons were transduced with an AAV1 vector encoding Cas9 as well as another AAV1 vector encoding a guide RNA (gRNA) against the genes indicated. Two different doses, 1e5 gc/cell (left panel) and 1e6 gc/cell (right panel), were tested. The negative control was neurons transduced by AAV-Cas9 vector without gRNA (the Tmc1 gene was amplified by PCR for 1e5 gc/cell and the APP gene was amplified for 1e6 gc/cell). The frequency of AAV sequences present at indels at the target site is shown in red vs the total number of indels in blue. AAV capture efficiencies are shown as percentages on the graphs. Two biological replicates were sequenced for each condition (3 for APPSW gene, 1e6 gc/cell dose). b AAV integration into Cas9 cut sites targeting therapeutic genes in the murine hippocampus, cochlea or muscle. For APPSW, non-injected cerebellum or cortex was used as control. For Tmc1, non-injected cochleas were used. Animal numbers and the number of sequencing reactions are as follows (numbers of animals pooled per reaction included in parentheses): Hippocampus, control: n = 3 (3 reactions), Mecp2: n = 5 (2 reactions, n = 3 and 2), Dnmt3b: n = 5 (2 reactions, n = 3 and 2), APPSW: n = 7 (7 reactions). Cochlea samples, non-injected: n = 21 (2 reactions, n = 9 and 12), injected: n = 33 (4 reactions, n = 6, 6, 9, and 12 animals). Muscle samples: n = 8 (2 reactions, n = 4 and 4). c CRISPResso analysis showing small indels at the cut site from hippocampus injected with AAV-Cas9 and AAV-gRNA against Dnmt3b. d Bimodal distribution of indel sizes, the larger indicating AAV sequence integration at the cut site, with specific examples shown in e (two sequencing reactions from 2 and 3 animals, respectively). f Characterization of AAV vector region present in indels with AAV-Mecp2-Cas9 (left panel) and AAV-U6-gRNA-syn-GFP (right panel) in brain samples (Dnmt3b was targeted).
g Distribution of AAV integration surrounding the CRISPR cut site in hippocampus in which Dnmt3b was targeted (two sequencing reactions from 2 and 3 animals, respectively). Bars represent mean ± SD. Source data are provided as a Source Data file.

Vector integration into therapeutically relevant genes in vivo

Next, we analyzed AAV vector integration into CRISPR-induced breaks in vivo in three different organs (brain, cochlea, and muscle). We first analyzed in vivo AAV integration in the brain by performing intrahippocampal injections of separate AAV1 vectors expressing Cas9 and gRNAs targeting either the Mecp2 or Dnmt3b genes (5 × 10^9 vg of AAV-Cas9 and 3 × 10^9 vg of AAV-gRNA). As above, we also reanalyzed genomic DNA isolated from hippocampus tissue from human APPSW transgenic mice (Tg2576) treated with AAVs encoding Cas9 and a gRNA targeting the APPSW gene [8]. We observed indel formation and AAV integration at the CRISPR on-target site for all three target genes in vivo in the hippocampus (Fig. 1b and Supplementary Data 2). AAV capture ratios (reads with AAV integration normalized to reads with indels) were found to be 39.3%, 10.8%, and 32.3% for Mecp2, Dnmt3b, and APPSW, respectively.

In the cochlea, we analyzed data from our previous study [22] and found AAV integration into Tmc1 (Fig. 1b). In this study, an allele-selective S. aureus Cas9 (SaCas9-KKH [23]) was used to target the Beethoven mutation [16], and AAV-mediated allele-selective disruption led to a complete halt of hair cell degeneration and hearing preservation up to 1 year post injection [22]. Integration was found in the Tmc1 gene in injected animals with a capture ratio of 26.6% (Fig. 1b).

We also analyzed genomic DNA from the study by Bengtsson et al. [6] to assess AAV integration in muscle tissue in vivo. In this study, mice received intravenous injections of AAVs carrying either SpCas9 or S. aureus Cas9 (SaCas9) and gRNAs targeting the Dmd gene. We observed AAV integration in all CRISPR target sites, including SaCas9 and SpCas9 target sites in introns 51 and 53 and in exon 53 (Fig. 1b). Deleting exons 52 and 53 is one of the therapeutic strategies applied by Bengtsson et al. [6]. We wondered whether we could detect AAV sequences when the large ~45 kb genomic DNA region containing exons 52 and 53 is deleted; thus we performed a PCR with primers in introns 51 and 53. Sequencing this PCR product revealed the anticipated 45 kb deletion, but also high levels of AAV integration between the cut sites (Fig. 1b and Supplementary Data 3) with up to 47.5% efficiency.

Altogether these results confirm that AAV integration is a common occurrence after Cas9-induced breaks in vivo in brain, cochlea and muscle.

Genome editing events after AAV-mediated CRISPR delivery

Next, we set out to more thoroughly characterize the molecular outcomes of genome editing and AAV vector integration in the treated hippocampus samples. Analysis of the size of indels occurring at the Dnmt3b on-target site revealed a bimodal distribution of relatively small deletions and longer insertions (Fig. 1c–e). While the majority of indels were relatively small (< 25 nt) in size (92.3 ± 1.4%), we also observed insertions longer than 25 nt (7.3 ± 1.6% of all indels; Fig. 1d, e). By aligning the long insertion reads to the AAV genome, we identified that these larger insertion events were AAV vector integrations (Fig. 1e). The vast majority (97.8%) of the junction sites between Dnmt3b and AAV were within 1 bp of the CRISPR cut site (3 bp upstream of the protospacer adjacent motif (PAM) site), suggesting that this AAV integration was indeed at the expected CRISPR-induced break site (Fig. 1e). Additionally, the majority (83.5%) of AAV-genomic junctions contained elements of the viral ITRs (Fig. 1e, f). However, it was not possible to determine the length and efficiency of AAV vector genome integration events due to the relatively short NGS read length (i.e., longer integrants containing ITR and other components of the AAV cassette may not be detected, or may be detected at lower frequencies; Fig. 1f). By examining the genomic location of small indels and AAV-genomic junction sites, we confirmed that the two types of events overlap, suggesting that AAV integrates precisely at the cut site, but not around it (Fig. 1g). Furthermore, these results show that the majority of AAV-genomic junctions contain sequences of the ITR elements in vivo.
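The two summary statistics used above (the share of long insertions among indels, and the share of AAV-genomic junctions within 1 bp of the cut site) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the signed-size encoding of indels, and all input values are hypothetical and do not reproduce the study's pipeline or data.

```python
LONG_INSERTION_NT = 25  # size threshold used in the text
JUNCTION_WINDOW_BP = 1  # "within 1 bp of the CRISPR cut site"


def fraction_long_insertions(indel_sizes, threshold=LONG_INSERTION_NT):
    """indel_sizes: signed sizes in nt (positive = insertion, negative = deletion)."""
    if not indel_sizes:
        raise ValueError("no indels supplied")
    long_ins = sum(1 for size in indel_sizes if size > threshold)
    return long_ins / len(indel_sizes)


def fraction_junctions_near_cut(junction_positions, cut_position,
                                window=JUNCTION_WINDOW_BP):
    """Share of junctions whose genomic coordinate lies within `window` bp of the cut."""
    if not junction_positions:
        raise ValueError("no junctions supplied")
    near = sum(1 for pos in junction_positions
               if abs(pos - cut_position) <= window)
    return near / len(junction_positions)


# Hypothetical inputs, for illustration only (not data from the study).
indel_sizes = [-1, -2, 1, -7, 120, 1, -1, 145, -3, 1]
junctions = [100, 100, 101, 99, 112]
cut_position = 100  # assumed coordinate of the cut, 3 bp upstream of the PAM

long_fraction = fraction_long_insertions(indel_sizes)
near_fraction = fraction_junctions_near_cut(junctions, cut_position)
```

In this toy input, two of ten indels exceed 25 nt and four of five junctions fall within the 1 bp window.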

Genome-wide AAV mapping from mouse brain

Next, we wondered whether the expression of Cas9 and/or gRNA would facilitate AAV integration into genomic sites outside of the target locus. To map AAV integrations within the genome in the brain, we injected mice into the hippocampus with AAV1 vectors encoding Cas9 and/or gRNA and performed deep sequencing of AAV-genomic junctions. To locate AAV integration sites, we used a modified version of the GUIDE-Seq [24] pipeline adapted for AAV ITRs with primers specific for the ‘a’ region (Supplementary Fig. 1). We injected mice with AAV1 vectors encoding Cas9 and gRNAs targeting Mecp2, Dnmt3b, and APPSW sequences and, 6 weeks after injection, collected the hippocampus and isolated genomic DNA.

Integration sites were mapped with Virus-Clip [25] and output was filtered to exclude false positive hits. We excluded sites that showed homology to the vector sequence or sites that only showed genomic sequences, but no viral ITR elements (for full details see Methods section). For a full list of all the sites, including excluded sites and reasons for exclusion, see Supplementary Data files 4–9. For all true integration sites showing genomic alignments see Supplementary Data 10–15 and Supplementary Data 16.
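The two exclusion criteria described above can be expressed as a simple filter. This is only a sketch: the `CandidateSite` record and its field names are hypothetical and do not correspond to the actual Virus-Clip output format, and how homology or ITR content is determined is left outside the example.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    # Hypothetical record; not the real Virus-Clip output schema.
    chrom: str
    position: int
    has_itr_sequence: bool  # read contains viral ITR elements
    vector_homology: bool   # genomic site shows homology to the vector sequence


def exclusion_reason(site: CandidateSite) -> str:
    """Return an empty string if the site is kept, else the reason for exclusion."""
    if site.vector_homology:
        return "homology to vector sequence"
    if not site.has_itr_sequence:
        return "genomic sequence only, no ITR element"
    return ""


# Hypothetical candidates, for illustration only.
candidates = [
    CandidateSite("chr1", 1_000, has_itr_sequence=True, vector_homology=False),
    CandidateSite("chr2", 2_000, has_itr_sequence=False, vector_homology=False),
    CandidateSite("chr3", 3_000, has_itr_sequence=True, vector_homology=True),
]

kept = [s for s in candidates if exclusion_reason(s) == ""]
excluded = {(s.chrom, s.position): exclusion_reason(s)
            for s in candidates if exclusion_reason(s)}
```

Recording a reason for every excluded site mirrors the way the supplementary tables list excluded sites together with the reason for exclusion.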

First, we analyzed the global integration profile of AAV in all samples and identified 11–19 unique integration sites per condition. We did not observe an increase in the overall number of integration sites when Cas9 and gRNA were both present (Fig. 2a). The majority of the integration sites were found to be intronic or intergenic, with an average of 44.7% and 33.4%, respectively. Exonic and regulatory region (promoter, downstream) integrations were rather rare (3.5% and 9.2%, respectively). In mice co-injected with AAV-Cas9 and AAV-gRNA vectors, we observed AAV integration in all three CRISPR target sites. Next, we analyzed the total integrant read counts normalized to the total number of sequenced reads (Fig. 2b). Total integration efficiency was the highest in animals treated with AAV-Cas9 and AAV-gRNA against APPSW (Fig. 2b, 82.0% and 92.4% of integrants were found in the on-target region). This is not surprising as APPSW is a transgenic mouse with multiple copies of the human transgene in a mouse genome. In the case of Mecp2 and Dnmt3b targets, 13.4% and 10.7% of integrants were in the on-target locus. Importantly, AAV integration also occurred in AAV-Cas9 only and AAV-gRNA only conditions at a level similar to that of the AAV-Cas9 + AAV-gRNA-Dnmt3b condition.
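The on-target share of integrant reads reported above is a simple proportion, sketched below. The function name, site labels, and read counts are hypothetical and serve only to show the arithmetic; they are not values from this study.

```python
def on_target_fraction(reads_by_site: dict, on_target_site: str) -> float:
    """Share of all integrant reads that map to the CRISPR on-target locus."""
    total = sum(reads_by_site.values())
    if total == 0:
        raise ValueError("no integrant reads")
    return reads_by_site.get(on_target_site, 0) / total


# Hypothetical integrant read counts per site, for illustration only.
reads_by_site = {"target_locus": 30, "site_A": 50, "site_B": 20}

fraction = on_target_fraction(reads_by_site, "target_locus")
percent_on_target = 100 * fraction
```

A condition without gRNA would simply have no reads at the target locus, in which case the function returns 0.0.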

Fig. 2 Genome-wide AAV mapping from CRISPR treated mouse brains. a Total number of unique integration sites. In the case of AAV-Cas9, AAV-gRNA-Mecp2, AAV-Cas9 + gRNA-Mecp2, and AAV-Cas9 + gRNA-Dnmt3b, three mouse brains were pooled together for library construction. For AAV-Cas9 + gRNA-APPSW, hippocampus tissues from two animals were separately processed for library construction. The colors represent different genomic integration types and are based on the output of Virus-Clip. b Total number of reads that contain integrants normalized to total reads, based on the output from Virus-Clip. c Circos plots showing the chromosomal location of AAV integration events. The more eccentric a dot is, the higher the normalized read count for that site is, on a logarithmic scale. The gene names inside the circle represent either CRISPR targets or sites that are common integration events (present in at least three different samples). Colors of gene names are the same as in b. The human APP gene was added as a separate chromosome. d Bubble-plots showing all integration sites. The size of the circle is proportional to the normalized read count. The color was kept consistent in the figure with respect to type of integration. Intergenic integrations are marked by the chromosome and location. Source data are provided as a Source Data file.

Finally, we plotted all integration sites from all conditions and analyzed whether we could identify common sites. Integration sites were highly variable and not consistent even between the two animals treated with AAV-Cas9 and AAV-gRNA-APPSW (Fig. 2c, d). However, we identified sites that favored AAV integration (Fig. 2c). These included Mgat5b (intronic, detected in 4/6 conditions), Pgm5 (intronic, detected in 4/6 conditions), a region of chr13 at 97 Mb (intergenic, detected in 4/6 conditions), Aagab (intronic, detected in 3/6 conditions), a region of chr7 at 72 Mb (intergenic, detected in 3/6 conditions), Nrip2 (exonic, detected in 3/6 conditions), and Tex14 (intronic, detected in 3/6 conditions) (Fig. 2c, d). None of the identified sites showed homology to the gRNA target region, suggesting that AAV integration into predicted off-target cut sites was below the level of detection.
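Identifying sites shared across conditions amounts to counting, for each site, the number of conditions in which it was detected. The sketch below assumes a threshold of three conditions, in line with the recurrent sites listed above; the function name, condition labels, and site labels are placeholders, not the study's data.

```python
from collections import Counter


def recurrent_sites(sites_by_condition: dict, min_conditions: int = 3) -> dict:
    """Map each site to the number of conditions it appears in, keeping only
    sites detected in at least `min_conditions` conditions."""
    counts = Counter()
    for sites in sites_by_condition.values():
        counts.update(set(sites))  # count each site once per condition
    return {site: n for site, n in counts.items() if n >= min_conditions}


# Placeholder labels, for illustration only.
sites_by_condition = {
    "condition_1": ["site_A", "site_B", "site_A"],
    "condition_2": ["site_A", "site_C"],
    "condition_3": ["site_A", "site_B"],
    "condition_4": ["site_B", "site_D"],
}

common = recurrent_sites(sites_by_condition)
```

Converting each condition's list to a set prevents a site with many reads in a single condition from being mistaken for a site shared across conditions.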

Taken together, these results suggest that the presence of Cas9 alone or the presence of Cas9 and gRNA co-delivered by AAV does not influence the genome-wide integration efficiency of AAV (compared with the AAV-gRNA alone vector), except at the CRISPR target site.

A miniaturized AAV allows characterization of CRISPR DSBs

The profile of AAV integration in CRISPR cut sites determined by NGS appears to favor the ITR region. However, due to the limitations of the read length of NGS (the size of the Cas9 encoding vector is over 4 kb and the gRNA vector over 2 kb), the profile generated in Fig. 1f is likely to be biased toward the periphery of the AAV vector genome, where ITRs are located. Thus, the question of whether AAV integration occurs preferentially in the ITR region, or whether the AAV vector integrant is full length or fragmented, could not be determined. To overcome this issue, we designed a minimal AAV construct, in which a very short cargo (175 bp) is flanked by ITR elements (Fig. 3a). For stuffer DNA, we chose a region of the λ-bacteriophage genome which is highly orthologous to the human/mouse genome. Together with the ITR elements, we synthesized and cloned this short vector, termed AAV-λ465 (465 bp).

Fig. 3 Characterization of AAV vector integration into CRISPR cut sites using a miniaturized AAV genome. a Schematic of standard-sized AAV-CBA-FLuc vector (top, 4062 bases) vs miniaturized AAV-λ465 (bottom, 465 bases). Chart is to scale. b Transmission electron microscopic examination of iodixanol gradient-purified capsids of AAV2-λ or AAV2-CBA-FLuc. c Quantitation of full vs empty capsids (bars represent mean ± SEM, data from two independent experiments, 15 and 10 images were taken and 954 and 1231 capsids were counted for AAV-λ465 and AAV-CBA-FLuc, respectively, and p = 0.0254, unpaired t-test). d Alkaline gel electrophoresis and Southern blot for AAV genomes from iodixanol-purified vectors (AAV2-CMV::NLS-SaCas9-NLS-3xHA-bGHpA;U6::BsaI-sgRNA [pX601, 4.8 kb] and AAV-λ465 [465 bp]) and cellular genomic DNA containing integrated AAV-λ465. For Southern blot, we used a probe specific for the ITR region. Star (*) highlights the 465 bp expected band and pound (#) sign highlights concatemers in the AAV-λ465 genome. e ITR-genomic fusion events quantified by integration-specific qPCR assay, using AAV-λ465 or AAV2-CBA-FLuc vectors (bars represent mean ± SD). Three independent experiments were performed using two technical replicates each. f Heatmap of AAV specific ITR nucleotide integration at CRISPR cut site. More saturated red indicates higher frequency of breaks at the given position. g Integration profile of the entire miniaturized AAV genome from U2-OS cells. h Representative individual AAV integration clones showing different forms of integration detected (for all the clones, see Supplementary Fig. 4). Source data are provided as a Source Data file.

First, we asked whether such a short AAV transgene cassette could be packaged into AAV2 capsids during production in 293T cells. Cells were transfected with plasmids required for AAV2-λ465 generation. As a control, we used a 4 kb long AAV vector encoding firefly luciferase (FLuc) driven by the chicken beta actin promoter (CBA), AAV-CBA-FLuc. We began by performing qPCR to quantitate the amount of each vector in the cell culture media using the ITR regions for probing. Titers of DNase-resistant AAV particles of AAV2-λ465 and AAV2-CBA-FLuc were not significantly different, suggesting successful packaging of the small genome into AAV2 capsids (AAV2-λ465: 7 × 10^12 ± 3.3 × 10^12 vg/mL (mean ± SD) and AAV2-CBA-FLuc: 7.4 × 10^12 ± 6.6 × 10^12 (mean ± SD), difference not significant by t-test). Next, we isolated and purified AAV-λ vectors on iodixanol gradients, and achieved DNase-resistant AAV particle titers of > 10^12 vg/mL. We performed transmission electron microscopy (TEM) on AAV2-λ465 and AAV2-CBA-FLuc (Fig. 3b). AAV capsids were observed both in AAV2-λ465 and AAV2-CBA-FLuc samples. We also counted full vs. empty capsids on TEM images and observed that AAV2-λ465 has a significantly higher proportion of empty capsids (28.8%) compared with AAV2-CBA-FLuc (16.1%; unpaired t-test, p = 0.0254, Fig. 3c). The DNase-resistant vector titers as well as TEM analysis of capsids suggest successful packaging of AAV2-λ465. Due to the small size of the λ cassette, we wondered if individual AAV2 capsids would package multiple miniaturized monomeric genomes. If there were on average multiple genomes per capsid, one would expect a lower number of capsids for a given amount of AAV genomes. To determine capsid/genomic copy ratio, we performed ELISA for AAV2 capsids and qPCR for AAV genomes.
Capsid/genomic copy ratio was not significantly different for AAV2-λ465 and AAV2-CBA-FLuc at 10^9 vg/well, and was significantly increased (unpaired t-test, p = 0.026) in the case of AAV2-λ465 compared with AAV2-CBA-FLuc at 10^10 vg/well (Supplementary Fig. 2). These results suggest that AAV2-λ465 on average does not harbor more than one monomeric genomic copy per capsid as compared with a full-length AAV. However, concatemeric AAV-λ465 genomes could potentially be packaged into a given capsid, which may not be detected by the ITR-specific qPCR used to titer the vector preparation, due to ITR recombination during concatemerization. To test this possibility, we ran purified AAV2-λ465 or purified AAV2-Cas9 (full-length control vector) on an alkaline agarose gel and stained the gel to observe the size of the genomes. As expected, the full-length AAV2-Cas9 vector ran at ~5 kb. For AAV2-λ465, a band migrating just below 0.5 kb, indicating monomeric genomes, was observed. Interestingly, additional bands between ~0.7 and 4 kb were observed, indicating that some packaged genomes were concatemers (Fig. 3d). A Southern blot using an ITR-specific probe confirmed the presence of monomeric and concatemeric genomes in the packaged AAV capsid (Fig. 3d). Importantly, we did not observe smaller bands in the case of AAV2-λ465, suggesting no significant fragmentation of the small AAV genome.

To precisely quantify and characterize AAV integration into nuclease-induced breaks in the U2-OS human cell line using AAV2-λ465 (or AAV2-FLuc for comparison), we supplied Cas9 and gRNA using transfected plasmids into cells after overnight transduction with the AAV vector. First, we analyzed indel formation and AAV integration in four target sites (Supplementary Table 1). All target sites that showed indel formation showed AAV capture as well, as assayed by NGS (Supplementary Table 1). We observed AAV capture rates between 3 and 38% in the case of AAV2-λ465 (Supplementary Table 1). Next, we developed an integration-specific qPCR assay with one primer in the ITR and one primer in the target gene (Fig. 3e). We selected the ITR region as the priming site because the ITR was present in the junction in the vast majority of fusion reads (Fig. 1f). We assessed the presence of ITR-genomic fusion events in the case of six different genes (Fig. 3e). There was no amplification from cells treated with Cas9 plasmid and AAV vector only (i.e., no gRNA) (Fig. 3e), indicating that a site-specific DSB is required for AAV integration. In contrast, the integration-specific qPCR assay showed integration in all target genes when cells were transfected with Cas9 and gRNA plasmids, and co-transduced by AAVs. There was no difference in GAPDH-normalized ITR-genomic fusion events between the 465 bp AAV2-λ465 and the 4.1 kb AAV2-FLuc construct in any of the target genes analyzed (Fig. 3e).

Since we observed concatemeric genomes packaged in AAV2-λ465, we analyzed integration of concatemers in U2-OS cells using alkaline gel electrophoresis and Southern blotting. Using a probe specific for the ITR region, analysis of genomic DNA from U2-OS cells treated with Cas9/gRNA and AAV2-λ465 revealed integration into NGA FANCF site 1 and NGA FANCF site 3 (NGA is the PAM site of the SpCas9-VRQR PAM variant [1]), but there was no integration in the no gRNA control (Fig. 3d). The major integrant size was smaller than the AAV2-λ465 genome, but we observed larger bands (up to 3 kb), particularly in the case of NGA FANCF site 1, indicating concatemeric integration of AAV-λ465.

Next, to analyze whether there is a preferred breakpoint in the ITR region, we determined, for each nucleotide in the ITR region, how frequently that nucleotide was represented in fusion reads at the breakpoint (i.e., the breakpoint would be adjacent to this nucleotide). A heatmap (Fig. 3f) revealed that there are preferential breakpoints in the “b” and “c” loop regions of ITR elements. Next, we analyzed AAV2-λ465 integration into the FANCF NGA sites 1 and 3 [24] using a specific gRNA, to assess whether there were any regions of the ITR which are present more often at the indels. Interestingly, in both the “flip” and “flop” ITR conformations, all ITR regions were present at relatively similar read counts (Supplementary Fig. 3). We observed a high frequency representation of the ITR elements when plotting the coverage of AAV2-λ465 (Fig. 3g), although we also readily detected integration of the λ sequence. Finally, in order to precisely characterize integrants and indels at the same time, we cloned amplified PCR fragments into a vector and performed Sanger sequencing of 285 individual bacterial clones from both sides of the integrant. We were able to recover 20 clones with AAV vector sequences. Integration of the λ sequence was either full length or partial (Fig. 3h). We observed four major integration types at the NGA FANCF site 3 locus: (Type 1; 2 clones recovered) both ITRs are present and the full λ payload is detectable; (Type 2; 4 clones recovered) full AAV-λ cargo with one ITR and no second ITR; (Type 3; 5 clones recovered) ITR-only integrations; and (Type 4; 9 clones recovered) one ITR with partial λ sequences (Fig. 3h and Supplementary Fig. 4). We did not detect any integrants lacking ITR sequences. Microhomology regions (depicted as green nucleotides in Fig. 3h and Supplementary Fig. 4) were sometimes present; however, for several clones, no microhomology regions were observed.
These data suggest that the ITR is required for integration, but that only one ITR is needed for a successful capture. Furthermore, our data suggest that the majority of integration events do not contain the full-length AAV genome, although full-length AAV integration events are also readily detected.
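The four integration types described above can be written as a small decision rule. The sketch below is illustrative only: the function name and the encoding of each clone as an ITR count plus a payload status are assumptions, whereas the per-type clone counts are those reported in the text.

```python
from typing import Optional


def classify_integrant(n_itrs: int, payload: str) -> Optional[int]:
    """Assign an integrant to Type 1-4 as described in the text.

    n_itrs:  number of ITRs detected in the clone (0, 1 or 2)
    payload: "full", "partial" or "none" (state of the lambda stuffer sequence)
    Returns None for combinations the text does not describe.
    """
    if n_itrs == 2 and payload == "full":
        return 1  # both ITRs and the full lambda payload
    if n_itrs == 1 and payload == "full":
        return 2  # full cargo with one ITR, no second ITR
    if n_itrs >= 1 and payload == "none":
        return 3  # ITR-only integration
    if n_itrs == 1 and payload == "partial":
        return 4  # one ITR with partial lambda sequence
    return None


# Clone counts per type as reported in the text.
clones_per_type = {1: 2, 2: 4, 3: 5, 4: 9}
total_aav_clones = sum(clones_per_type.values())

# Types 3 and 4 lack the complete lambda payload.
incomplete_fraction = (clones_per_type[3] + clones_per_type[4]) / total_aav_clones
```

The per-type counts sum to the 20 AAV-containing clones recovered, and 14 of these 20 (Types 3 and 4) lack the complete λ payload, consistent with the statement that most integration events do not contain the full-length AAV genome.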