A homology-independent approach for gene tagging

We designed a generic plasmid donor harbouring the tag of interest (Fig. 1a), which is flanked by two gRNA cleavage sites that correspond to a genomic locus in zebrafish (tia1l) that is absent in human cells (ref. 10). The plasmid donor also encodes a U6 promoter driving the expression of the tia1l gRNA. When cells are transfected with Cas9, the donor plasmid and a gRNA specific to the region of the gene where the tag should be incorporated, the tag is released from the plasmid and subsequently integrated into the gene of interest.

Figure 1: Approach for homology-independent gene tagging. (a) Schematic representation: cells are transfected with Cas9, a gRNA specifying the desired locus in the human genome (here in exon 9 of the gene of interest) and a generic donor plasmid. The generic donor plasmid contains the tag of interest, flanked by two tia1l recognition sites, as well as a U6 promoter driving the expression of the tia1l gRNA. Consequently, the tag of interest will be released upon co-expression of Cas9 and spontaneously integrated at the site specified by the genomic gRNA. (b) HAP1 cells were transfected with expression plasmids for Cas9, the tia1l gRNA and the generic NanoLuc donor. For each gene under consideration, we chose two independent gene-specific gRNAs that were co-transfected. Genomic DNA was isolated from pools of cells 5 days post transfection and analysed by PCR. For this PCR, one constant primer binding to the NanoLuc cassette (5′-GGATCGGAGTTACGGACACC-3′) was combined with one variable primer for each gene of interest. HAP1 wild-type cells were included as a reference (lanes labelled with —). Numbers above each lane define the guide RNA identity, as specified in Fig. 2 and Table 1.

To assess whether the donor could be employed to create tagged alleles, we used the human near-haploid cell line HAP1 (refs 11, 12), because it contains a single copy of most genomic loci and is thus ideal to trace genome-editing events. As an initial proof of concept, we picked six genes and selected two gRNAs targeting the 3′-coding region of each gene. We transfected HAP1 cells with the donor plasmid (Supplementary Fig. 1), expression plasmids for Cas9 and for one gRNA specifying the genomic locus of interest, and a plasmid encoding a blasticidin resistance gene. We eliminated untransfected cells by applying a short pulse of blasticidin. After selection, we assessed integration of the donor in the pool of transfected cells by a PCR strategy in which one primer binds in the genome and the other binds in the tagging cassette (Fig. 1b). Specific gene-tagging events were evidenced by a PCR product in transfected cells that is absent in the parental cell line. A total of 11 out of 12 targeting events showed integration of the donor cassette (Fig. 1b). The failed targeting event was not due to inefficient Cas9 cleavage, as all gRNAs showed comparable efficiencies when evaluated in a T7 endonuclease assay (Supplementary Fig. 2). These initial findings suggest that our approach led to the successful integration of the donor and that our strategy is applicable across diverse genomic loci.
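The logic of this junction PCR can be illustrated in silico. The following is a minimal sketch, not the authors' analysis: it assumes exact primer matches on a single strand, the function names are hypothetical, and the gene-specific primer and template sequences are invented toy data. Only the constant NanoLuc primer sequence is taken from the legend of Fig. 1.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template: str, forward_primer: str, reverse_primer: str):
    """Return the expected product length, or None if no product is expected.

    A product requires the forward primer on the given strand and, downstream
    of it, the binding site of the reverse primer (its reverse complement).
    """
    template = template.upper()
    fwd_start = template.find(forward_primer.upper())
    if fwd_start == -1:
        return None
    rev_site = reverse_complement(reverse_primer)
    rev_start = template.find(rev_site, fwd_start + len(forward_primer))
    if rev_start == -1:
        return None
    return rev_start + len(rev_site) - fwd_start


# Toy data: the gene primer and flanking bases are hypothetical.
gene_primer = "ATGGCCAAGCTGACCTGA"
nanoluc_primer = "GGATCGGAGTTACGGACACC"  # constant primer from the Fig. 1 legend
tagged = "CC" + gene_primer + "ACGTACGT" + reverse_complement(nanoluc_primer) + "GG"
wild_type = "CC" + gene_primer + "ACGTACGT" + "GG"

print(amplicon_length(tagged, gene_primer, nanoluc_primer))     # product expected
print(amplicon_length(wild_type, gene_primer, nanoluc_primer))  # None: no cassette
```

Because one primer site exists only in the cassette, the parental template yields no product, which is why a band in transfected pools indicates integration.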

Gene tagging is efficient and precise

To assess the frequency of integration events, we isolated clonal cell lines by limiting dilution from nine separate gRNA transfections and analysed clones by PCR and Sanger sequencing (Table 1). We retrieved clones bearing tagged alleles for five of the nine gRNAs. Importantly, in some cases, we were able to isolate multiple clones from a modest total of 24 clones tested, indicating a high overall efficiency of the procedure.

Table 1: Identification of single-cell clones bearing NanoLuc integrations.

When analysing the sequences obtained from clonal cell lines bearing tagged alleles, we initially expected to see insertions or deletions as a consequence of the imprecise nature of NHEJ. In contrast, we observed that 9 out of 12 clones showed a perfect cleavage and ligation pattern (Fig. 2): in these clones, Cas9 had cleaved both gRNA recognition sites (the genome and the tia1l sequence) at the PAM −3 position and endogenous DNA ligases subsequently linked the two sequences without additional end processing. We noted that some clones show an identical mutation pattern (for example, 669-12 and 669-24). This indicates that there is either a dominant mutation pattern that could be favoured due to local sequence constraints or that these represent siblings from the same initial clone. Overall, the results indicate that this gene-tagging approach is remarkably precise and delivers clones in a predictable fashion.
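The PAM −3 cut position referred to here can be located computationally. The sketch below is an illustration under stated assumptions rather than the authors' pipeline: it assumes SpCas9 with an NGG PAM, a protospacer lying on the strand that is passed in, and a hypothetical function name and toy sequence.

```python
def cas9_cut_index(sequence: str, protospacer: str) -> int:
    """Return index i such that the blunt cut lies between sequence[i-1] and sequence[i].

    Assumes the protospacer is on the given strand and is immediately followed
    by an NGG PAM; the cut is placed 3 bp upstream of the PAM.
    """
    sequence = sequence.upper()
    protospacer = protospacer.upper()
    start = sequence.find(protospacer)
    if start == -1:
        raise ValueError("protospacer not found on this strand")
    pam_start = start + len(protospacer)
    pam = sequence[pam_start:pam_start + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("no NGG PAM directly after the protospacer")
    return pam_start - 3


# Toy example (hypothetical sequence): 4 flanking bases, a 20-nt protospacer, a TGG PAM.
protospacer = "ACGTACGTACGTACGTACGT"
locus = "AAAA" + protospacer + "TGG" + "CCCC"
cut = cas9_cut_index(locus, protospacer)
left, right = locus[:cut], locus[cut:]
print(cut, left[-3:], right[:6])
```

In a perfect integration event as described above, the genomic sequence ending at `left` would be joined directly to the donor fragment released by the equivalent cut in the tia1l site, with no bases gained or lost.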

Figure 2: Gene tagging is efficient and precise. Sanger sequencing data of clones described in Table 1 were analysed to identify the integration pattern. Dark-blue arrows indicate the directionality of the tia1l gRNA, light-blue arrows indicate the directionality of the genomic gRNA site. Red dots symbolize additional insertions or deletions identified by Sanger sequencing. Numbers on the left specify the clone ID.

Generation of NanoLuc reporter cell lines

As we found that integration of the reporter cassette preferentially occurred exactly at the PAM −3 position (Fig. 2), we constructed three versions of a NanoLuc donor plasmid, one for each possible reading frame (Supplementary Fig. 3), to take into account where in the reading frame Cas9 introduces the cut. Furthermore, we included neither a start nor a stop codon in the donor, so that the tag can be fused seamlessly anywhere within the coding sequence of a gene (Supplementary Fig. 3).
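Why three donor versions suffice can be shown with a short calculation. This is a sketch under assumptions: the cut position is expressed as the number of coding bases upstream of the cut, the frame is corrected by filler bases at the 5′ end of the cassette, and all function names are hypothetical. The actual donor layouts are those shown in Supplementary Fig. 3, and any donor-derived bases between the tia1l cut and the tag would count towards the filler.

```python
def cut_phase(coding_bases_before_cut: int) -> int:
    """Position of the cut within a codon: 0, 1 or 2 bases into the codon."""
    if coding_bases_before_cut < 0:
        raise ValueError("number of coding bases must be non-negative")
    return coding_bases_before_cut % 3


def filler_bases_needed(coding_bases_before_cut: int) -> int:
    """Bases to add before the tag so that its first codon is read in frame."""
    return (3 - cut_phase(coding_bases_before_cut)) % 3


def keeps_downstream_frame(insert_length: int) -> bool:
    """True if the whole insert (filler, tag and any linker) is a multiple of 3.

    Without a stop codon in the tag, this keeps the gene's downstream codons in frame.
    """
    return insert_length % 3 == 0


for upstream in (99, 100, 101):
    print(upstream, cut_phase(upstream), filler_bases_needed(upstream))
```

Because the phase can only take three values, one donor per phase covers every possible cut site within a coding sequence.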

We now aimed to tag a set of genes with these altered NanoLuc reporter cassettes to allow for subsequent functional evaluation. NanoLuc (ref. 13) was chosen, as it is small and bright and thus compatible with the low signal intensities expected from a single genomic integration event. We decided to tag three cytokine-inducible genes (DACT1, IFIT1 and EGR1): DACT1/Dapper1 is a Wnt antagonist that is regulated by activin A (ref. 14), IFIT1 is an interferon-inducible gene (ref. 15) and EGR1 is regulated in response to various stimuli including FGF1 (ref. 16). We initially confirmed upregulation of the corresponding messenger RNAs in response to cytokine stimulation by quantitative PCR (qPCR) in wild-type HAP1 cells (Fig. 3a). Next we chose one gRNA for each gene and selected the appropriate NanoLuc cassette bearing the reading frame that would create an in-frame integration event if Cas9 cleavage and ligation occurred as predicted. Up to 96 clones were screened by PCR and Sanger sequencing across the 5′ junction. In line with our previous findings (Fig. 2; Table 1), two to seven clones were positive as judged by the PCR across the 5′ junction. The majority of these contained in-frame integrations (Table 2; Supplementary Fig. 4). However, not all of the clones bearing correct 5′ junctions also contained correct 3′ junctions (Table 2). Overall, we were able to identify one clone containing the perfect integration pattern for both IFIT1 and EGR1. For DACT1, in contrast, all clones that bore the correct 5′ junction had imperfect 3′ junctions. However, since the tag was inserted at the 3′ end of DACT1, most of the DACT1-coding sequence remained intact. Thus, we decided to use one of the imperfect DACT1-NanoLuc clones for functional analysis.

Figure 3: Cell lines bearing NanoLuc integrations can be used to monitor changes in gene expression. (a) HAP1 cells were stimulated with various cytokines as indicated (IFN-β, activin A and FGF1) for 4, 8 or 24 h at a final concentration of 50 ng ml−1. RNA was isolated and analysed by qPCR for the following signature genes: IFIT1, DACT1 and EGR1. Error bars show the s.d. from three technical replicates. (b) Clonal cell lines bearing NanoLuc integrations in IFIT1, DACT1 and EGR1 (Table 2) were stimulated with IFN-β, activin A or FGF1 as indicated. Cell lines were collected after 24 h (for IFIT1 and DACT1) or after the indicated time points (for EGR1) and NanoLuc luciferase levels were measured. Error bars show the s.d. from six technical replicates.

Table 2: Identification of single-cell clones bearing NanoLuc integrations in correct reading frame.

Next, we tested the cell lines bearing defined NanoLuc integrations in DACT1, IFIT1 and EGR1 by stimulating them with their cognate cytokine ligands. All three tagged alleles showed cytokine-induced upregulation of NanoLuc levels as measured by the NanoGlo Luciferase assay system (Fig. 3b). Importantly, data obtained using NanoLuc measurements nicely reflected the qPCR data obtained previously (compare Fig. 3b with Fig. 3a), both with regard to the degree of cytokine stimulation and the kinetics. This suggests that NanoLuc reporter cell lines, engineered by our generic gene-tagging approach, can be used to faithfully monitor gene expression at the endogenous level.

Generation of TurboGFP reporter cell lines

We also assessed the feasibility of our approach for a different purpose: the tagging of endogenous genes with fluorescent markers suitable for live-cell imaging. We selected TurboGFP (ref. 17) because it is very bright and photostable and allows the enrichment of cells bearing the tagged allele by fluorescence-activated cell sorting (FACS; Fig. 4a), which will increase the frequency with which genome-edited clones can be recovered. As outlined above for NanoLuc, we generated one cassette for each of the three reading frames to obtain three generic TurboGFP donors (Supplementary Fig. 5). We decided to tag three genes that display distinct subcellular localization patterns: LMNA encodes a component of the nuclear envelope (Lamin A; ref. 18). The TERF1 gene product TRF1 binds to telomeres and is known to display a dotted nuclear pattern (ref. 19). LAMP1 encodes a lysosomal marker, LAMP1, that is clustered in cytoplasmic aggregates (ref. 20). For each of these genes, we selected a single gRNA targeting a region within the gene that is likely to tolerate a TurboGFP insertion (LMNA and TERF1 at the 5′ end of the coding sequence/N terminus of the protein, LAMP1 at the 3′ end of the coding sequence/C terminus of the protein). HAP1 cells were then transfected with Cas9, a gRNA specifying the genomic locus and the appropriate TurboGFP donor plasmid. Following transfection, we observed that up to about 1% of the cells were positive for TurboGFP (Table 3). TurboGFP-positive cells were subsequently enriched by FACS and TurboGFP positivity increased roughly 50-fold in the sorted cell populations (Table 3). Single-cell clones were then isolated by limiting dilution. When analysing the 5′ junction for each of the genes, we observed high frequencies of cassette integration (2/19 for LMNA, 8/17 for TERF1 and 9/19 for LAMP1; Table 4; Supplementary Fig. 6). The majority of the clones that displayed the correct 5′ junction also showed the correct 3′ junction (Table 4). This is in line with the notion that FACS clearly enriched for cells bearing correct tagging events.
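The clone counts quoted above can be converted to percentages for easier comparison. This is a simple arithmetic sketch using only the counts stated in the text; the variable and function names are illustrative.

```python
# Clones positive at the 5' junction / clones screened, as quoted in the text.
clone_counts = {"LMNA": (2, 19), "TERF1": (8, 17), "LAMP1": (9, 19)}


def integration_rate(positive: int, screened: int) -> float:
    """Percentage of screened clones that carry the cassette at the 5' junction."""
    if screened <= 0:
        raise ValueError("number of screened clones must be positive")
    return 100.0 * positive / screened


for gene, (positive, screened) in clone_counts.items():
    print(f"{gene}: {positive}/{screened} = {integration_rate(positive, screened):.1f}%")
```

Expressed this way, TERF1 and LAMP1 both come out at close to half of the screened clones, whereas LMNA is nearer to one in ten.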

Figure 4: Cell lines bearing TurboGFP can be used to monitor subcellular localization. (a) HAP1 cells were transfected with expression plasmids for Cas9 and the generic TurboGFP donor that expresses the tia1l gRNA. In addition, we co-transfected one gRNA for each gene under consideration (LMNA, TERF1 and LAMP1). TurboGFP-positive cells were enriched by FACS. (b) Clonal cell lines bearing TurboGFP integrations in LMNA and TERF1 were fixed, stained with 4′,6-diamidino-2-phenylindole (DAPI) and analysed by fluorescence microscopy.

Table 3: Cells bearing TurboGFP-tagged alleles can be enriched by FACS.

Table 4: Identification of single-cell clones bearing in-frame integrations of TurboGFP.

Finally, we analysed correctly targeted TurboGFP clones by fluorescence microscopy. In all of the cases we assessed, the TurboGFP pattern was homogeneous, as expected for a clonal cell line bearing a single genomic tagging event. Lamin A displayed nuclear staining in line with its localization at the nuclear membrane (Fig. 4b). In dividing cells, which could be identified based on DAPI staining, the distribution of Lamin A was more diffuse (Supplementary Fig. 7). This highlights the authenticity of the Lamin A localization pattern, as Lamin A is relocalized during nuclear envelope breakdown. Localization of TRF1 was highly distinct, with dotted clusters distributed over the entire nucleus (Fig. 4b). This is in agreement with TRF1 being bound to telomeres. For LAMP1, the TurboGFP signal was more diffuse with occasional aggregates representing the formation of active lysosomes (Supplementary Fig. 8). In summary, these data suggest that cell lines bearing TurboGFP alleles can be used for imaging studies and that the localization patterns observed accurately reflect localization of the endogenous gene products.

We also asked whether our approach would be feasible in diploid human cells or whether it was confined to haploid cells, which may harbour a different repertoire of DNA damage repair pathways. To this end, we transfected HEK293 cells with a gRNA targeting LMNA and the corresponding TurboGFP plasmid donor as described above and analysed the transfected pools by flow cytometry. Pools were positive for GFP and GFP positivity could be increased by FACS (Supplementary Fig. 9A). Cassette integration could be detected by PCR in pools of FACS-sorted cells (Supplementary Fig. 9B). Single-cell clones bearing on-target integration events in LMNA were obtained at frequencies >50% (Supplementary Fig. 9C). Three cell lines derived from these single clones were confirmed to be GFP positive (Supplementary Fig. 9D). We also evaluated the copy number of TurboGFP in these three clonal cell lines (Supplementary Fig. 9E): two clones displayed a single integration event, while one clone showed a copy number of 2 for the TurboGFP cassette. These data suggest that biallelic targeting may be possible with our approach, although they do not rule out the possibility that the second TurboGFP integration event occurred at an off-target site. Overall, this set of experiments suggests that our gene-tagging approach is also feasible in other commonly used cell lines such as HEK293.