Choosing gene insertion targets in a model rice variety

Rice (Oryza sativa) is a staple food crop for more than half of the world’s population. To identify GSHs in rice, we conducted a mutant screen by analyzing the morphological records and the whole-genome sequencing data of a fast-neutron rice mutant collection in a model japonica rice variety with a short generation time24,25,26. From this screen, we identified five mutants carrying homozygous insertions or translocations (Supplementary Table 1), which do not exhibit visible morphological changes compared with the parental genotype. We verified the homozygous mutations in these mutants by PCR using primers flanking the corresponding mutation sites (Supplementary Fig. 1). Because the mutations do not incur any visible change in morphology, these five intergenic mutation sites (A, B, C, D, and E) were chosen as the candidate GSHs (Supplementary Table 1).

Using the CRISPR-PLANT guide RNA design platform27, we selected seven specific sites near the five candidate GSHs in the Kitaake rice genome28, and designed gRNAs (A, B, C, D1, D2, E1, and E2) targeting each of these sites (Supplementary Fig. 2a). To experimentally determine the ability of Cas9 to induce DSBs at each of the seven targets in vivo, we performed a T7 Endonuclease 1 (T7E1) assay29 in rice protoplasts transiently expressing Cas9 and each of the seven gRNA candidates. This assay quantifies the frequency of Cas9-induced mutations at each of the seven gRNA targets, which reflects the efficiency of cleavage by Cas9 at these targets in vivo. Mutations occurred at targets A, B, and C at relatively high frequencies (Supplementary Fig. 2b), indicating that gRNAs A, B, and C can efficiently target these sites.
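The text does not state how gel band intensities were converted into mutation frequencies. As an illustration only, the sketch below uses an estimate that is commonly applied to mismatch-cleavage assays such as T7E1: indel % = 100 × (1 − sqrt(1 − f)), where f is the fraction of PCR product that was cleaved. The function name and the band intensities are hypothetical and are not taken from this study.

```python
import math


def t7e1_indel_percent(uncut, cut1, cut2):
    """Estimate indel frequency (%) from T7E1 gel band intensities.

    Assumption: the commonly used mismatch-cleavage estimate
    100 * (1 - sqrt(1 - f)), where f is the cleaved fraction.
    This is not necessarily the calculation used in the study.
    """
    total = uncut + cut1 + cut2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cut1 + cut2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical band intensities (arbitrary densitometry units).
example_percent = t7e1_indel_percent(uncut=640.0, cut1=200.0, cut2=160.0)
print(f"Estimated indel frequency: {example_percent:.1f}%")
```

The square root accounts for the fact that cleavage requires a heteroduplex between a mutant and a wild-type strand, so the cleaved fraction is not itself the mutant fraction.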

Constructing a marker-free carotenoid cassette for insertion

Because of the valuable socio-economic impact conferred by the Golden Rice 2 (GR2) cassette, its availability, and the clear phenotype it confers30, we chose to modify this cassette and use it as the donor DNA to assess the efficiency of marker-free targeted insertion in rice. Rice varieties carrying the Golden Rice 1 (GR1) and the GR2 cassettes accumulate carotenoids in the grain30,31,32. The endosperm of GR1 and GR2 is golden in color30,31, compared with the white endosperm observed in most conventional rice varieties. Consumption of GR1 and GR2 is predicted to have a positive nutritional impact, especially in regions where rice is the major food source and Vitamin A deficiency is prevalent33. Using the Golden Gate Assembly method34, we generated a carotenoid cassette based on the published sequence of the GR2 cassette30. To reduce the size of the insert, the selectable marker gene and the T-DNA border sequences were not included in our modified cassette. The final 5.2 kb carotenoid cassette (Fig. 1a, Supplementary Data 1) consists of the coding sequences of the two carotenoid biosynthetic genes SSU-crtI and ZmPsy30, both driven by the endosperm-specific glutelin promoter30 isolated from Kitaake rice. SSU-crtI is a functional fusion of the DNA encoding the chloroplast transit peptide from the pea RUBISCO small subunit and the Erwinia uredovora carotenoid desaturase, whereas ZmPsy encodes a maize phytoene synthase. The nopaline synthase (nos) terminator (from Agrobacterium tumefaciens) was used for transcription termination in both genes.

Fig. 1: Scheme for targeted insertion of the carotenoid cassette. a Map of the donor plasmid pAcc-B with details of the carotenoid cassette (orange arrow). Red and blue arrows represent the homology arms. The two vertical green triangles mark the positions of the guide RNA B target sites. The nucleotide sequence of the donor plasmid is presented in Supplementary Data 1. Primers used to genotype the donor plasmid are marked on the map. b Map of the CRISPR plasmid pCam1300-CRISPR-B. Genes encoding Cas9p, gRNA-B, and hygromycin resistance (HygR) are represented by purple, green, and black arrows, respectively. The Cas9p module is shown in detail. Primers used to genotype Cas9p are marked on the map. c Scheme for transformation, selection, and regeneration.

Delivery of the carotenoid cassette into rice at genomic targets

We assembled the donor plasmid pAcc-B (Fig. 1a, Supplementary Data 1), which contains the 5.2 kb carotenoid cassette. We added homology arms, which consist of 794 bp and 816 bp of Kitaake genomic sequence to the left and right of the Cas9 cleavage site at the gRNA B target (Target B), respectively. The homology arms were included to facilitate homology-directed repair (HDR)35, a precise repair mechanism. We placed a gRNA B target sequence outside each homology arm on the donor plasmid (two target sequences in total) to further enhance the chance of targeted insertion of the carotenoid cassette sequence, because linearized donor templates have been reported to increase HDR efficiency36,37. We hypothesized that these gRNA target sites would facilitate the release of the carotenoid cassette from the circular donor plasmid by Cas9, based on previous reports38,39.

We next constructed the CRISPR plasmid pCam1300-CRISPR-B, which consists of a Cas9p module40 with a Poaceae (the plant family of rice and other species) codon-optimized Cas9-coding sequence driven by the maize Ubiquitin 1 (Ubi1) promoter, and a gRNA B module driven by the promoter of the rice small nuclear RNA gene OsU641 (Fig. 1b). The Cas9p module also includes the nos terminator derived from Agrobacterium. A hygromycin resistance selectable marker gene is present on the backbone of pCam1300-CRISPR-B, which allows for subsequent selection of rice transformants carrying the Cas9-gRNA module using the antibiotic hygromycin.

Equal masses of the donor plasmid pAcc-B and the CRISPR plasmid pCam1300-CRISPR-B were mixed and delivered by particle bombardment (Fig. 1c). We bombarded one hundred Kitaake rice embryogenic calli, and applied hygromycin to select for calli transformed with pCam1300-CRISPR-B. We regenerated 55 hygromycin-resistant plants (T0 generation).

Insertion of the carotenoid cassette occurred at Target B

We genotyped the 55 T0 individuals by PCR using primers 1F and 1R to check whether the carotenoid cassette was inserted at Target B through HDR (Fig. 2a, Supplementary Fig. 3a). We observed a PCR band for T0 plant #1 at 2.6 kb, which exceeds the size of the predicted band by 0.8 kb, roughly the size of the left homology arm. We hypothesized that the left junction of this insertion may have occurred through non-homologous end joining (NHEJ), an alternative DSB repair pathway that occurs at a higher frequency than HDR35. To test this, we performed additional PCR reactions on T0 plant #1 using primer pairs 1F + 1R and 2F + 3R (Supplementary Fig. 3b–c). A 2.6 kb fragment and a 5.2 kb fragment were amplified, respectively. This result suggests that the entire pAcc-B donor plasmid was integrated at Target B in T0 plant #1. Amplicons spanning both junctions of the insert were sequenced to confirm the insertion of the donor plasmid (Supplementary Fig. 3d). Because T0 plant #1 was sterile, we could not harvest seeds to further validate the nature of the insertion.

Fig. 2: Molecular characterization of the carotenoid cassette at Target B. a Diagrams showing the genomic region near Target B in Kitaake rice and the donor DNA. Gray lines represent plasmid backbone DNA while black lines represent Kitaake genomic DNA. The vertical green triangles mark the positions of the guide RNA B target sites. b Diagram of the inserted carotenoid cassette at Target B in T0 plants #11, 16, 17, 24, 28, 48, and 50. The junction sequences in all seven plants are identical, as shown in the diagram. For convenience, only the sequencing chromatograms for T0 #48 are shown. The protospacer adjacent motifs (PAMs) of the original guide RNA B targets are highlighted in yellow.

To assess the possibility that a subset of the remaining T0 plants also harbored the insertion of the carotenoid cassette through NHEJ but in the opposite orientation, we genotyped the 55 T0 plants using primers 1F and 2F (Supplementary Fig. 4a). These primers amplified a 2.3 kb band in seven T0 plants (T0 #11, #16, #17, #24, #28, #48, and #50). Both insertion junctions in these seven plants were confirmed by additional PCR reactions using primer pairs 1F + 2F and 1R + 3R (Supplementary Fig. 4b). Based on these results, we predicted that the donor DNA in between the two gRNA B targets was inserted at Target B through NHEJ in these seven T0 plants (Supplementary Fig. 4b). By sequencing these amplicons, we found that the junctions of the inserts in these seven T0 plants are identical (Fig. 2b). The identity of the junctions suggests that these seven T0 plants are likely clonal derivatives of a single independent insertion, which we subsequently confirmed (see below).

We performed genetic segregation analysis of the T1 generation to obtain rice plants homozygous for the carotenoid cassette at Target B that lack the Cas9-gRNA module. We genotyped the progeny of 48A (tiller A from T0 #48) for Cas9 using primers Cas9p-Genotyping-F and nos-Terminator-R located in the Cas9p module (Fig. 1b). In parallel, to detect any potential off-target integration of the donor plasmid during particle bombardment in T0 #48, we performed PCR using the donor backbone-specific primer M13F, and the carotenoid cassette-specific primer 1R (Fig. 1a). In the T1 population, the presence of the Cas9-gRNA module and the backbone of pAcc-B are linked (Fig. 3), suggesting that pAcc-B and pCam1300-CRISPR-B co-integrated in the genome adjacent to each other in plant T0 #48. This result is consistent with the previously reported observation that multiple plasmids frequently co-integrate when delivered through particle bombardment42. We next screened the same T1 population for individuals homozygous for the carotenoid cassette at Target B using primers 1F and 3R (Fig. 2a). From these genetic analyses, we identified T1 individual 48A-7 as being homozygous for the inserted carotenoid cassette at Target B and free of the co-integrated CRISPR and donor plasmids (Fig. 3).

Fig. 3: Genetic segregation of the progeny of T0-48A. Genotyping the T1 progeny of T0-48A. The purpose of each PCR experiment and the genotyping primers used are shown to the left for each gel panel. a Primers Cas9p-Genotyping-F and nos-Terminator-R amplify a 534 bp DNA fragment in plants with the Cas9 module. b Primers M13F and 1R amplify a 1.8 kb DNA fragment in plants with the off-target insertion of the pAcc-B donor plasmid. c Primers 1F and 2F amplify a 2.3 kb DNA fragment in plants with the carotenoid cassette inserted at Target B. d Primers 1F and 3R amplify a 1.9 kb genomic DNA fragment in plants unless the carotenoid cassette at Target B is homozygous. The positions of the primers used are illustrated in Figs. 1a, b, 2a. Kitaake (K) was used as the negative control. The red triangle at the bottom marks 48A-7, which is free of the co-integrated CRISPR and donor plasmids. Source data are provided as a Source Data file.

To examine whether the full-length carotenoid cassette was inserted at Target B in 48A-7, we performed PCR using primers 1F and 3R (Fig. 2a) with extended elongation time. A fragment with the expected size of 8.8 kb was amplified (Supplementary Fig. 5a), indicating that the insert in T0 plant #48 at Target B consists of the full-length carotenoid cassette and both homology arms from the donor plasmid. Consistently, a Southern blot assay on the genomic DNA extracted from 48A-7 supports the presence of a single-copy insertion of the full-length carotenoid cassette and the homology arms at Target B (Supplementary Fig. 5b–d). We also carried out whole-genome sequencing of 48A-7 and identified all the sequencing reads (151 bases in length each) that fully or partially match the sequence of the donor plasmid pAcc-B. We tiled these reads and reconstructed the sequence of the insert, which is consistent with the sequence of the donor DNA and the Sanger sequencing of the junction ends described in Fig. 2b. We did not detect any DNA sequence of the pAcc-B donor plasmid in the genome of 48A-7 at any location other than Target B. Together, these results suggest that plant 48A-7 carries a single copy of the full-length carotenoid cassette at Target B.
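As a rough consistency check on the 8.8 kb amplicon, the sizes quoted in the text can be summed: the 1.9 kb wild-type 1F + 3R product, the 5.2 kb cassette, and the 794 bp and 816 bp homology arms. The sketch below does only that arithmetic. It assumes the rounded sizes given in the text and ignores any short donor sequence between the arms and the gRNA B target sites, so it is an approximation rather than the authors' own calculation.

```python
# Sizes quoted in the text (rounded values, in base pairs).
WILD_TYPE_AMPLICON_BP = 1900  # 1F + 3R product without an insert (~1.9 kb)
CASSETTE_BP = 5200            # carotenoid cassette (~5.2 kb)
LEFT_ARM_BP = 794             # left homology arm
RIGHT_ARM_BP = 816            # right homology arm


def expected_amplicon_with_insert_bp():
    """Approximate 1F + 3R amplicon size when the cassette and both arms are inserted.

    Assumption: the insert is the cassette plus both homology arms only;
    linker sequence next to the gRNA target sites is not counted.
    """
    return WILD_TYPE_AMPLICON_BP + CASSETTE_BP + LEFT_ARM_BP + RIGHT_ARM_BP


approx_bp = expected_amplicon_with_insert_bp()
print(f"Approximate amplicon size: {approx_bp / 1000:.2f} kb")
```

The sum comes to about 8.7 kb, which agrees with the reported 8.8 kb band within the rounding of the input sizes.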

To assess the occurrence of off-target mutations caused by Cas9 in the process, we further analyzed the whole-genome sequencing result for 48A-7. We used Cas-OFFinder43 to predict potential Cas9 off-target sites in the KitaakeX genome28 and identified ten candidate sites (Supplementary Table 3). Sequence analysis indicates that none of the ten predicted off-target sites is mutated in plant 48A-7 (Supplementary Table 3 and Supplementary Data 2). This is consistent with the previously reported absence of mutations at predicted Cas9 off-target sites in rice plants edited with CRISPR-Cas944. Together, these results indicate that DNA cleavage by Cas9 is highly specific to Target B in our experiment.
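The idea behind predicting off-target sites can be sketched as a mismatch-tolerant search for sequences that resemble the guide and are followed by an NGG PAM. The code below is a deliberately simplified illustration and does not reproduce the algorithm, options, or interface of Cas-OFFinder. The guide and the toy "genome" are hypothetical sequences, not gRNA B or KitaakeX sequence.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_candidate_sites(genome, guide, max_mismatches=3):
    """Scan both strands for protospacers followed by an NGG PAM.

    Returns tuples of (strand, start index on that strand, protospacer,
    PAM, mismatch count). Positions on the "-" strand are indices into
    the reverse-complemented sequence.
    """
    hits = []
    n = len(guide)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - n - 2):
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue  # only NGG PAMs are considered in this sketch
            protospacer = seq[i:i + n]
            mismatches = sum(a != b for a, b in zip(guide, protospacer))
            if mismatches <= max_mismatches:
                hits.append((strand, i, protospacer, pam, mismatches))
    return hits


# Hypothetical 20-nt guide and a toy sequence with one perfect site
# and one site carrying two mismatches.
guide = "GATTACAGATTACAGATTAC"
near_match = "GATTACAGTTTACAGATAAC"
toy_genome = "TTT" + guide + "TGG" + "AAAA" + near_match + "AGG" + "CCC"

hits = find_candidate_sites(toy_genome, guide)
for hit in hits:
    print(hit)
```

A site with zero mismatches corresponds to the intended target; sites with a small number of mismatches are the candidates that would then be inspected in the whole-genome sequencing data.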

Rice plant 48A-7 accumulates β-carotene in the seed

Plant 48A-7 resembles the control plant Kitaake in plant stature and grain dimensions (Fig. 4a–d). The dehusked seeds derived from 48A-7 are golden in color, indicating the accumulation of carotenoids in the endosperm (Fig. 4e). Because the major carotenoid in the endosperm of GR2 is β-carotene30, we quantified the β-carotene content in the endosperm from 48A-7 using high-performance liquid chromatography (HPLC) (Supplementary Fig. 6). In the dehusked, polished seeds from 48A-7, the β-carotene content was 7.90 ± 0.19 μg g−1 dry weight (Supplementary Table 4), while no significant amount of β-carotene was detected in the dehusked, polished Kitaake seeds. The observed β-carotene content in 48A-7 is slightly lower than that of the GR2 transformation event GR2E in japonica rice variety Kaybonnet under greenhouse conditions (9.22 μg g−1 dry weight)30, and comparable to the higher end of the range of β-carotene content measured in field-grown indica rice variety PSB Rc82 (1.96–7.31 μg g−1 dry weight)32. The difference in the β-carotene content observed in these studies may be due to the differences in growth conditions, genomic positional effects, and/or post-harvest decay of the carotenoids45, the rate of which varies among cultivars46. The difference in endogenous carotenoid metabolic components among cultivars may have also contributed to the difference in the level of β-carotene accumulating in the endosperm47. Overall, rice plant 48A-7 accumulates a high level of β-carotene in the endosperm.

Fig. 4: Trait assessment of the homozygous carotenoid-enriched rice line 48A-7. a Morphology of the 70-day-old Kitaake and 48A-7 plants. b Grain length comparison between Kitaake and the progeny of 48A-7. c Grain width comparison between Kitaake and the progeny of 48A-7. d Dry grain weight of randomly picked seeds from Kitaake and the progeny of 48A-7 (n = 100). Horizontal bars represent the mean value. e Picture of 100 randomly picked dehusked seeds from Kitaake and the progeny of 48A-7. White scale bars represent 1 cm. The source data underlying Fig. 4d are provided as a Source Data file.

Obtaining homozygous marker-free carotenoid-enriched rice

Analysis of the whole-genome sequence of 48A-7 revealed a fragment of the CRISPR plasmid at an intergenic region on Chromosome 5, the insertion of which is likely caused by the particle bombardment process48. We genotyped multiple T0 and T1 plants for this insert by PCR using primers flanking the insertion site. A homozygous 2.4 kb insert was detected in 48A-7 (Supplementary Fig. 7). One copy of this insert was also detected in T0 plants #11, #16, #17, #24, #28, #48, and #50 (Supplementary Fig. 7). This result indicates that these seven T0 plants are most likely derived from a single transformation event, which carries the 2.4 kb fragment resulting from the particle bombardment process. The 2.4 kb insert is absent from T0 plant #1, which suggests that T0 plant #1 resulted from an independent transformation event (Supplementary Fig. 7). To remove the 2.4 kb insert from the homozygous carotenoid-enriched rice, we backcrossed the carotenoid-enriched rice line 48A-7 (maternal) with Kitaake (paternal). The resulting F1 plants were self-pollinated to generate a segregating F2 population. In the F2 generation, we identified two rice plants, 1–11 and 2–8, as being homozygous for the carotenoid cassette at Target B and free of the 2.4 kb insert (Fig. 5a, b). Seeds harvested from both plants are golden in color (Fig. 5c), indicating that both plants accumulate β-carotene in the endosperm. These marker-free carotenoid-enriched rice plants carry homozygous insertion of the carotenoid cassette at the intended genomic target.
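The expected yield of the desired F2 genotype can be estimated with simple Mendelian arithmetic. Because 48A-7 is homozygous for both the cassette at Target B and the 2.4 kb insert on Chromosome 5, the F1 plants from the backcross carry one copy of each. The sketch below assumes, as a simplification not stated in the text, that the two loci segregate independently; the function names are illustrative.

```python
import math
from fractions import Fraction


def selfed_hemizygote_probs():
    """Genotype probabilities among the selfed progeny of a plant with one copy of an insert."""
    return {
        "homozygous": Fraction(1, 4),
        "hemizygous": Fraction(1, 2),
        "absent": Fraction(1, 4),
    }


cassette = selfed_hemizygote_probs()     # carotenoid cassette at Target B
chr5_insert = selfed_hemizygote_probs()  # 2.4 kb plasmid fragment on Chromosome 5

# Assumption: independent assortment of the two loci.
p_desired = cassette["homozygous"] * chr5_insert["absent"]


def plants_needed(p, confidence=0.95):
    """Smallest F2 population expected to contain at least one desired plant."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - float(p)))


print(f"Expected fraction of desired F2 plants: {p_desired}")
print(f"F2 plants to screen for a 95% chance of at least one: {plants_needed(p_desired)}")
```

Under these assumptions about one F2 plant in sixteen is homozygous for the cassette and free of the Chromosome 5 insert, which gives a sense of how many F2 individuals need to be genotyped.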

Fig. 5: Removal of the 2.4 kb plasmid fragment from 48A-7 by backcross. a Checking the homozygosity of the carotenoid cassette inserted at Target B in the F2 individuals 1–11 and 2–8 from the backcross between 48A-7 and Kitaake. PCR primers 1F and 3R anneal to genomic positions flanking Target B, as shown in Fig. 2a, amplifying a 1.9 kb genomic DNA fragment unless the carotenoid cassette at Target B is homozygous. Kitaake and 48A-7 were used as the wild type and homozygous controls, respectively. b Detecting the presence of the 2.4 kb CRISPR plasmid fragment on Chromosome 5 by PCR. Primers Chr5-insert-flanking-L and Chr5-insert-flanking-R amplify a 446 bp DNA fragment when the plasmid fragment is absent, or a 2.8 kb DNA fragment when the plasmid fragment is present. Kitaake and 48A-7 were used as the wild type and homozygous controls, respectively. c Picture of 100 randomly picked dehusked seeds from Kitaake and the two F2 plants described in (a). The source data underlying Fig. 5a, b are provided as a Source Data file.

The observed β-carotene is a consequence of the carotenoid cassette at Target B

To confirm that the observed accumulation of β-carotene in the seeds is a consequence of the carotenoid cassette inserted at Target B, we performed a genetic co-segregation analysis. We harvested seeds from 48P-3, a sibling of 48A-7 hemizygous for the insertion at Target B (Supplementary Fig. 8a). A randomly selected tiller from 48P-3 yielded 13 white seeds and 38 golden seeds, which fits the Mendelian ratio of 1:3 for single-site genetic segregation. We germinated eight randomly chosen white seeds and eight randomly chosen golden seeds and genotyped the seedlings for the presence of the carotenoid cassette at Target B by PCR. The golden seed color co-segregated with the presence of the carotenoid cassette at Target B (Supplementary Fig. 8b). This indicates that the β-carotene in the seeds from 48A-7 results from the targeted insertion of the carotenoid cassette at Target B.
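The fit of 13 white to 38 golden seeds to a 1:3 ratio can be checked with a chi-square goodness-of-fit test. The text does not report which test, if any, was used, so the sketch below is an illustrative check rather than the authors' analysis. It handles the two-category case (one degree of freedom) using only the standard library.

```python
import math


def chi_square_two_classes(observed, expected_ratio):
    """Chi-square goodness-of-fit test for two classes (1 degree of freedom).

    observed: counts for the two classes, e.g. [white, golden].
    expected_ratio: expected ratio for the same classes, e.g. [1, 3].
    Returns (statistic, p_value).
    """
    if len(observed) != 2 or len(expected_ratio) != 2:
        raise ValueError("this sketch supports exactly two classes")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


statistic, p_value = chi_square_two_classes([13, 38], [1, 3])
print(f"chi-square = {statistic:.4f}, p = {p_value:.3f}")
```

With 51 seeds the expected counts are 12.75 white and 38.25 golden, so the statistic is very small and the p-value is well above 0.05, consistent with single-locus segregation.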

Targeted insertion of the carotenoid cassette at a second genomic target

To test whether the method of targeted insertion described above can be applied to other chromosomal locations, and to assess the frequency of insertion of the donor DNA, we performed an additional co-bombardment experiment at a different target site, Target C. In this experiment, we cultivated each callus separately to prevent clonal propagation. We generated a CRISPR plasmid pCam1300-CRISPR-C and a donor plasmid pAcc-C (Supplementary Fig. 9 and Supplementary Data 3) and delivered them to rice calli as described in Fig. 1c. We regenerated 16 independent T0 events transformed with the CRISPR plasmid and found, based on PCR genotyping and Sanger sequencing of the PCR products (Supplementary Fig. 10), that one event, T0 plant #6, carries the insertion of the carotenoid cassette at Target C. The insertion occurred through non-homologous end joining, similar to the insertion at Target B observed for T0 plant #48 (Fig. 2b). The overall insertion frequency is 1/16 (6.25%), which represents the number of plants with the on-target insertion of the cassette divided by the total number of transgenic T0 plants carrying the Cas9-gRNA module.
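Because the 6.25% frequency rests on a single event among 16, it carries considerable sampling uncertainty. The sketch below computes a 95% Wilson score interval for a binomial proportion. This interval is an added illustration and is not reported in the text; the function name is hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials))
    ) / denom
    return center - half_width, center + half_width


# One on-target insertion among 16 independent T0 events.
low, high = wilson_interval(successes=1, trials=16)
print(f"Insertion frequency: {1 / 16:.2%} (95% Wilson interval {low:.1%} to {high:.1%})")
```

The interval spans roughly 1% to 28%, so the point estimate should be read as an order-of-magnitude indication of insertion frequency rather than a precise rate.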