Southern blot characterization of GR2E rice

HindIII and SphI, each of which has a unique restriction site within the T-DNA (Supplementary Fig. 1, adapted from submitted GR2E-FFP (Food and Feed or for Processing) study reports), were used to determine the copy number of the introduced DNA within the GR2E genome. Hybridization of HindIII-digested genomic DNA from homozygous plants of GR2E in four genetic backgrounds (Kaybonnet, PSBRc82, BRRI dhan29, and IR64) with the Zmpsy1 or pmi probes resulted in the detection of a single fragment of ca. 7900 bp, and hybridization with the probe specific for SSU-crtI yielded a single ca. 7200 bp fragment (Supplementary Table 2, Supplementary Fig. 2, lanes 16–19, adapted from GR2E-FFP submitted study reports). Hybridization of SphI-digested GR2E genomic DNA with the Zmpsy1 or SSU-crtI probes gave a single fragment of ca. 6900 bp, and upon hybridization with the pmi probe a single fragment of ca. 5500 bp was observed (Supplementary Fig. 2, lanes 6–9, adapted from GR2E-FFP submitted study reports; Supplementary Table 2). Weak hybridization between the Zmpsy1 probe and sequences derived from the endogenous rice psy1 gene was detected for restriction enzyme digests of control Kaybonnet and GR2E DNA samples (Supplementary Fig. 2, panel A, adapted from GR2E-FFP submitted study reports). Hybridizing fragments of ca. 5600 bp and ca. 4900 bp were detected in Southern blots of SphI- and HindIII-digested DNA samples, respectively. This finding was not unexpected given the high degree of sequence identity, ca. 83 percent, shared between the Zmpsy1 and Oryza sativa psy genes.

Southern blot analyses of AscI + XmaI-digested GR2E rice DNA were used to investigate the integrity of the T-DNA insert containing the Zmpsy1, SSU-crtI, and pmi gene cassettes. The T-DNA contains a single AscI restriction site located at position 199 and a single XmaI site at position 8946 (Supplementary Fig. 1, adapted from GR2E-FFP submitted study reports). Insertion of an intact copy of the T-DNA should thus result in the detection of an 8747 bp AscI + XmaI fragment with the Zmpsy1, SSU-crtI, and pmi probes. The results of Southern analyses (Supplementary Fig. 2, lanes 11–14, adapted from GR2E-FFP submitted study reports) demonstrate that the correct size fragment was detected with all of the hybridization probes (Supplementary Table 2).
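The expected fragment size follows directly from the two site positions. The lines below are a minimal sketch of that arithmetic, assuming the quoted positions are coordinates on the same linear T-DNA map and that the fragment length is their difference. The function name is illustrative and not taken from the study reports.

```python
def expected_fragment_bp(site_a: int, site_b: int) -> int:
    """Distance in bp between two restriction sites on the same linear map."""
    return abs(site_b - site_a)


ASCI_POS = 199    # AscI site position within the T-DNA (from the text)
XMAI_POS = 8946   # XmaI site position within the T-DNA (from the text)

fragment = expected_fragment_bp(ASCI_POS, XMAI_POS)
print(fragment)  # 8747
```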

Hybridizing fragments were not detected when backbone probes were tested against samples of AscI + XmaI-digested GR2E genomic DNA (Supplementary Fig. 3, lanes 6–8, panels A and B, adapted from GR2E-FFP submitted study reports), confirming the absence of plasmid backbone sequences. Positive control samples of wild-type Kaybonnet genomic DNA spiked with pSYN12424 plasmid DNA did result in detection of the expected-size 4349 bp fragment using backbone probe 5 (Supplementary Fig. 3, panel B, lane 3, adapted from GR2E-FFP submitted study reports) and two fragments of 1243 bp and 4349 bp, respectively, using the mixture of backbone probes 1–4 (Supplementary Fig. 3, panel A, lane 3, adapted from GR2E-FFP submitted study reports).

Thus, multiple Southern hybridization analyses clearly demonstrate the insertion of the T-DNA into a single site and the absence of sequences derived from the plasmid backbone.

Stability of the introduced trait across multiple generations

The stability of the inserted DNA across multiple generations was assessed by Southern blot analyses of genomic DNA samples prepared from a selfed generation of GR2E in the Kaybonnet background (Tn) and three back-cross generations of GR2E (BC3F5, BC4F3, and BC5F3) for each recurrent parent (BRRI dhan29, IR64, and PSBRc82). Digests with HindIII, SphI, and AscI + XmaI were separated by gel electrophoresis, and the blots were hybridized with probes specific for the Zmpsy1, SSU-crtI, or pmi genes, respectively (Fig. 1, adapted from GR2E-FFP submitted study reports). Single hybridizing fragments of ~7900 bp, ~6900 bp, or 8747 bp were detected using the Zmpsy1, SSU-crtI, or pmi probes, respectively, in corresponding blots of HindIII, SphI, or AscI + XmaI digests of genomic DNA from each generation of GR2E rice (Supplementary Table 3).

Figure 1 Samples of DNA from individual plants of event GR2E in Kaybonnet (Tn; lanes 7–8, as replicates), BRRI dhan29 (BC3F5, lanes 9–10; BC4F3, lanes 15–16; and BC5F3, lanes 21–22), IR64 (BC3F5, lanes 11–12; BC4F3, lanes 17–18; and BC5F3, lanes 23–24), and PSBRc82 (BC3F5, lanes 13–14; BC4F3, lanes 19–20; and BC5F3, lanes 25–26) germplasm backgrounds and negative control DNA from Kaybonnet rice (lanes 5–6) were subjected to Southern blot analysis. For this, 5 μg genomic DNA was digested with HindIII (panel A), SphI (panel B), or AscI plus XmaI (panel C), followed by agarose gel electrophoresis and transfer onto nylon membrane. Positive control samples consisted of negative control Kaybonnet rice containing either one (lane 3) or 0.2 (lane 4) copy equivalents of pSYN12424 plasmid DNA digested with SphI (panels A and B) or AscI + XmaI (panel C). Blots were hybridized with DIG-labelled probes for Zmpsy1 (panel A), pSSU-crtI (panel B), or pmi (panel C). Following washing, hybridized probes and DIG-labelled molecular weight markers VII (lanes 1 and 28) were visualized using chemiluminescent detection. Lanes 2 and 27 were blank on all gels (adapted from GR2E-FFP submitted study reports).

Concentrations of total carotenoids were determined in grain samples collected from GR2E plants in Kaybonnet germplasm, the BC3F5 generations in PSBRc82 and IR64 backgrounds, and the BC4F3 and BC5F3 generations in PSBRc82, IR64, and BRRI dhan29 germplasm backgrounds (Table 1). Carotenoid accumulation in the endosperm was positively correlated with the presence of the T-DNA insert as previously established by Southern blot characterization of the same generations and germplasm backgrounds of GR2E rice. Some variation in the concentrations of total carotenoids was observed depending on the germplasm background, with Kaybonnet and BRRI dhan29 GR2E attaining the highest levels.

Table 1 Concentrations of total carotenoids in different generations and germplasm backgrounds of GR2E rice.

Mendelian inheritance of the inserted DNA

The inheritance pattern of the T-DNA insert within GR2E rice was investigated using a polymerase chain reaction (PCR)-based zygosity test. Segregation of the insert within three segregating generations (BC4F2, BC5F1, and BC5F2) in each of three genetic backgrounds was determined. Chi-square analysis showed no statistically significant differences between the observed and expected segregation ratios for the three segregating generations of GR2E in PSBRc82, BRRI dhan29, and IR64 genetic backgrounds (Supplementary Table 5).
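A chi-square goodness-of-fit test of this kind can be sketched as follows. The sketch assumes the usual single-locus Mendelian expectations (1:2:1 homozygous : hemizygous : null in an F2 generation, 1:1 hemizygous : null in a backcross F1) and a significance level of 0.05. Neither the expected ratios nor the counts below are taken from the study; the counts are hypothetical and the function names are illustrative.

```python
# Critical values of the chi-square distribution at alpha = 0.05,
# keyed by degrees of freedom (standard table values).
CRITICAL_0_05 = {1: 3.841, 2: 5.991}


def chi_square_stat(observed, ratio):
    """Pearson chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_ratio(observed, ratio):
    """True if the counts do not differ significantly from the ratio (alpha = 0.05)."""
    df = len(observed) - 1
    return chi_square_stat(observed, ratio) < CRITICAL_0_05[df]


# Hypothetical counts, NOT data from the study
f2_counts = [24, 52, 24]    # homozygous : hemizygous : null
bc_f1_counts = [48, 52]     # hemizygous : null

print(fits_ratio(f2_counts, (1, 2, 1)))
print(fits_ratio(bc_f1_counts, (1, 1)))
```

A statistic below the critical value means the observed segregation is consistent with inheritance of the insert as a single Mendelian locus.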

Nucleotide sequence analysis of the inserted DNA and flanking regions

The nucleotide sequence of the plasmid T-DNA, together with preliminary sequence information from the 5′ and 3′ flanking genomic DNA, was used to design seven sets of oligonucleotide primers to amplify the insert and flanking regions from GR2E genomic DNA as seven individual overlapping fragments (Supplementary Fig. 4, adapted from GR2E-FFP submitted study reports).

In total, 12,772 bp of GR2E genomic sequence was determined, comprising 1,988 bp of the 5′ genomic border sequence, 1,788 bp of the 3′ genomic border sequence, and 8,996 bp of the inserted T-DNA (Supplementary Fig. 4, adapted from GR2E-FFP submitted study reports). The T-DNA in GR2E rice was found to have a 23 bp deletion at the right border end and an 11 bp deletion at the left border end, which is common for Agrobacterium-mediated transformation events20. All remaining sequence was intact and identical to that of the T-DNA region of plasmid pSYN12424.

Basic Local Alignment Search Tool (BLAST) searches using the 5′ and 3′ flanking region sequences as queries against the O. sativa (japonica cultivar-group, Nipponbare) genome (MSU Rice Genome Annotation Project Release 7) localized the T-DNA on chromosome 3 within the intergenic region between LOC_Os03g43980 (3′ proximal) and LOC_Os03g43990 (5′ proximal; Fig. 2, adapted from GR2E-FFP submitted study reports).

Figure 2 Map position is indicated according to the MSU Rice Genome Annotation Project Release 7 (Nipponbare). The locations of the LB and RB flanking sequences correspond to positions 24,698,762–24,700,549 and 24,700,565–24,702,552, respectively. The insertion of the pSYN12424 T-DNA was within an intergenic region between loci LOC_Os03g43980 and LOC_Os03g43990, and resulted in the deletion of 15 bp of host genomic DNA in addition to truncations of the LB and RB regions of 11 bp and 23 bp, respectively (adapted from GR2E-FFP submitted study reports).
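The coordinates in the Figure 2 caption can be cross-checked against the sequence lengths reported in the text. The sketch below assumes the positions are 1-based and inclusive, which is an assumption about the coordinate convention rather than something stated in the source. The helper name is illustrative.

```python
def span_len(start: int, end: int) -> int:
    """Length of a closed, 1-based interval [start, end]."""
    return end - start + 1


LB_FLANK = (24_698_762, 24_700_549)   # LB (3') flank, chromosome 3 positions
RB_FLANK = (24_700_565, 24_702_552)   # RB (5') flank, chromosome 3 positions
INSERT_BP = 8_996                     # inserted T-DNA length (from the text)

lb_len = span_len(*LB_FLANK)                   # 3' genomic border sequence
rb_len = span_len(*RB_FLANK)                   # 5' genomic border sequence
deleted_bp = RB_FLANK[0] - LB_FLANK[1] - 1     # host bases missing between the flanks
total_bp = lb_len + rb_len + INSERT_BP

print(lb_len, rb_len, deleted_bp, total_bp)  # 1788 1988 15 12772
```

Under that assumption the flank lengths, the 15 bp host deletion, and the 12,772 bp total all agree with the values given in the text.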

To investigate the possibility of creating new ORFs as a consequence of the T-DNA insertion in GR2E, an open reading frame analysis was conducted to look for potential start-to-stop ORFs that spanned either the 5′ or 3′ junctional regions. This analysis examined each of three possible reading frames in both orientations (i.e., six possible reading frames in total) for potential ORFs capable of encoding sequences of 30 or more amino acids. An allergen usually contains at least two epitopes, each of which must be a minimum of approximately 15 amino acid residues long for antibody binding to occur. This implies a lower size limit for protein allergens of approximately 30 amino acid residues21, although there is currently no consensus among scientists on such a size limit. Two ORFs were identified, one in the reverse orientation that spanned the 5′ T-DNA insert–genomic DNA border (Supplementary Fig. 4, adapted from GR2E-FFP submitted study reports; ORF-1, 207 bp, 68 amino acids), and one in the forward orientation that spanned the 3′ T-DNA insert–genomic DNA border (Supplementary Fig. 4, adapted from GR2E-FFP submitted study reports; ORF-2, 240 bp, 79 amino acids).
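A six-frame, start-to-stop ORF scan of the kind described can be sketched as below. This is not the tool used in the study. It assumes ORFs begin at ATG and end at the first in-frame stop codon, counts the ORF length in bp including the stop codon (consistent with 207 bp encoding 68 amino acids), and uses a toy sequence rather than GR2E sequence. All function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _scan_strand(seq: str, min_aa: int):
    """Yield (start, end, aa_len) for ATG-to-stop ORFs in the 3 frames of seq.

    Coordinates are 0-based, half-open, and relative to the strand scanned;
    `end` includes the stop codon, `aa_len` does not.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG after the last stop
            elif start is not None and codon in STOP_CODONS:
                aa_len = (i - start) // 3
                if aa_len >= min_aa:
                    yield start, i + 3, aa_len
                start = None


def six_frame_orfs(seq: str, min_aa: int = 30):
    """List (strand, start, end, aa_len) with coordinates on the forward strand."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", s, e, aa) for s, e, aa in _scan_strand(seq, min_aa)]
    hits += [("-", n - e, n - s, aa)
             for s, e, aa in _scan_strand(reverse_complement(seq), min_aa)]
    return hits


def spans_junction(start: int, end: int, junction: int) -> bool:
    """True if the insert/genome junction (0-based index) lies inside the ORF."""
    return start < junction < end


# Toy sequence (NOT GR2E sequence): ATG + 67 Ala codons + stop = 207 bp, 68 aa
toy = "ATG" + "GCT" * 67 + "TAA"
print(six_frame_orfs(toy))  # [('+', 0, 207, 68)]
```

Filtering the hits with `spans_junction` at the 5′ and 3′ junction coordinates would retain only ORFs that cross an insert–genome border, which is the subset of interest here.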

To search for potential similarity to known toxins, the amino acid sequence of each ORF was queried against a toxin database using the FAST All sequence alignment tool 36 (FASTA36) to identify possible significant sequence similarity with known or potential toxins. An E-score criterion of 1 × 10⁻⁵ was used to identify sequences from the toxin database with potential for significant sequence similarity to the query sequences of each ORF. Typically, alignments between two sequences require an E-score of 1 × 10⁻⁵ or less to be considered to have sufficient sequence similarity to infer homology. The FASTA36 search resulted in no significant hits returned (Supplementary Table 6).

To assess the potential for allergenicity, the amino acid sequence of each ORF was compared to a peer-reviewed database of 2129 known and putative allergens and celiac protein sequences residing in the Food Allergy Research and Resource Program (FARRP) dataset version 19 at the University of Nebraska. A criterion of >35% identity over any segment of 80 or more amino acids as an indication of possible cross-reactivity for allergens was adopted by the Codex12 as the primary sequence search criterion for flagging proteins in genetically modified plants that might raise concern about cross-reactivity. No identity matches of greater than 35 percent over 80 residues were observed for either ORF-1 or ORF-2. Each query sequence was also evaluated for any eight contiguous identical amino acid matches to the allergens contained in the FARRP database. There were no eight contiguous identical amino acid matches observed for either ORF-1 or ORF-2 (Supplementary Table 6).
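The eight-contiguous-residue check is an exact substring comparison and can be sketched in a few lines. The 35 percent identity over 80 residues criterion depends on an alignment program and is not reproduced here. The sequences and names below are toy values, not entries from the FARRP database, and the function names are illustrative.

```python
def kmers(seq: str, k: int = 8) -> set:
    """All contiguous k-residue windows of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def contiguous_matches(query: str, allergens: dict, k: int = 8) -> dict:
    """Map allergen name -> sorted list of k-mers shared exactly with the query."""
    query_kmers = kmers(query.upper(), k)
    hits = {}
    for name, seq in allergens.items():
        shared = query_kmers & kmers(seq.upper(), k)
        if shared:
            hits[name] = sorted(shared)
    return hits


# Toy sequences (hypothetical, for illustration only)
query = "MKTAYIAKQRQISFVK"
toy_db = {
    "toy_allergen_a": "GGGAYIAKQRQGGG",   # shares one 8-mer with the query
    "toy_allergen_b": "PPPPPPPPPP",       # shares none
}

print(contiguous_matches(query, toy_db))  # {'toy_allergen_a': ['AYIAKQRQ']}
```

An empty result corresponds to the outcome reported for ORF-1 and ORF-2.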

Novel protein expression

The tissue specificity of ZmPSY1, CRTI, and PMI expression was confirmed by immunoblot analysis of various tissues sampled from GR2E rice. The patterns of expression of proteins in GR2E tissues were consistent with the activity of the endosperm-specific rice GluA-2 promoter of the Zmpsy1 and crtI genes, and use of the constitutive maize polyubiquitin promoter for the pmi gene. Expression of ZmPSY1 and CRTI was detected only in milk, dough, and mature stage grain (Supplementary Fig. 5, lanes 3–5, panels B and D, adapted from GR2E-FFP submitted study reports) and not in samples of bran, hulls, leaf, stem, or root tissue. In comparison, PMI expression was detected in all rice tissues tested (Supplementary Fig. 5, lanes 3–10, panel F, adapted from GR2E-FFP submitted study reports).

In order to estimate potential human and animal dietary exposure to the ZmPSY1, CRTI, and PMI enzymes expressed in GR2E, the protein concentrations in plant tissues were determined by quantitative enzyme-linked immunosorbent assay (ELISA). Three replicated samples of grains (milky, dough, mature) and straw were collected from GR2E rice grown at four locations in the Philippines during two growing seasons in 2015–16. Expression of the ZmPSY1 and CRTI proteins in GR2E is driven by the endosperm-specific rice GluA-2 promoter and measurable concentrations of both these proteins were found in all grain developmental stages but not in stem tissue (straw; Supplementary Table 7). For each protein, the highest concentrations were measured in samples of dough-stage grain, ranging between ca. 308–359 ng/g fresh weight tissue (FWT) and between ca. 54–68 ng/g FWT for ZmPSY1 and CRTI, respectively, across both growing seasons. Across the four locations and two growing seasons, the highest concentrations of ZmPSY1 and CRTI measured in samples of mature grain were ca. 245 ng/g FWT and 30 ng/g FWT, respectively.

Concentrations of PMI protein were significantly higher than either ZmPSY1 or CRTI in samples from all grain growth stages (Fig. 3, adapted from GR2E-FFP submitted study reports), and were highest in dough-stage grain, averaging ca. 2015 ng/g FWT across the four locations over both growing seasons. The mean PMI concentration in mature GR2E rice grain samples was ca. 1282 ng/g FWT across both growing seasons (Supplementary Table 7). Since expression of the PMI protein was under control of the constitutive maize polyubiquitin promoter, it was also present in straw samples at concentrations ranging between 320–796 ng/g FWT depending on location and growing season. The average concentration of PMI protein in GR2E straw across both growing seasons was ca. 482 ng/g FWT.

Figure 3 Samples of GR2E grain were collected at different developmental stages [BBCH 75 (milk stage), BBCH 85 (dough stage), and BBCH 90 (mature stage)] from four locations over two growing seasons in 2015–16, and the concentrations of ZmPSY1, CRTI, and PMI were determined by quantitative ELISA and are given in ng/g fresh weight tissue (FWT). Values represent the mean concentration across locations and years for each protein, and the error bars represent the range of concentrations measured across locations over both growing seasons. In some cases, the size of the error bars was less than the symbol size used for plotting (adapted from GR2E-FFP submitted study reports).

Estimated human daily dietary exposure to ZmPSY1, CRTI, and PMI proteins

Two approaches were followed to obtain estimates of daily rice consumption. First, historic rice utilization data for the highest rice-consuming countries in Asia, in comparison with the United States, were obtained from the USDA Production Supply and Distribution database and converted to per capita utilization estimates using the FAOSTAT population database. These values for 2011–2015 are presented in Supplementary Table 8. Projected utilization values for the same countries were obtained from the International Rice Outlook: International Rice Base Projections22, also presented in Supplementary Table 8 and Supplementary Fig. 6 (adapted from GR2E-FFP submitted study reports).

Using the highest projected per capita rice utilization in Cambodia of 253 kg/yr and an estimated average adult body weight of 57.7 kg in Asia23, the maximum daily rice intake was calculated as shown in this equation:

$${\rm{Daily}}\,{\rm{Rice}}\,{\rm{Intake}}=\frac{253\,({\rm{kg}}/{\rm{yr}})}{365\times 57.7\,({\rm{kg}}\,{\rm{BW}})}\times 1000\,({\rm{g}}/{\rm{kg}})=12.0\,({\rm{g}}/{\rm{kg}}\,{\rm{body}}\,{\rm{weight}})$$
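The same calculation can be written as a short function, which makes it easy to substitute other utilization or body-weight values. This is a sketch of the arithmetic in the equation above. The function and parameter names are illustrative.

```python
def daily_intake_g_per_kg_bw(kg_per_year: float, body_weight_kg: float,
                             days_per_year: int = 365) -> float:
    """Daily rice intake in g per kg body weight from annual per capita utilization."""
    return kg_per_year * 1000 / (days_per_year * body_weight_kg)


# Values from the text: 253 kg/yr (Cambodia, projected), 57.7 kg average adult body weight
intake = daily_intake_g_per_kg_bw(253, 57.7)
print(round(intake, 1))  # 12.0
```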

The second approach utilized data from the Food and Agriculture Organization of the United Nations (FAO)/World Health Organization (WHO) Chronic Individual Food Consumption Database summary statistics (CIFOCOss) currently containing summary statistics of 37 surveys from 26 countries. The CIFOCOss was initially developed to be used by FAO/WHO scientific committees for dietary exposure assessment. Available data for Asian countries are shown in Supplementary Table 9. A further comparison of consumption data between Asian countries and selected European, African, and South American countries is shown in Supplementary Fig. 7 (adapted from GR2E-FFP submitted study reports).

Based upon consideration of both approaches, a value of 12.5 g/kg body weight was chosen as the upper limit of mean daily dietary intake of rice. This value was judged sufficient to account for consumption by all population subgroups, including children. In deriving estimates of maximum potential daily dietary exposure to the ZmPSY1, CRTI, and PMI proteins expressed in GR2E rice, the following assumptions were used: (i) the mean daily dietary rice consumption is 12.5 g/kg body weight, (ii) 100 percent of the dietary rice intake is from GR2E rice, and (iii) the grain concentrations of ZmPSY1, CRTI, and PMI used for estimation are the highest values measured, i.e., those in samples of dough-stage grain collected from any individual trial site location in either 2015 or 2016. These concentrations were significantly higher than those measured in mature grain at harvest. Using these assumptions, the estimated maximum potential daily dietary exposure from GR2E rice to each novel protein is shown in Table 2: ca. 4.5, 0.85, and 30 μg/kg body weight for the ZmPSY1, CRTI, and PMI proteins, respectively.

Table 2 Estimated maximum potential daily dietary exposure to ZmPSY1, CRTI, and PMI.
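The exposure estimates follow from multiplying the assumed rice intake by the protein concentration in grain. The sketch below uses the upper ends of the dough-stage ranges quoted earlier for ZmPSY1 (359 ng/g) and CRTI (68 ng/g). The maximum PMI concentration is not quoted in the text, so the sketch only back-calculates the concentration implied by the 30 μg/kg estimate; that figure is an inference, not a reported measurement. Function and variable names are illustrative.

```python
RICE_INTAKE_G_PER_KG_BW = 12.5   # assumed upper-limit daily rice intake (from the text)


def exposure_ug_per_kg_bw(conc_ng_per_g: float,
                          intake_g_per_kg_bw: float = RICE_INTAKE_G_PER_KG_BW,
                          fraction_gr2e: float = 1.0) -> float:
    """Daily exposure in ug protein per kg body weight (ng converted to ug)."""
    return conc_ng_per_g * intake_g_per_kg_bw * fraction_gr2e / 1000.0


# Highest dough-stage concentrations quoted in the text, ng/g fresh weight
MAX_DOUGH_STAGE_NG_PER_G = {"ZmPSY1": 359, "CRTI": 68}

exposures = {name: exposure_ug_per_kg_bw(conc)
             for name, conc in MAX_DOUGH_STAGE_NG_PER_G.items()}
for name, value in exposures.items():
    print(name, value)

# Inferred, not reported: PMI concentration implied by the 30 ug/kg estimate
implied_pmi_ng_per_g = 30 * 1000 / RICE_INTAKE_G_PER_KG_BW
print(implied_pmi_ng_per_g)
```

The two computed values are consistent with the ca. 4.5 and 0.85 μg/kg body weight estimates for ZmPSY1 and CRTI.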

Bioinformatic analysis of the ZmPSY1 and CRTI protein amino acid sequences

PSY plays a pivotal role in the carotenoid biosynthesis pathway as it catalyzes the first committed step and controls flux through the pathway24,25. Phytoene undergoes consecutive modifications such as desaturation reactions by carotene desaturases and cis-trans isomerization reactions to form all-trans-lycopene, which is cyclized to α- and β-carotene.

Potential identities between the ZmPSY1 query sequence and proteins in the allergen database were evaluated with the FASTA35 sequence alignment tool using the default parameters. A criterion of >35% identity over any segment of 80 or more amino acids as an indication of possible cross-reactivity to allergens was adopted by the Codex12 as the primary sequence search criterion for flagging proteins in genetically modified plants that might raise concern about cross-reactivity. No identity matches of >35 percent over 80 residues were observed. Also, no instances of eight contiguous identical amino acid matches were observed when the amino acid sequence of ZmPSY1 was compared with the sequences of known allergenic proteins. To search for similarity to known or potential toxins, the amino acid sequence of ZmPSY1 was queried against a toxin database using the FASTA36 algorithm. The ZmPSY1 query sequence did not return any entries with an E-score less than 1 × 10⁻⁵. Therefore, there are no sequence homology alerts for potential toxicity of the ZmPSY1 protein.

Potential identities between the CRTI query sequence and proteins in the allergen database were evaluated and no identity matches of >35 percent over 80 residues were observed, nor were there any instances of eight contiguous identical amino acid matches between the CRTI amino acid sequence and sequences of known allergenic proteins. However, a search using the CRTI query sequence returned two protein accessions from the toxin database with an E-score less than 1 × 10⁻⁵. The two sequence alignments (Supplementary Fig. 8, adapted from GR2E-FFP submitted study reports) were to the conserved N-terminal flavin adenine dinucleotide (FAD)-binding regions of L-amino acid oxidase (LAAO) enzymes from two species of venomous snakes: Bungarus multicinctus (many-banded krait, also known as the Taiwanese krait or the Chinese krait) and B. fasciatus (banded krait). Homology of these proteins will be discussed later.

Rapid digestion of ZmPSY1 and CRTI in simulated gastric fluid (SGF)

Resistance to gastric and intestinal digestion is known to be correlated with the allergenic potential of proteins26. The in vitro pepsin resistance of native (i.e., enzymatically active) ZmPSY1 protein was investigated. Samples were removed at the stated time points and subjected to SDS-PAGE analysis. Following exposure to SGF containing pepsin for 30 seconds, the earliest time point sampled during the digestion, no intact ZmPSY1 protein (ca. 42 kDa) was evident as assessed by either SDS-PAGE or western immunoblot analysis (Supplementary Fig. 9, lane 4 in panels A and B, respectively, adapted from GR2E-FFP submitted study reports). Faint, low molecular mass degradation products were visible by Coomassie staining in samples removed during the first two minutes of digestion (Supplementary Fig. 9, lanes 4–6, panel A, adapted from GR2E-FFP submitted study reports), but not at later time points, and these were not detected in the western blot.

Similar results were obtained with CRTI. Following exposure to SGF containing pepsin for 30 seconds, the earliest time point sampled during the digestion, no intact CRTI protein was evident as assessed by either SDS-PAGE or western immunoblot analysis (Fig. 4, lane 4 in panels A and B, respectively, adapted from GR2E-FFP submitted study reports), and there was no evidence of stable lower molecular mass proteolytic fragments derived from CRTI.

Figure 4 Panels A and B: Samples of CRTI protein purified from recombinant E. coli (Lot No. M20454-02) were incubated in the presence of SGF pH 1.2 containing pepsin for 0 min (lane 2) and 0.5, 1, 2, 5, 10, 20, 30 or 60 min at 37 °C (lanes 4–11) and then analyzed by SDS-PAGE. Gels were either stained for protein with colloidal blue G250 (panel A) or subjected to western immunoblot analysis (panel B) using rabbit anti-CRTI immunoglobulin (1:1000) and horseradish peroxidase-conjugated goat anti-rabbit IgG followed by precipitating substrate development. Control samples included CRTI protein diluted in gastric control fluid without pepsin (lane 1) and SGF solution containing pepsin (lane 12). Molecular weight standards are shown in lane 3 (adapted from GR2E-FFP submitted study reports).

Heat stability of ZmPSY1 and CRTI protein

The thermal stability of the ZmPSY1 protein was evaluated by measuring enzymatic activity, i.e., the conversion of geranylgeranyl diphosphate (GGPP) into 15-cis-phytoene, as monitored by HPLC analysis. The GGPP substrate was produced with GGPP synthase from its precursor molecules dimethylallyl diphosphate (DMAPP) and isopentenyl diphosphate (IPP)24,27. Thermal instability, i.e., rapid denaturation, greatly increases the chance of proteolytic cleavage, adding to the safety of the expressed proteins. Proteins that are labile at temperatures used in cooking and processing are likely to have negligible dietary exposure. The expressed purified ZmPSY1 catalyzed the production of 15-cis-phytoene from DMAPP and IPP, in the presence of active A. thaliana GGPP synthase, at the rate of ca. 28.4 pmol μg⁻¹ min⁻¹ under the assay conditions used (Fig. 5, adapted from GR2E-FFP submitted study reports). Enzyme activity was irreversibly destroyed upon heat treatment, with 50 percent loss of activity following pre-incubation at ca. 42 °C for 15 minutes and complete loss of activity at 50 °C for 15 minutes.

Figure 5 Individual samples of ZmPSY1 protein purified from recombinant E. coli (Lot No. M20452-05) were heated for 15 minutes at a designated temperature ranging from 30–65 °C. Following this treatment, enzymatic production of 15-cis-phytoene was measured by HPLC. Panel A shows enzymatic activity (pmol μg⁻¹ min⁻¹) versus pre-incubation temperature, where the values are means ± standard deviation of two technical replicates. Panel B shows HPLC chromatograms (287 nm) of chloroform:methanol extracts of selected ZmPSY1 activity assays. The area under the phytoene peak (retention time = 15.7 min) was used to calculate phytoene concentration. Pre-incubation temperatures are indicated to the right of each chromatogram trace (adapted from GR2E-FFP submitted study reports).

The thermal stability of the CRTI protein was evaluated by measuring enzymatic activity using a spectrophotometric assay to monitor the conversion of liposome-incorporated 15-cis-phytoene to all-trans-lycopene according to published assay conditions28. The purified CRTI was enzymatically active, catalyzing the conversion of liposome-incorporated phytoene to all-trans-lycopene at the rate of ca. 5.4 pmol all-trans-lycopene μg⁻¹ min⁻¹ under the assay conditions used (Supplementary Fig. 10, adapted from GR2E-FFP submitted study reports). Enzyme activity was irreversibly destroyed upon heat treatment, with 50 percent loss of activity following pre-incubation at ca. 51 °C for 15 minutes and complete loss of activity following pre-incubation at 55 °C for 15 minutes.

Lack of acute toxicity of CRTI protein

The potential for acute toxicity resulting from a single oral exposure to CRTI was investigated in mice. Groups of five male and five female mice were dosed orally by gavage with formulation buffer, bovine serum albumin, or purified microbial-expressed CRTI (100 mg/kg body weight actual dose) at a volume of 13.45 ml/kg body weight, administered in two separate doses approximately 4 hours apart on test day 1. All animals survived until the scheduled end of the study period on day 15, and no clinical signs of toxicity (abnormal behavior, general appearance, and mortality/moribundity) were observed during the test period, nor were any gross lesions found in the mice at necropsy. There were no treatment-related effects on body weights for male or female mice over the study duration, and all mice experienced net weight gain by test day 15 compared with test day 1 (pre-fast).