Papaya is an important fruit crop that is widely cultivated in tropical and subtropical regions, including South China and many other countries. PRSV is the most widespread and damaging virus that infects papaya. Virus-resistant transgenic papaya has been cultivated and commercialised on a large scale in Hawaii and South China for many years17,20,26,27. However, recent studies have found that PRSV infects ‘Huanong No.1’ transgenic papaya plants in some regions of South China24, and the incidence has increased gradually over time. To explain the loss of transgenic papaya resistance against PRSV in South China, we investigated virus occurrence in ‘Huanong No.1’ papaya plantations in Guangdong and Hainan during 2012–2016 and found that transgenic papaya plants were infected by two viruses, PRSV and Papaya leaf-distortion mosaic virus (data not shown). This result implied that the genetic structure of the PRSV population in South China may have changed. Zhao et al.23 analysed 76 PRSV isolates collected in Hainan in 2010 from diseased plants of all papaya cultivars except ‘Huanong No.1’ and inoculated some isolates onto genetically modified papaya seedlings from Hawaii, Taiwan and Guangdong. All of these isolates caused typical symptoms on the transgenic plants at 15 days post-inoculation. In the present study, we inoculated 20 representative isolates from Guangdong and Hainan onto transgenic ‘Huanong No.1’ seedlings and observed obvious symptoms at 15 days post-inoculation. These results indicate that the PRSV population in South China has changed over time as the cultivation of transgenic papaya has expanded.

According to the symptoms of PRSV on C. pepo, Cai et al.18 classified the virus in four provinces of South China into four strains: Ys with yellow spots on leaves, Vb with bands along veins, Sm with severe mosaic and Lc with leaf curl. In the present study, 20 representative isolates from Guangdong and Hainan were inoculated onto C. pepo, and the leaves exhibited irregular chlorosis along the leaf veins (Fig. 2). This symptom was clearly different from those caused by the four strains mentioned above. Furthermore, CP sequence alignment between these strains (Ys, Vb and Sm) and the 133 isolates from Guangdong and Hainan showed nucleotide identities of 86.6%–89.2%, 89.2%–92.4% and 89.2%–92.1%, respectively. This result implied that these 133 isolates were markedly different from Ys in South China but relatively similar to Vb and Sm. The amino acid sequences of 10 representative Guangdong and Hainan isolates were also compared with Ys, Vb and Sm (Fig. 3), which revealed changes at several sites. In the 10 Guangdong and Hainan isolates, three to four additional E or K residues were present in the EK repeat region compared with Ys, Vb and Sm. Bateson et al.28 and Jain et al.29 analysed the CP-coding region of PRSV isolates from Vietnam and India, respectively, and found that the number of amino acids also varied in the ‘EK’ region. The EK region is an important component located on the outer surface of the CP30 protein and is closely related to the aphid transmission element31,32,33. Mulot et al.34 revealed that the membrane-bound Ephrin receptor (Eph) in Myzus persicae is a novel aphid protein involved in the transmission of Turnip yellows virus (TuYV) and further confirmed that the minor capsid protein of TuYV, which is essential for aphid transmission, was able to bind the external domain of Eph in yeast.
Therefore, the addition or deletion of amino acids in the ‘EK’ region may affect aphid transmission of PRSV, so that the original dominant strain can no longer be transmitted effectively and continuously. Once the virus evolves a new ‘EK’ region suited to aphid transmission, the new isolates gradually spread and replace the original strains. In addition, the conserved DAG, WCIEN and QMKAAA domains in the CP region have also been postulated to be associated with virus transmission by aphids30,31,33. Outside the ‘EK’ domain, we found that two amino acid residues (Thr 34 to Ala 38, Thr 82 to Ser 86) of the 10 representative Guangdong and Hainan isolates differed from the Ys, Vb and Sm strains, 12 amino acid residues (11 at the C-terminal and 1 at the N-terminal) differed from the Ys strain and two sites (both at the N-terminal) differed among the Guangdong and Hainan isolates. The core region of CP is associated with viral assembly35,36, plasmodesmatal gating37 and cell-to-cell movement38. The N- and C-terminals of CP are related to long-distance and systemic movement of the virus. A point mutation (Ser 47 to Pro, Asp 5 to Lys) in the N-terminal of CP alters the ability of Pea seed-borne mosaic virus or Tobacco vein mottling virus to infect Chenopodium quinoa39 or tobacco40, respectively. A single substitution (Ser 7 to Gly) at the CP N-terminus reduces virus accumulation by 10-fold but restores aphid transmissibility of Potato virus A41. In the present study, most of the variable sites were located at the N- or C-terminals of CP, suggesting that these changes may lead to differences in the accumulation or systemic movement capacity of PRSV isolates in papaya.
In addition, Zamora et al.42 analysed the RNA binding mode in Potyvirus and revealed that mutations in the conserved Arg and Asp residues of the CP impaired in vitro assembly of the potyvirus and blocked the assembly and cell-to-cell movement of the potexvirus in plants. In the present study, we speculate that the CP of PRSV may have an RNA-binding function. Thus, the various patterns of conserved elements and the recombination events may affect this function. Further studies are needed to verify this.

Phylogenetic analysis of PRSV CP nucleotide sequences from South China and other countries was performed. The PRSV isolates were distinctly divided into three groups (Fig. 4). Almost all Guangdong and Hainan isolates clustered into Group III, although they fell into two subgroups according to geographical location. In contrast, Ys, Vb and Sm from South China clustered into Group II, along with Asian isolates from Thailand, South Korea and China Taiwan. The genetic distances between Groups III and II and between Groups III and I were 0.115 ± 0.008 and 0.115 ± 0.009, respectively, which were higher than that between Groups II and I (0.079 ± 0.006), indicating that Group III represents a new lineage. We hypothesise that the long-term cultivation of genetically modified papaya in Guangdong and Hainan may have led to the emergence of the new PRSV isolate lineage.
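To make the between-group comparison concrete, the following is a minimal Python sketch of a mean between-group genetic distance. It assumes an uncorrected p-distance (the proportion of differing sites) on pre-aligned sequences; the substitution model actually used for the values reported above is not stated in this section, so the numbers will not match exactly. The function names and the toy sequences are hypothetical.

```python
from itertools import product


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Alignment columns with a gap ('-') in either sequence are skipped.
    Assumption: uncorrected p-distance, not a model-corrected distance.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_between_group_distance(group_a, group_b) -> float:
    """Average p-distance over all pairs with one sequence from each group."""
    dists = [p_distance(a, b) for a, b in product(group_a, group_b)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    toy_group_iii = ["ACGTACGTAC", "ACGTACGTAA"]
    toy_group_ii = ["ACGTTCGTAC"]
    print(mean_between_group_distance(toy_group_iii, toy_group_ii))
```

A larger mean distance between two groups than between either and a third group is the pattern described above for Group III.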

Recombination is an important driver of virus evolution: it can increase genomic biodiversity43, reduce mutations in specific genome sequences44 and restore genome integrity45. In addition, recombination can enhance the virulence of a virus and extend its host range46. This process has been found in many Potyvirus species47,48,49,50. The recombination hotspot of PRSV is the P1 gene, followed by P3 and CI, HC-Pro and CP51. According to the two breakpoints of nucleotides (746–853 and 591–861), Mangrauthia et al.51 discovered two recombination types in the CP region. In the present study, 36 (27.1%) of the 133 PRSV Guangdong and Hainan isolates showed clear recombination in the CP region. Two recombination types were also detected, but the recombination sites were located at positions 829–411 and 854–423, respectively, which differed from those of the types above. The breakpoints reported by Mangrauthia et al.51 are mainly located at the C-terminal of CP, whereas those in the present study are located at the N- and C-terminals of CP, respectively. We also compared the amino acid sequences of CP from the Guangdong and Hainan isolates and found that the variable sites were mostly located in the N- and C-terminals of CP. Thus, we hypothesise that the virus can reduce harmful mutations and maintain genome stability through variation or recombination of the CP region, which probably led to the rapid spread of the new group populations.

In the present study, the CP genes of the PRSV Guangdong and Hainan isolates exhibited a high degree of genetic diversity. Three indices were used to test the population differentiation between the Guangdong and Hainan isolates based on the PRSV CP nucleotide sequences. KST and FST measure the proportion of total genetic diversity attributable to differences among populations and range from 0.00 to 1.00. A value of 1.00 for KST or FST indicates that populations are completely differentiated, while a value of 0.00 indicates that the populations are identical52. KST or FST values between 0.15 and 0.25 indicate high population differentiation, and values greater than 0.25 indicate very high genetic differentiation among populations53. The Snn value measures how often the most similar (nearest-neighbour) sequences are found in the same population; Snn values close to 1.0 indicate that the populations are highly differentiated, while values near 0.5 indicate that the populations are not differentiated54. In the current study, the KST and FST values were close to or higher than 0.15 and the Snn values were close to 1, suggesting that the Guangdong and Hainan isolates in this study composed a highly differentiated new population of PRSV in South China.
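The two simpler indices can be illustrated with a short Python sketch. It assumes one common sequence-based estimator of FST, namely 1 − Hw/Hb, where Hw and Hb are the mean numbers of pairwise differences within and between populations, and a nearest-neighbour statistic in which ties are shared equally. Whether these match the exact estimators used for the values reported here is an assumption, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def n_diff(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop) -> float:
    """Mean pairwise differences among sequences of one population."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        raise ValueError("each population needs at least two sequences")
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def fst(pop1, pop2) -> float:
    """FST estimated as 1 - Hw/Hb (assumed estimator, see lead-in)."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    between = [n_diff(a, b) for a, b in product(pop1, pop2)]
    hb = sum(between) / len(between)
    if hb == 0:
        raise ValueError("no differences between populations")
    return 1 - hw / hb


def snn(pop1, pop2) -> float:
    """Average fraction of each sequence's nearest neighbours that come
    from its own population; ties at the minimum distance all count."""
    labelled = [(s, 0) for s in pop1] + [(s, 1) for s in pop2]
    total = 0.0
    for i, (seq_i, lab_i) in enumerate(labelled):
        others = [(n_diff(seq_i, s), lab)
                  for j, (s, lab) in enumerate(labelled) if j != i]
        d_min = min(d for d, _ in others)
        nearest = [lab for d, lab in others if d == d_min]
        total += sum(lab == lab_i for lab in nearest) / len(nearest)
    return total / len(labelled)


if __name__ == "__main__":
    # Toy populations, invented for illustration only.
    toy_guangdong = ["AAAA", "AAAT"]
    toy_hainan = ["GGGG", "GGGC"]
    print(fst(toy_guangdong, toy_hainan), snn(toy_guangdong, toy_hainan))
```

In practice the significance of such statistics is usually assessed by permuting the population labels, which this sketch omits.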

Positive selection on a virus population may increase the fitness of the virus and help it adapt to new hosts or environments, although rapid divergence driven by positive selection has seldom been demonstrated55. As with other viral genes, the majority of the codons in the CP genes of the Guangdong and Hainan isolates in the present study were found to be under negative (purifying) selection. This result suggests that most of the codon mutations in the PRSV genome are detrimental and thus are easily eliminated by natural selection. In this case, the selection may come from environmental differences between the two regions, such as differences in papaya cultivars and climatic conditions. After a long-term tracking survey of transgenic cultivars in Guangdong and Hainan, we found that ‘Huanong No.1’ was the dominant cultivar grown in Guangdong, while a wider variety of cultivars from other countries and regions, including ‘Huanong No.1’, were grown in Hainan. Guangdong Province is located at 20° 13′N–25° 31′N, 109° 39′E–117° 19′E, while Hainan Island is located at 19° 20′N–20° 10′N, 108° 21′E–111° 03′E. The former has a subtropical and tropical monsoon climate with an annual average temperature of 19–23 °C, whereas the latter has a tropical monsoon climate with an annual average temperature of 23–25 °C. The two regions are separated by the Qiongzhou Strait, resulting in a certain degree of geographical isolation56. These differences may lead to the gradual differentiation of PRSV between the Guangdong and Hainan isolates and eventually cause those isolates to evolve into two subgroups.

Tajima57 developed a statistical method for testing the neutral mutation hypothesis by using the average number of nucleotide differences and the number of segregating sites. If a population has experienced a bottleneck or balancing selection, Tajima’s D is significantly higher than 0; if a population has experienced a size expansion or directional selection, Tajima’s D is significantly less than 0. Since both balancing and directional selection fall into the category of positive selection, natural selection may be inferred as long as Tajima’s D deviates significantly from 0, whereas the null hypothesis of neutral evolution cannot be rejected when Tajima’s D does not deviate significantly from 0. Fu and Li58 proposed the Fu & Li’s D and Fu & Li’s F tests of neutrality as alternatives to Tajima’s D test. Their method distinguishes mutations on the external and internal branches of a rooted tree constructed for a given set of DNA sequences. The number of external mutations differs from its neutral expectation in the presence of selection, whereas the number of internal mutations is only slightly affected by selection. In the current study, the Tajima’s D, Fu & Li’s D and Fu & Li’s F values of the isolates from South China were all negative. No significant difference was found between the Guangdong and Hainan isolates, indicating that these two populations did not deviate significantly from neutral evolution and may have evolved into two relatively stable populations.
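As an illustration of how Tajima’s D combines the average number of nucleotide differences with the number of segregating sites, the following Python sketch implements the standard published formula for aligned sequences without gaps. It is a minimal sketch, not the software used in this study; the function name and toy sequences are hypothetical, and no significance test is included.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs) -> float:
    """Tajima's D for a list of aligned, gap-free sequences."""
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n = 2 or 3, so D is undefined.
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")

    # S: number of segregating (polymorphic) alignment columns.
    s_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if s_sites == 0:
        raise ValueError("no segregating sites")

    # k: average number of pairwise nucleotide differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two estimates of diversity, scaled by its
    # estimated standard deviation.
    return (k - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))


if __name__ == "__main__":
    # Toy data, invented for illustration: every variant is a singleton,
    # the pattern expected after a population expansion.
    print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
```

An excess of rare variants pulls k below S/a1 and gives a negative D, whereas variants at intermediate frequency give a positive D, consistent with the interpretation given above.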

In summary, we analysed and confirmed the population characteristics of PRSV isolates in South China by collecting transgenic ‘Huanong No.1’ papaya samples from Guangdong and Hainan during 2012–2016. These isolates were highly differentiated from the previously reported strains of South China and have formed a new emergent lineage that can infect genetically engineered papaya grown widely in South China.