To evaluate the quality of our assemblies, we first aligned genome reads back to the assembled genomes. About 93% and 87% of the reads used in de novo assemblies were mapped back to the C. maxima and C. moschata genomes, respectively, in a proper paired-end relationship ( Supplemental Table 4 ), indicating that the majority of the genomes were captured in the assemblies. We then mapped RNA sequencing (RNA-Seq) reads to the assembly, and of all RNA-Seq reads derived from different tissues of C. maxima (including fruit, leaf, stem, and root), 93.8%–95.8% were mapped to the genome. Similarly, 86.2%–95.0% of C. moschata RNA-Seq reads were mapped ( Supplemental Table 5 ). Further evaluation using BUSCO () revealed that 97.0% and 97.3% of the core eukaryotic genes were covered by genomes of C. maxima and C. moschata, respectively, and 95.5% and 95.9% were completely covered ( Supplemental Table 6 ). In summary, the high coverage of the genome and RNA-Seq reads as well as the core eukaryotic genes indicated the high quality of the assembled C. maxima and C. moschata genomes.

We constructed a C. maxima intraspecific genetic map containing 2030 SNPs spanning 3008.2 cM across 20 linkage groups (LGs) and a more saturated interspecific map (13 779 SNPs, 4053.9 cM, 20 LGs) ( Supplemental Table 2 ). Both the intraspecific and interspecific maps and a previous map constructed using the same intraspecific population () were largely collinear, with all 20 LGs having shared scaffolds. Using the genetic maps of C. maxima (both the intra- and interspecific), 92 scaffolds (211.4 Mb, 77.9% of the assembly) were anchored to the 20 LGs, among which 64 scaffolds (201.2 Mb, 74.14% of the assembly) were oriented ( Supplemental Table 3 and Supplemental Figure 3 ). Likewise, we also constructed a C. moschata intraspecific map that contained 3487 SNPs grouped into 20 LGs with a total genetic distance of 3376.1 cM, and an interspecific map that had 16 214 SNPs distributed in 20 LGs and an overall genetic distance of 2233.5 cM ( Supplemental Table 2 ). Based on these two genetic maps, 98 C. moschata scaffolds (238.5 Mb, 88.4% of the assembly) were anchored and 71 scaffolds (221.3 Mb, 82.0%) were oriented ( Supplemental Table 3 and Supplemental Figure 4 ).

The genomes of C. maxima cv. Rimu and C. moschata cv. Rifu, the two parents of the “Shintosa” rootstock, were sequenced and assembled. A total of 109.3 and 80.1 Gb of high-quality cleaned Illumina paired-end and mate-pair reads were generated for C. maxima and C. moschata, respectively, representing 283× and 215× coverage of their genomes ( Supplemental Table 1 ). Based on the 17-mer depth distribution analyses of the sequenced reads ( Supplemental Figure 1 ), the genome sizes of C. maxima and C. moschata were estimated to be 386.8 and 372.0 Mb, respectively. De novo assemblies resulted in a draft genome of 271.4 Mb for C. maxima and 269.9 Mb for C. moschata ( Figure 1 and Supplemental Figure 2 ). The assembled C. maxima genome had scaffold and contig N50 sizes of 3.7 Mb and 40.7 kb, respectively, and the N50 sizes for the scaffolds and contigs of the C. moschata assembly were 4.0 Mb and 40.5 kb, respectively ( Table 1 ).

(A) Ideogram of the 20 C. maxima pseudochromosomes (on Mb scale). Syntenic blocks assigned to subgenomes A and B, and those that could not be assigned are labeled in red, blue, and gray, respectively.

We predicted 32 076 and 32 205 protein-coding genes in the C. maxima and C. moschata genomes, respectively, much higher than numbers of genes predicted in genomes of cucumber (23 248;), melon (27 427;), and watermelon (23 440;). Biological functions were assigned to 26 993 (84.2%) genes in C. maxima and 27 230 (84.6%) in C. moschata. A total of 2414 and 2336 transcription factors were identified in C. maxima and C. moschata, respectively, numbers that are close to double those in the other sequenced cucurbit genomes ( Supplemental Table 8 ). We identified 30 and 57 disease-resistance genes encoding nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins in C. maxima and C. moschata, respectively ( Supplemental Table 9 ). Low copy number of disease-resistance genes is known for other cucurbit genomes (). Orthology and synteny analyses between C. maxima and C. moschata showed that most C. maxima NBS-LRR genes had at least one ortholog in C. moschata, while certain genes were unique to C. moschata ( Supplemental Table 9 ).

Repeat sequences accounted for more than 40% of both C. maxima and C. moschata genome assemblies ( Supplemental Table 7 ). Among these repeats, 69.9% in C. maxima and 62.9% in C. moschata were annotated as long terminal repeat (LTR) retrotransposons, predominantly the copia- and gypsy-type LTRs. Similar findings have been reported in the genomes of other species in the Cucurbitaceae, including cucumber (), melon (), and watermelon ().

Based on these findings, we separated the entire chromosomes into two groups to represent the two paleo-subgenomes. Eight and 11 chromosomes were assigned to subgenomes A and B, respectively. Chromosome 4 was divided into three segments with two assigned to subgenome A and one to subgenome B ( Supplemental Figure 11 ). The reconstructed subgenomes A and B of C. maxima contained 14 816 and 16 068 protein-coding genes, respectively (15 136 and 16 473 genes in C. moschata, respectively). Phylogenetic analysis clearly indicated an allotetraploid origin of Cucurbita from two progenitors that diverged from one another approximately 30.75 million years ago (Mya) ( Figure 2 C). One ancestor gave rise to watermelon, cucumber, and melon in the Benincaseae tribe, as well as to the B-genome progenitor of Cucurbita, which has not been sampled and may be extinct. No extant diploid species that could represent the A-genome progenitor have been sampled, so it is possible that this genome donor is also extinct. The oldest possible date for the polyploidy event is the date of divergence of Benincaseae from the B-genome of Cucurbita (approximately 26.28 Mya, Figure 2 C). There was extensive shared synteny between the two Cucurbita species with only a few inversions ( Supplemental Figure 12 ), consistent with a shared polyploidy event. The divergence date of the two Cucurbita species (3.04–3.84 Mya; Figure 2 C) provides a minimum age for the allopolyploidy event. There was no significant difference of Ks values between C. maxima–C. moschata subgenome A and subgenome B orthologous gene pairs ( Supplemental Figure 13 ), suggesting that the two subgenomes evolved with similar mutation rates after polyploidization. Furthermore, the distribution of Cucurbita-Benincaseae syntenic blocks with higher (or lower) Ks along an entire chromosome (except for chromosome 4) ( Figure 2 B) indicated a low level of large-scale interchromosomal exchanges since the allotetraploidization event, which suggested karyotype stability in the polyploid Cucurbita genomes. This was further strongly supported by the fact that the same ancestral states of four chromosomes from progenitor B (Cucurbita chromosomes 3, 10, 13, and 15) were also found in the modern melon genome ( Figure 2 B).

Whole-genome synteny analysis revealed pairs of C. maxima (or C. moschata) homoeologous regions shared between chromosomes corresponding to two subgenomes ( Figure 1 and Supplemental Figure 5 ). Unlike the three subgenomes of Brassica rapa, which cannot be differentiated based on the Arabidopsis thaliana–B. rapa Ks values (), the two subgenomes of Cucurbita could be distinguished according to their different genetic distances to their orthologs in the Benincaseae species, including melon, cucumber, and watermelon ( Figure 2 B and Supplemental Figures 7–10 ). About 63.2%, 58.7%, and 68.3% of the C. maxima genomic regions (64.0%, 62.2%, and 69.5% for C. moschata) syntenic to melon, cucumber, and watermelon, respectively, were composed of homoeologous blocks with significantly different Cucurbita-Benincaseae Ks values. Except for chromosome 4, all other chromosomes contained only homoeologous blocks with either higher or lower Ks values ( Figure 2 B and Supplemental Table 10 ).

We identified 195 paralogous syntenic blocks (7773 homoeologous gene pairs) in C. maxima and 170 blocks (7742 gene pairs) in C. moschata, with synonymous substitution rate (Ks) values peaking at 0.32 ( Figure 2 A and Supplemental Figures 5 and 6 ), collectively covering 92.5% and 94.1% of the gene space, respectively. These homoeologous pairs clearly indicated that both C. maxima and C. moschata genomes underwent a whole-genome duplication (WGD) event that was also reported in another Cucurbita species, C. pepo (), but not observed in other sequenced cucurbits, including cucumber (), melon (), watermelon (), and bitter gourd (Momordica charantia) () ( Figure 2 A).

(C) Phylogenetic tree and timing of the speciation events. Estimated divergence times are shown at the nodes. The 95% highest probability density intervals are indicated as node bars and numbers in green are listed below.

(B) Comparison between the genomes of C. maxima and melon. Two syntenic regions in C. maxima corresponding to one melon region with different C. maxima-watermelon Ks values are marked by different colors.

(A) Distribution of Ks between orthologous or paralogous genes in the genomes of C. maxima (Cma; subgenomes A and B, CmaA and CmaB), melon (Cucumis melo, Cme), and bitter gourd (Momordica charantia, Mch). WGD, whole-genome duplication.

De-novo assembly of zucchini genome reveals a whole genome duplication associated with the origin of the Cucurbita genus.

Two Distinguishable Paleo-Subgenomes with Nearly Intact Progenitor Chromosome Structures in C. maxima and C. moschata

Genome and Gene Evolution after Allotetraploidization

A total of 6550 genes in C. maxima (6639 in C. moschata) were assigned to 348 KEGG pathways, as expected, most of which contained similar numbers of genes from the two subgenomes. However, we did find that pathways of pentose and glucuronate interconversions, galactose metabolism, and starch and sucrose metabolism were composed of more subgenome B genes, and phenylalanine metabolism had more subgenome A genes ( Supplemental Table 12 ). The biased numbers were due to more genes from subgenome B encoding four enzymes, polygalacturonase (K01184), β-glucosidase (K01188), β-fructofuranosidase (K01193), and galacturan 1,4-α-galacturonidase (K01213) shared by the three carbohydrate metabolism pathways, and more genes from subgenome A encoding caffeoyl-CoA O-methyltransferase (K00588), tyrosine aminotransferase (K00815), and phenylalanine ammonia-lyase (K10775) in phenylalanine metabolism ( Supplemental Table 12 Supplemental Figures 15 and 16 ).

Schnable et al., 2012 Schnable J.C.

Wang X.

Pires J.C.

Freeling M. Escape from preferential retention following repeated whole genome duplications in plants. Hollister and Gaut, 2009 Hollister J.D.

Gaut B.S. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Schnable et al., 2011 Schnable J.C.

Springer N.M.

Freeling M. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Freeling et al., 2012 Freeling M.

Woodhouse M.R.

Subramaniam S.

Turco G.

Lisch D.

Schnable J.C. Fractionation mutagenesis and similar consequences of mechanisms removing dispensable or less-expressed DNA in plants. Woodhouse et al., 2014 Woodhouse M.R.

Cheng F.

Pires J.C.

Lisch D.

Freeling M.

Wang X. Origin, inheritance, and gene regulatory consequences of genome dominance in polyploids. We next tested the presence or absence of genome dominance by comparing expression levels between pairs of high-confidence duplicated genes. Using RNA-Seq data from fruit, leaf, stem, and root, we found no evidence of overall biased expression between the two subgenomes ( Supplemental Figure 17 ). This result should be expected given the lack of biased fractionation, and given that bias in gene expression is thought to be responsible for biased homoeologous gene loss (). Epigenetic modification of transposable elements (TEs) is known to suppress the expression of nearby genes by repressing the function of promoter sequences (). The most widely accepted model for biased fractionation is that it is mainly caused by differences in TE number and distribution between the diploid progenitors of an allopolyploid, especially TEs located in the promoter regions of genes that could lower their expression (). After polyploidization, for genes not retained by selection, the copy of a gene that is less expressed is more likely to be dispensable. Therefore, across the entire genome, the subgenome with more genes with TEs in their promoters will experience greater loss, leading to genome-wide bias in gene retention. For example, in B. rapa, the subgenome with a lower transposon load expresses genes to higher levels (genome dominance) and lost fewer genes (less fractionated) (). We compared TE insertions between the subgenomes of C. maxima (and C. moschata) within 2 kb upstream of genes. We found no difference in the numbers of genes having TE inserted in this region between subgenomes of C. maxima (and C. moschata) ( Supplemental Figure 18 A and 18B). In addition, the distribution of the TEs in the promoter regions showed similar patterns in the two subgenomes ( Supplemental Figure 18 C and 18D). These findings suggested that when the polyploidization happened, the diploid parental genomes of Cucurbita had a similar total load of TE insertions near genes, leading to the lack of genome dominance and subsequent random fractionation in the polyploid.

Liu et al., 2014 Liu D.

Sun W.

Yuan Y.

Zhang N.

Hayward A.

Liu Y.

et al. Phylogenetic analyses provide the first insights into the evolution of OVATE family proteins in land plants. Robust evidence of positive selection was detected between 49 pairs of duplicated genes in C. maxima and also 49 pairs in C. moschata ( Supplemental Table 14 ), which might have acquired new functions important for adaptation and/or increased Cucurbita fruit morphological complexity. Among these genes, 23 pairs were common between C. maxima and C. moschata, suggesting functional diversification of some homoeologous genes before the speciation of C. maxima and C. moschata. These genes included four in each of the C. maxima and C. moschata genomes encoding OVATE family proteins, whose expansion and functional divergence are present in many plant species, and members of this family are key regulators of fruit shape in tomato and pepper (), consistent with the highly diverse fruit shapes of Cucurbita species.