Genome resequencing and variation calling

A total of 117 Malus accessions from 24 species were selected for genome sequencing, including 35 M. domestica (24 scion and 11 rootstock cultivars), 10 M. sylvestris, 29 M. sieversii, 9 M. robusta, 6 M. baccata, 4 M. asiatica, 4 M. hupehensis, and 20 accessions from the remaining 17 wild species, with one or two accessions per species (Supplementary Data 1). Among the 29 M. sieversii accessions, 15 originated in Kazakhstan, on the west side of the Tian Shan Mountains, and 14 were collected from natural forests in Xinjiang, China, on the east side of the Tian Shan Mountains. Six of the 24 species are native to China, four to North America, and two to Europe, while some Malus species, such as M. robusta and M. asiatica, are considered intrageneric hybrids (Supplementary Data 2). Chinese soft apples, such as “Pinpo”, “Xiangguo”, M. asiatica, and M. prunifolia, have been cultivated as dessert apples in China for more than 2000 years (Supplementary Note 1). Analysis of the phenotypic data recorded in the USDA-GRIN database (https://npgsweb.ars-grin.gov/) indicated that domesticated apples are significantly larger, firmer, and sweeter than M. sieversii from Kazakhstan, and less acidic and much larger than M. sylvestris (Supplementary Fig. 1).

Resequencing of the 117 apple genomes generated a total of 1060 Gb of high-quality cleaned sequence, with an average of 9.06 Gb per accession, representing ~12.2× coverage of the apple genome (Supplementary Data 1). After aligning the reads to the pseudo-haplotype apple genome (v1.0p) [12], we identified a final set of 7,218,060 single nucleotide polymorphisms (SNPs) (Supplementary Note 2 and Supplementary Tables 1 and 2). Furthermore, we identified 431,597 small insertions and deletions (indels). Polymerase chain reaction (PCR) amplification and Sanger sequencing of genomic regions containing 958 randomly selected SNP loci in six apple accessions indicated a high accuracy rate (98.1%) for our genotype calling.
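
The sequencing totals above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the figures quoted in the text; the function names are illustrative, and the genome size it prints is merely the value implied by those figures, not an independently reported assembly length.

```python
def mean_yield_gb(total_gb: float, n_accessions: int) -> float:
    """Average amount of cleaned sequence per accession, in Gb."""
    return total_gb / n_accessions


def implied_genome_size_gb(mean_gb: float, depth: float) -> float:
    """Genome size implied by a per-accession yield and a fold depth."""
    return mean_gb / depth


# Values quoted in the text: 1060 Gb in total, 117 accessions, ~12.2x depth.
mean_gb = mean_yield_gb(1060.0, 117)
genome_gb = implied_genome_size_gb(mean_gb, 12.2)

print(f"mean yield per accession: {mean_gb:.2f} Gb")
print(f"implied genome size: {genome_gb * 1000:.0f} Mb")
```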

An apple evolutionary map

We first examined the phylogeny among wild and cultivated apples by constructing a neighbor-joining phylogenetic tree, with pear as the outgroup, using SNPs at fourfold degenerate sites (4D SNPs). The tree showed that accessions of M. domestica and its introgression contributor M. sylvestris formed a subclade within a large mixed clade comprising M. sieversii accessions, which are considered the progenitors of cultivated apples [8, 9], while accessions of other wild species were positioned outside this domestication-related clade (Fig. 1a). Wild species native to North America (M. ioensis, M. angustifolia, M. fusca, and M. coronaria) are the closest to pear, followed by the Asian wild species M. baccata and M. hupehensis. Notably, the recent introgression from M. sylvestris into M. domestica has been so intensive that cultivated apples now appear closer to the European crabapple M. sylvestris than to their progenitor M. sieversii, which is consistent with a previous report [9]. A principal component analysis (PCA) showed a pattern similar to that of the phylogenetic tree: M. domestica, M. sieversii, and M. sylvestris accessions formed closely related clusters that were clearly separated from the dispersed accessions of other wild species (Fig. 1b).
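
To illustrate the kind of PCA described here, the sketch below projects a small genotype matrix onto its leading principal components with NumPy. It is a toy example under stated assumptions, not the pipeline used in this study: genotypes are assumed to be coded as 0, 1, or 2 copies of the alternate allele with no missing calls, and the function name genotype_pca and the four-accession matrix are invented for illustration.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project accessions onto the leading principal components.

    genotypes: accessions x SNPs matrix of alternate-allele counts (0/1/2).
    Returns (coordinates, fraction of variance explained per component).
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)            # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:n_components]


# Hypothetical toy data: two accessions from each of two divergent groups.
toy = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [2, 2, 2, 2, 0],
    [2, 2, 2, 2, 1],
]
coords, explained = genotype_pca(toy)
print(coords[:, 0])   # PC1 separates the two groups
print(explained)
```

In practice, missing genotypes, minor allele frequency filtering, and per-SNP scaling would all need to be handled before a decomposition of this kind.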

Fig. 1 Population structure of 117 domesticated and wild apples. a Neighbor-joining phylogenetic tree constructed using SNPs at fourfold degenerate sites. Each species group is color coded, with red squares representing rootstocks and red dots scions. b Principal component analysis (PCA) of the 117 apple accessions. c Bayesian model-based clustering of the 117 apple accessions with the number of ancestry kinship (K) from 3 to 5. Each vertical bar represents one apple accession and the x axis shows different apple accessions. Each color represents one putative ancestral background and the y axis quantifies ancestry membership. Asi M. asiatica, Bac M. baccata, Dom M. domestica, Hup M. hupehensis, Other other wild species, Rob M. robusta, Sie_K M. sieversii in Kazakhstan, Sie_X M. sieversii in Xinjiang, Syl M. sylvestris

To further understand the evolutionary history of apple, we used a Bayesian clustering algorithm with admixed models [13] to estimate ancestry proportions for each accession (Fig. 1c). Our ΔK analysis revealed that five populations (K = 5) represent the best model for these 117 accessions (Supplementary Fig. 2). For K = 3, M. domestica and its wild relatives, M. sieversii and M. sylvestris, were clearly separated from other wild species, supporting the evolutionary history of domesticated apple [8, 9, 14]. With K from 4 to 5, two new subpopulations arose from the wild species other than M. sieversii and M. sylvestris, indicating their high diversity and greater distance from domesticated apples. M. sieversii accessions from the two sides of the Tian Shan Mountains segregated into two different subpopulations, reflecting their geographical distributions. M. sieversii accessions in Kazakhstan showed admixed ancestry, possibly from hybridizations with wild apples such as M. orientalis along the Silk Road and/or with domesticated apples cultivated nearby, while Xinjiang accessions kept their homogeneous genetic background, probably because their geographical isolation blocks interspecific hybridization. In addition to M. sieversii in Xinjiang, six other species in distinct habitats, such as M. ioensis and M. angustifolia in North America and M. baccata in East Asia, were also found to have homogeneous genetic backgrounds; these species gave rise to other hybrid species and possess tremendous value in apple breeding practices (Supplementary Fig. 3). The structure of several hybrid species was consistent with their known pedigrees recorded in the USDA-GRIN database, including M. asiatica, M. prunifolia, M. robusta, and several rootstocks, while that of M. platycarpa did not agree with its pedigree in the database (Supplementary Figs. 4 and 5 and Supplementary Note 3).
Together, these findings prompt us to propose a comprehensive apple evolutionary map across the Eurasian continent, illustrating the initial domestication from M. sieversii in Kazakhstan, the hybridization between M. sylvestris and the ancient domesticated apples that spread westward from Central Asia to Europe via the Silk Road, and the rise of oriental hybrid species from crosses between M. baccata and M. sieversii from Kazakhstan, distributed and cultivated eastward along the Silk Road (Fig. 2a and Supplementary Note 4). During the domestication process, cultivated apples retained the large fruit size of M. sieversii, gained firm texture and appetizing flavor from hybridization with M. sylvestris, and continued to be bred for larger and firmer fruit with better flavor and aroma.

Fig. 2 Apple evolutionary map. a Apple evolutionary map along the west and east bounds of the Silk Road with the center of origin in Kazakhstan in Central Asia. b Decay of linkage disequilibrium (LD) measured as the squared correlation coefficient (r²) by pairwise physical distance in M. domestica, M. sieversii in Kazakhstan, M. sieversii in Xinjiang, M. sylvestris, and other wild species. c Multidimensional scaling (MDS) plot for the pairwise FST matrix. The Euclidean distances between each pair of groups significantly represent the corresponding FST values (Spearman rank correlation ρ = 0.95; p < 10⁻¹⁴). d Major alleles of scion and rootstock cultivars derived from M. sieversii in Kazakhstan and M. sylvestris. Asi M. asiatica, Bac M. baccata, Dom M. domestica, Hup M. hupehensis, Rob M. robusta, Sie_K M. sieversii in Kazakhstan, Sie_X M. sieversii in Xinjiang, Syl M. sylvestris

We then evaluated the genetic diversity of different apple subpopulations. The genome-wide nucleotide diversity (π) of M. domestica (2.20 × 10⁻³) was lower than that of M. sieversii in Kazakhstan (2.35 × 10⁻³), M. sylvestris (2.55 × 10⁻³), and other wild species (4.26 × 10⁻³), while M. sieversii in Xinjiang exhibited the lowest diversity level (1.30 × 10⁻³). Compared to other domesticated perennial crops, the nucleotide diversity in cultivated apple is higher than that of peach [2] (1.5 × 10⁻³), but lower than that of cassava [15] (2.6 × 10⁻³) and date palm [3] (9.2 × 10⁻³). The similar level of nucleotide diversity between M. domestica and its progenitor M. sieversii in Kazakhstan indicated a very weak bottleneck, if any, during apple domestication, consistent with findings in previous studies [8, 9]. Linkage disequilibrium (LD) analyses for each group further supported a very weak and nearly undetectable domestication bottleneck (Fig. 2b and Supplementary Note 5). The rapid LD decay of domesticated apple suggested that a large set of markers densely covering the genome is preferred for a high-resolution population genetic analysis, as indicated in a recent study [16]. Here we also demonstrated the power of our high-density SNPs in enhancing the resolution of genome-wide association studies (GWAS) (Supplementary Note 6, Supplementary Fig. 6, Supplementary Table 3, and Supplementary Data 3 and 4).
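
For readers unfamiliar with π, the sketch below shows one standard way to compute it from biallelic SNP counts: the mean number of pairwise differences per site. It is an illustrative sketch only and does not reproduce the software or filtering used in this study; the function name, the toy counts, and the assumption that every site is called in all sampled chromosomes are hypothetical.

```python
from typing import Sequence


def nucleotide_diversity(alt_counts: Sequence[int],
                         n_chromosomes: int,
                         n_sites: int) -> float:
    """Mean pairwise differences per site (pi) for biallelic SNPs.

    alt_counts: alternate-allele count at each variable site.
    n_chromosomes: sampled chromosomes (2 x number of diploid accessions).
    n_sites: total callable sites, including invariant ones.
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    n_pairs = n_chromosomes * (n_chromosomes - 1) / 2
    # At each site, k * (n - k) of the chromosome pairs differ.
    differing_pairs = sum(k * (n_chromosomes - k) for k in alt_counts)
    return differing_pairs / n_pairs / n_sites


# Hypothetical toy input: 3 variable sites among 1000 callable sites,
# sampled from 4 chromosomes.
pi = nucleotide_diversity([1, 2, 0], n_chromosomes=4, n_sites=1000)
print(f"pi = {pi:.6f}")
```

Dividing by all callable sites, rather than by variable sites only, is what makes values such as those quoted above comparable across groups and species.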

To further investigate population divergence among different species groups, we computed pairwise FST values, which showed relationships among the subpopulations consistent with our proposed evolutionary scenario (Fig. 2c and Supplementary Fig. 7). Genome-wide inference of major allele origins in M. domestica from M. sieversii in Kazakhstan or M. sylvestris revealed that 46% of the M. domestica genome was probably derived from its progenitor M. sieversii in Kazakhstan and 21% from its secondary contributor M. sylvestris, while the origin of the remaining 33% was uncertain. The genomic introgression from M. sylvestris to scion cultivars is about 10% higher than that to rootstock cultivars, raising the possibility that M. sylvestris may have contributed important alleles for fruit quality and production traits to dessert apple cultivars (Fig. 2d and Supplementary Fig. 8).
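
As a concrete illustration of a pairwise FST calculation, the sketch below implements the Hudson estimator, combined across SNPs as a ratio of sums. This estimator is chosen here for brevity and is an assumption: the text does not state which estimator was used, and the function name and example frequencies are hypothetical.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float],
               n1: int, n2: int) -> float:
    """Hudson's FST between two groups, combined over SNPs.

    p1, p2: alternate-allele frequencies per SNP in each group.
    n1, n2: sampled chromosomes per group (2 x diploid accessions).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must have equal length")
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two chromosomes per group")
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for sampling noise.
        numerator += ((a - b) ** 2
                      - a * (1 - a) / (n1 - 1)
                      - b * (1 - b) / (n2 - 1))
        # Probability that alleles drawn from different groups differ.
        denominator += a * (1 - b) + b * (1 - a)
    if denominator == 0:
        raise ValueError("no polymorphic sites between the two groups")
    return numerator / denominator


# Hypothetical frequencies at three SNPs in two groups of 20 chromosomes.
fst = hudson_fst([0.9, 0.5, 0.1], [0.1, 0.5, 0.8], n1=20, n2=20)
print(f"FST = {fst:.3f}")
```

Summing numerators and denominators before dividing avoids giving undue weight to low-frequency SNPs, which is why a ratio of sums is usually preferred over an average of per-SNP ratios.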

Differential selection of domesticated apples

Considering the remarkable role of M. sylvestris in shaping modern domesticated apples, we identified genomic regions strongly affected by selection during apple domestication in two contrasts: M. domestica vs. M. sieversii in Kazakhstan for initial domestication (Dom_SieK) and M. domestica vs. M. sylvestris for secondary introgression (Dom_Syl) (Fig. 3 and Supplementary Figs. 9 and 10). The selected regions of Dom_SieK and Dom_Syl had a mean size of 33.1 and 42.8 kb, covered a total length of 13.9 Mb (3.7% of the genome) and 17.2 Mb (4.6%), and harbored 840 and 1089 genes, respectively, among which 246 (29.3%) and 336 (30.9%) showed differential expression during apple fruit development (Supplementary Data 5–8). The identified selective sweeps are enriched with genes associated with fruit sugar content, firmness, color, hormone, and secondary metabolism in both Dom_SieK and Dom_Syl contrasts, while genes related to fruit acidity were only enriched in Dom_Syl (Fig. 3 and Supplementary Data 9). Notably, the enriched genes included those encoding six sugar transporters, several key enzymes in the glycolysis/gluconeogenesis pathway, two sucrose synthases, and three cellulose synthases in Dom_SieK; two aluminum-activated malate transporters, a malate dehydrogenase, and a citrate synthase in Dom_Syl; and two sucrose synthases, one pyruvate decarboxylase, and one cellulose synthase in both Dom_SieK and Dom_Syl, highlighting the constant selection for sweet and firm fruits in the history of apple domestication (Fig. 3). Together, these candidate domestication-related genes are indicative of different selective forces for improving different agronomic traits from the two wild contributors during domestication.

Fig. 3 Genome-wide distribution of selective sweeps in M. domestica. a Selective sweeps in M. domestica compared with M. sieversii in Kazakhstan. b Selective sweeps in M. domestica compared with M. sylvestris. XP-CLR scores are plotted across the 17 chromosomes in the apple genome with key functional enzyme genes labeled above the dot peaks. Red vertical boxes illustrate selective sweeps, and blue boxes represent local GO enriched regions with associated traits labeled below. Selective regions shared by both comparisons are shaded with light blue bars, while interesting regions only identified in one of the two comparisons are shaded with light yellow bars. Traits include fruit acidity (A), color (C), firmness (F), hormone (H), soluble sugar (S), and secondary metabolites (M). Gene abbreviations: ALMT aluminum-activated malate transporter, ACO 1-aminocyclopropane-1-carboxylate oxidase, ACS 1-aminocyclopropane-1-carboxylate synthase, AE aldose 1-epimerase, AR aldose reductase, BG beta-galactosidase, CAS cycloartenol synthase, CTS citrate synthase, CLS cellulose synthase, FH flavanone 3-hydroxylase, GA3OX gibberellin 3-beta-dioxygenase, GG glucan endo-1,3-beta-glucosidase, GMD GDP-mannose 4,6-dehydratase, IFR isoflavone reductase, IMS 2-isopropylmalate synthase, MD malate dehydrogenase, PDC pyruvate decarboxylase, PDK pyruvate dehydrogenase kinase, PE pectin esterase, PFK 6-phospho-fructokinase, PG polygalacturonase, PL pectate lyase, SPD sorbitol 6-phosphate dehydrogenase, SPS sucrose phosphate synthase, SS sucrose synthase, ST sugar transporter

We further scanned for SNPs that were highly divergent between M. domestica and different wild species groups using the top 1% of FST values (Supplementary Data 10). A large number of disease resistance (R) genes and genes involved in responses to various abiotic stresses were found to contain non-synonymous SNPs highly divergent between M. domestica and other wild species groups, suggesting adaptation to different growth environments. Within M. domestica accessions, genes underlying dwarfing quantitative trait loci (QTLs) (Dw1 and Dw2) [17, 18], R genes, and receptor kinase genes were found to have non-synonymous SNPs highly divergent between rootstock and scion cultivars, which could facilitate the study of important apple rootstock traits, such as dwarfing, precocity, and disease resistance (Supplementary Data 11). The highly divergent SNPs and the associated genes discovered in this study provide ample information for broadening our understanding of apple speciation, differentiation, and evolution.
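
The top 1% cutoff amounts to an empirical percentile threshold on per-SNP FST values. A minimal sketch of that selection step is given below; the function name and the synthetic values are hypothetical, and tied values at the cutoff are all retained, which is one of several reasonable conventions.

```python
from typing import List, Sequence


def top_percent_indices(values: Sequence[float], top_percent: int = 1) -> List[int]:
    """Indices of SNPs whose value falls in the top `top_percent` percent."""
    if not values:
        return []
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    n = len(values)
    # Ceiling division in integer arithmetic; keep at least one SNP.
    k = max(1, -(-n * top_percent // 100))
    threshold = sorted(values, reverse=True)[k - 1]
    return [i for i, v in enumerate(values) if v >= threshold]


# Synthetic per-SNP FST values for 1000 SNPs, increasing with index.
fst_values = [i / 1000 for i in range(1000)]
outliers = top_percent_indices(fst_values, top_percent=1)
print(len(outliers), outliers[0], outliers[-1])
```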

Increase of fruit size prior to and during apple domestication

One essential aspect of the domestication process for most crop species is an increase in fruit and/or seed size, which is part of what is referred to as the “domestication syndrome” [19]. Fruit sizes of M. sylvestris are significantly smaller than those of both M. domestica and M. sieversii, while domesticated apples have larger fruits than M. sieversii (Supplementary Fig. 1d, e). Two previously reported fruit weight QTLs [20] (designated as fw1 on chromosome 15 and fw2 on chromosome 8) were found to be co-located with selective sweeps (Fig. 4a). QTLs fw1 and fw2 harbor 11 and 7 genes, respectively, in selective regions of M. domestica from M. sieversii, and 8 and 21 genes, respectively, in sweeps from M. sylvestris (Supplementary Data 12). Genes encoding regulators of cell division, such as fw2.2 in tomato [21] and CNR1 in maize [22], have been reported to control organ size, and recently a β-galactosidase gene was found to be involved in the regulation of fruit weight and size in strawberry, in addition to fruit softening [23]. We found that one cell division regulatory gene (MDP0000223854) and two β-galactosidase genes (MDP0000921848 and MDP0000179821) in fw1 were under human selection. Interestingly, the β-galactosidase gene MDP0000179821 was in selected regions of both Dom_SieK and Dom_Syl and showed differential expression during apple fruit development, with the highest expression at the active cell division stage (Supplementary Data 12). Furthermore, several cell division regulatory genes (e.g., MDP0000555176, MDP0000802780, MDP0000846861, and MDP0000681201) and a gene (MDP0000241347) homologous to rice GS3, which controls grain size [24], were found in several other selected regions, and they all showed the highest expression at the active cell division stage of fruit development (Supplementary Data 6 and 8), suggesting their potential contribution to the increase of fruit size during apple domestication.

Fig. 4 Evolution of fruit size during speciation and domestication in apple. a Domestication sweeps underlying apple fruit size QTLs. Within the physical intervals (orange boxes) of the two fruit weight QTLs fw1 and fw2, distributions of XP-CLR scores are shown. Selective sweeps are marked with red bars and interesting genes are labeled above peaks. b MiRNA172g/miRNA172h and the two target genes that contain highly divergent SNPs (indicated by arrows) between M. domestica with large fruits and other wild species bearing very small fruits. c Schematic diagram of the two-step evolution of apple fruit size. BG beta-galactosidase, FBP fructose-1,6-bisphosphatase, FD ferredoxin, PPD pyruvate phosphate dikinase, P4 patellin-4, UGE UDP-glucose 4-epimerase

MiRNA172p was recently reported to regulate apple fruit size by targeting AP2 transcription factors [25]. Using their precursor sequences, the 16 miRNA172 genes identified in apple [26] were clustered into four clades (Supplementary Fig. 11). We detected a highly differentiated SNP only between domesticated and other wild apples in the precursor sequences of miRNA172g and miRNA172h, two miRNAs sharing the same precursor sequences (Fig. 4b). No differentiated SNPs were detected in the 16 miRNA172 genes between M. domestica and M. sieversii. Furthermore, we found that five out of the eight target AP2 genes of miRNA172g and miRNA172h carried highly differentiated SNPs in the comparison between M. domestica and other wild species, two of which comprised non-synonymous SNPs (Fig. 4b and Supplementary Fig. 11). Therefore, besides the previously discovered miRNA172p [25], we identified two additional miRNA172 genes (miRNA172g and miRNA172h) that might have contributed to the increase of fruit size during Malus speciation prior to domestication.

Taken together, we propose a two-step evolution model for fruit size enlargement in apple to characterize its unique evolutionary process (Fig. 4c). Unlike modern maize and tomato, whose domestication started with ancestral species bearing very small seeds or fruits [4, 5], apple domestication was initiated from M. sieversii, whose fruits are larger than those of all other wild apples [8]. As fruit size is one of the most desirable traits for crop domestication and improvement, apple domestication started with a great advantage and much lower evolutionary pressure than other crops. The finding of fruit weight QTLs, miRNA172s, and their target genes from comparisons between small-fruited wild apples and large-fruited cultivars helps explain why weak selection in a highly heterozygous perennial crop can still yield favorable large fruits. The mild increase in fruit size during several thousand years of domestication after speciation, partially contributed by QTLs fw1 and fw2, suggests that apple fruit size has great potential to be increased in future breeding practices, considering that the modern tomato fruit is approximately 100 times larger than that of its direct wild progenitor.

Enhancement of fruit firmness during apple domestication

Besides large fruits, humans have also been selecting for firm flesh texture, not only for a crisp taste, but also for a longer shelf life, better post-harvest disease resistance, and reduced bruising during harvest and transportation. Fruit firmness is directly linked to enzyme-mediated cell wall modification [27]. To decipher the genetic mechanism underlying the selection for firm apples, we mined selective sweeps for genes that were potentially involved in regulating fruit firmness. We found that a region on chromosome 16 was under intensive human selection in the Dom_SieK contrast, harboring genes encoding key cell wall modifying enzymes including three polygalacturonases (MDP0000512850, MDP0000939625, and MDP0000873268) and one glucan endo-1,3-beta-glucosidase (MDP0000295938) (Fig. 5a). Furthermore, several other selective regions were also found to contain genes related to cell wall modifications. For example, one selective sweep on chromosome 17 from the Dom_SieK comparison comprised three cellulose synthase genes (MDP0000190520, MDP0000289339, and MDP0000184309), and another sweep on the same chromosome harbored one pectate lyase (MDP0000301545), one glucan endo-1,3-beta-glucosidase (MDP0000206670), and one aldose 1-epimerase (MDP0000737131) (Fig. 5b). Similarly, another sweep on chromosome 12 from the Dom_Syl contrast harbored one endo-beta-1,4-mannase (MDP0000832632) and two pectinesterase genes (MDP0000278119 and MDP0000278118) (Fig. 5c). A number of these cell wall-related genes were differentially regulated during fruit development (Supplementary Data 6 and 8). Therefore, the evolution of these genes might have contributed to the firm fruit texture of domesticated apples.

Fig. 5 Domestication sweeps underlying apple fruit firmness. Distributions of XP-CLR scores and nucleotide diversity (π) in selective regions on chromosomes 16 (a) and 17 (b) in M. domestica from M. sieversii in Kazakhstan (Dom_SieK) and chromosome 12 (c) from M. sylvestris (Dom_Syl), which harbor genes known to be associated with fruit firmness. Distributions of XP-CLR scores are shown in top panels with selective sweeps marked with red bars and interesting genes labeled above peaks. Distributions of π are shown in bottom panels with M. domestica in orange lines, and M. sieversii and M. sylvestris in green lines. AE aldose 1-epimerase, CLS cellulose synthase, GG glucan endo-1,3-beta-glucosidase, MM endo-beta-1,4-mannase, PE pectinesterase, PG polygalacturonase, PL pectate lyase

Enrichment of fruit flavor during apple domestication

Fruit flavor, mainly a balance between sugars and acids, is another important trait under human selection. Although it has been reported that fruit acidity rather than sweetness is likely to have undergone artificial selection [28], our analysis of historical phenotype data in the USDA-GRIN database showed that both fruit sugar content and acidity have been altered during apple domestication (Supplementary Fig. 1). During domestication from M. sieversii in Kazakhstan, a region on chromosome 12 underwent intensive selection and co-localizes with a sorbitol QTL [29]. Interestingly, genes encoding four sorbitol transporters and two sugar transporters cluster within this sorbitol QTL under selection, and all four sorbitol transporter genes were differentially expressed during apple fruit development (Fig. 6a and Supplementary Data 13). Similarly, another genomic region on chromosome 13, enriched with genes encoding key enzymes of sugar metabolism, is under intensive selection from M. sylvestris (Fig. 6b). Notably, the two sucrose synthase genes in this region are also in selective sweeps from M. sieversii in Kazakhstan.

Fig. 6 Domestication of fruit sweetness and acidity in apples. a Selective sweeps from M. sieversii in Kazakhstan that co-localize with a sorbitol QTL. b Domestication sweeps from M. sylvestris that contain key genes for sugar metabolism. Distributions of XP-CLR scores are shown. Selective sweeps are marked with red bars and interesting genes are labeled above peaks. c–f Domestication of the Ma1 gene that regulates apple fruit acidity. c Distributions of nucleotide diversity (π) of M. domestica (red), M. sieversii in Kazakhstan (blue), and M. sylvestris (green) in the Ma1 genome region. d Ma1 selective sweeps during domestication from M. sieversii in Kazakhstan (blue) and M. sylvestris (green). Sweep regions are marked with filled boxes. e Distribution of FST between M. domestica and M. sieversii in Kazakhstan in the Ma1 genome region. f Distribution of FST between M. domestica and M. sylvestris in the Ma1 genome region. bHLH transcription factor, CTS citrate synthase, HK hexokinase, MYB transcription factor, PDK pyruvate dehydrogenase kinase, PE pectin esterase, PG polygalacturonase, PK pyruvate kinase, SBT sorbitol transporter, SS sucrose synthase, ST sugar transporter