How to make almonds palatable The domesticated almond tree has been feeding humans for millennia. Derivation from the wild, bitter, and toxic almond required loss of the cyanogenic diglucoside amygdalin. Sánchez-Pérez et al. sequenced the almond genome and analyzed the genomic region responsible for this shift. The key change turned out to be a point mutation in a transcription factor that regulates production of P450 monooxygenases in the biosynthetic pathway for the toxic compound. Science, this issue p. 1095

Abstract Wild almond species accumulate the bitter and toxic cyanogenic diglucoside amygdalin. Almond domestication was enabled by the selection of genotypes harboring sweet kernels. We report the completion of the almond reference genome. Map-based cloning using an F 1 population segregating for kernel taste led to the identification of a 46-kilobase gene cluster encoding five basic helix-loop-helix transcription factors, bHLH1 to bHLH5. Functional characterization demonstrated that bHLH2 controls transcription of the P450 monooxygenase–encoding genes PdCYP79D16 and PdCYP71AN24, which are involved in the amygdalin biosynthetic pathway. A nonsynonymous point mutation (Leu to Phe) in the dimerization domain of bHLH2 prevents transcription of the two cytochrome P450 genes, resulting in the sweet kernel trait.

Almond [Prunus dulcis Miller (D. A. Webb), syn. Prunus amygdalus L.] is the main tree nut species worldwide, with a cultivated area of about 1.9 million ha and an annual in-shell production exceeding 2.2 million metric tons (FAOSTAT 2017, www.fao.org/faostat/en/#data/QC). The spatiotemporal framework of almond domestication is still controversial, although archaeobotany and genetic studies suggest it occurred in the Fertile Crescent—the “Cradle of Civilization”—during the first half of the Holocene (1–4). Cultivation in the Mediterranean Basin is documented by the presence of almonds among the artifacts found in historical places such as the tomb of Tutankhamen (5) and the Franchthi Cave in Greece (6). Most recently, almond was introduced in suitable habitats in America (primarily California) and the Southern Hemisphere. In addition to almond, the botanic family of Rosaceae encompasses an extraordinary variety of species of great economic importance in temperate regions, including apple (Malus domestica), apricot (Prunus armeniaca), Japanese apricot (Prunus mume), peach (Prunus persica), strawberry (Fragaria vesca), and sweet cherry (Prunus avium) (7, 8). Among these, almond is unique in that the kernel (seed), rather than the flesh (mesocarp), represents the edible commercial product.

Kernels of wild almond species are bitter and highly toxic to humans and predators because the cyanogenic diglucoside amygdalin accumulates in the cotyledons (9–12). The key event enabling almond domestication was the selection of sweet kernel genotypes. Genetic studies showed that edible sweet almond kernels originate from a dominant mutation (13, 14) within the almond linkage group (LG) 5, at a locus referred to as Sweet kernel (Sk) (15, 16). The nature of the Sk gene has remained elusive.

Amygdalin is produced from the cyanogenic monoglucoside prunasin (9), which is biosynthesized in the seed coat (tegument). Recently, it was shown that the first two genes in the prunasin biosynthetic pathway, PdCYP79D16 and PdCYP71AN24, are poorly expressed in the tegument of sweet genotypes relative to bitter genotypes (12). The almond genome harbors an additional CYP79-encoding gene, named PdCYP79A68 (12), a putative ortholog of the P. mume gene PmCYP79A68 (17). PmCYP79A68 was shown to have weak activity toward tryptophan, and it did not catalyze conversion of phenylalanine into the corresponding oxime, as required for a function in prunasin and amygdalin synthesis (17).

Despite the economic importance of almond, genomic resources in almond are limited relative to those available for other Rosaceae species (18–22). Here, we report the assembly of the draft genome of almond. The sequence information was used to unravel the genetic differences between bitter and sweet kernel genotypes. The results provide insights into almond domestication history, as well as a sequence inventory for studies of other almond agronomic traits such as flowering time, drought tolerance, and resistance to disease.

The genome (2n = 2x = 16, haploid genome size = 246 Mb) of the sweet homozygous almond cultivar Lauranne was sequenced with a combination of Illumina (paired-end and 5-kb mate pairs) and PacBio technologies. The long PacBio reads were used to perform the genome assembly, whereas the Illumina reads were used to perform gap filling and scaffolding. The scaffolds obtained were organized in pseudomolecules using the P. persica reference genome, as the genomes across the Prunus genus are essentially collinear (23–25). The final assembly constitutes 4078 scaffolds, of which 2572 are organized in eight pseudomolecules, with a final N50 of 21.8 Mb (where N50 is the minimum contig length needed to cover 50% of the genome) and a L90 of 306 (where L90 is the smallest number of contigs whose length sum makes up 90% of genome size) (table S1). A high level of collinearity between the almond genomic assembly and the recently reported single-nucleotide polymorphism (SNP)/short sequence repeat (SSR) genetic linkage maps (25–27) was demonstrated (fig. S1).

By combining ab initio, RNA sequencing (RNA-seq), and transcript sequences from P. persica and the Viridiplantae Swissprot database, the P. dulcis genome was predicted to contain 27,817 genes (table S2), of which 16,747 had an annotation edit distance (AED) ≤ 0.3. A total of 1080 conserved plant genes were identified among the annotated proteins with the BUSCO tool (28) (table S3). Of these, 95% were complete and present in a single copy, confirming the quality of the assembly and annotations. Functional annotation of the genes was then performed by searching for orthologous sequences in P. persica, Arabidopsis thaliana, and P. mume. In this way, a Gene Ontology term was assigned to 12,777 genes, whereas a description was given to 24,035 genes. About 34.6% of the genome was found to be repetitive, including 11.1% of long-term repeat elements (table S4).

Using a set of Sk-linked cleaved amplified polymorphic sequence (CAPS) and SSR markers within the almond LG5 (29), together with two Sk-linked CAPS markers developed in this study (table S5 and fig. S2), a large mapping population of 475 F 1 individuals segregating at the Sk locus was genotyped. The resulting map displayed the two markers ppa003882m/BsaWI and ppa005388m/TaqI flanking the Sk locus, separated by 0.2 cM and 0.1 cM, respectively (Fig. 1A). Alignment of the two marker sequences against the assembled almond genome sequence allowed the definition of a physical interval of 118 kb, harboring 11 genes (Fig. 1B). A 46-kb cluster of five genes encoding putative basic helix-loop-helix (bHLH) transcription factors (bHLH1 to bHLH5) was positioned within this interval. We investigated these bHLH genes for their ability to control amygdalin accumulation because (i) sweet and bitter almond genotypes are associated with markedly different transcriptional levels of the amygdalin biosynthetic genes PdCYP79D16 and PdCYP71AN24 (12), as confirmed by our RNA-seq experiment (Fig. 2); (ii) bHLH transcription factors are known to regulate the biosynthesis of several bioactive natural products (30); and (iii) we identified several bHLH-binding DNA matrices in the PdCYP79D16 and PdCYP71AN24 promoter regions (data S1).

Fig. 1 Map-based cloning of the Sk gene. (A and B) Comparison of linkage group 5 (LG05) (A) and pseudomolecule 5 (Pd05) (B) in the region where the Sk locus is mapped. Approximate positions of the five bHLH genes present within that region are shown.

Fig. 2 The amygdalin biosynthetic pathway and transcript levels of the encoding genes during fruit development. Expression patterns (standardized FPKM values) of the genes of the amygdalin pathway are shown during tegument growth in the sweet genotype Lauranne and the bitter genotype S3067.

Reverse transcription quantitative polymerase chain reaction (RT-qPCR) time-course experiments on tegument tissues of the sweet cultivar Lauranne (Sk/Sk) and the bitter genotype S3067 (sk/sk) did not reveal differential expression for bHLH1, bHLH2, and bHLH4 (fig. S3 and data S2); expression of bHLH3 and bHLH5 was not detected. A scan of a 20-kb genomic region associated with the Sk locus revealed higher heterozygosity in the bitter genome than in the sweet genome (fig. S4).

The expressed bHLH genes (bHLH1, 2, and 4) were sequenced in Lauranne and S3067 (fig. S5). In bHLH1, this highlighted an indel polymorphism of 161 bp leading to a truncated protein (262 versus 401 amino acids) in the sweet genotype. In bHLH2, a C/T SNP and a 3-bp indel were observed, resulting in a nonsynonymous Leu346 → Phe (L346F) substitution and an Asn412 insertion in the sweet genotype. In bHLH4, no polymorphism was found.

Functional characterization of bHLH1 and bHLH2 was carried out using a factorial transient β-glucuronidase (GUS) expression assay in Nicotiana benthamiana. The alleles were used in combination with the two PdCYP79D16 promoter regions cloned from Lauranne and S3067 (henceforth referred to as p79sweet and p79bitter) and the three promoter regions cloned from PdCYP71AN24 (henceforth referred to as p71sweet1, p71sweet2, and p71bitter) (Fig. 3). No significant differences were observed in GUS activity when expressing the two bHLH1 alleles (Fig. 3A). In contrast, a significant induction of GUS activity was observed for the bHLH2 bitter allele in combination with all versions of p79 and p71 (Fig. 3B). This indicated that bHLH2 could be the Sk gene. DNA sequencing of bHLH2 in 56 almond genotypes demonstrated cosegregation between the Sk/sk alleles and the C/T polymorphism in all cases except one (table S6), represented by the cultivar Atocha. However, Atocha displayed a different SNP positioned nearby, resulting in a Leu330 → Arg (L330R) nonsynonymous change (fig. S6).

Fig. 3 Trans-activation analysis of putative downstream gene promoters by the bHLH transcription factors localized in the Sk locus and expressed in the tegument during kernel development. (A and B) The bar plots show the capacity of bHLH1 (A) and bHLH2 (B) variants of sweet (Sw) and bitter (Bit) genotypes to trans-activate the sweet and bitter promoters of PdCYP79D16 (p79; light/dark blue colors) and PdCYP71AN24 (p71; pink/lilac colors), respectively. The effects of mutations in bHLH2 bitter and sweet at position 346 as well of their C-terminal tails are also shown. Control experiments: p19 silencing suppressor [C(–)] and no transcription factor (no TF) as negative controls, and pCaMV35S:APL3:GUS as positive control [C(+)]. For each promoter, shown in different colors, data were compared using analyses of variance (ANOVA) least significant differences test with P < 0.05. Therefore, significant differences in the multiple combinations of promoters with or without TF are shown by different letters but with the same color.

The L346F and L330R nonsynonymous mutations observed in sweet genotypes were both predicted to occur in the dimerization interphase typical of bHLH transcription factors (fig. S5). To study the possible functional consequences of the L346F mutation, we carried out direct mutagenesis of the coding sequence of the two bHLH2 alleles, substituting the Leu346 residue with Phe and vice versa. In addition, the last 151 amino acids of the two bHLH2 alleles (henceforth called the “tail”) were also exchanged to investigate whether the extra Asn residue occurring at position 412 in the sweet genotype of bHLH2 influenced protein activity. Induction of GUS activity was observed only in association with the Leu346 residue. Indel polymorphisms at the protein tail did not affect GUS activity (Fig. 3B). bHLH2 thus controls transcription of the P450 monooxygenase–encoding genes PdCYP79D16 and PdCYP71AN24 in the amygdalin biosynthetic pathway.

To understand how a change from Leu to Phe at position 346 affects activity of the bHLH2 transcription factor, we generated a structural model of the protein sequence surrounding this amino acid (Fig. 4, A to C). When dimerized, the corresponding Leu residues of each monomer are placed facing each other, allowing hydrophobic interactions between the amino acids. Change of this Leu into a Phe might have two consequences. The first is that steric hindrances would prevent dimer formation. This would be augmented by the presence of two additional phenylalanines in the near vicinity in the model. Alternatively, the dimer would be formed, but the ability of the aromatic rings to form π-stacks (31) would result in a dimer with an aberrant, nonfunctional conformation. To discriminate between these two possibilities, we expressed the bHLH2 bitter and sweet transcription factor variants in Escherichia coli as a fusion to glutathione S-transferase (GST) or His, and performed pull-down assays (Fig. 4D) as well as polyacrylamide gel electrophoresis under native conditions (fig. S7). Both protein versions migrated as a band with a molecular mass corresponding to the dimer, indicating that the L346F polymorphism does not affect the ability of bHLH2 proteins to form dimers. To test the DNA-binding ability of the bHLH2 dimers, we cloned a triple repeat of the six-nucleotide motif (CATGTG) present in all p79 and p71 promoter variants (data S1) upstream of a β-galactosidase gene in a yeast one-hybrid–compatible plasmid. Only the bHLH2 variants carrying Leu346 were able to induce β-galactosidase activity (Fig. 4E), indicating that the L346F mutation generates a nonfunctional bHLH2 dimer.

Fig. 4 A Leu → Phe substitution is responsible for formation of a nonfunctional bHLH2 dimer. (A) Visualization of the relative position of two bHLH2 monomers (red and cyan) after dimer formation. (B and C) A single amino acid substitution (Leu to Phe) is present in the sweet variant, possibly interfering with formation of a functional dimer. (D) E. coli–expressed bHLH2 proteins were subjected to pull-down assays with a His-tag resin, transferred to polyvinylidene difluoride membranes, and detected with an antibody recognizing the His- and GST-tag engineered at the N-terminal end of the proteins. Precision Plus Protein Standard (Dual Color) was used as a protein standard, in which the pink-colored marker protein corresponds to 75 kDa. (E) The sweet and bitter variants of bHLH2 were tested in a yeast one-hybrid screening for their activation of a synthetic promoter including a triple repetition of a CATGTG motif, a typical binding sequence for the basic β-helix-loop-helix transcription factor family. As a negative control, a triple repetition of a random hexanucleotide sequence (AGCTCG) was used. The effects of the L346F mutation in bHLH2 bitter and sweet are also shown.

The history of almond domestication is marked by the selection of sweet kernel genotypes, although other traits such as thinner endocarp and increased seed size might also have been objects of early selection (32). The knowledge of the toxicity of bitter almond and peach kernels was already realized in ancient Egypt in the second millennium BCE, where traitorous priests in the capital cities of Memphis and Thebes were poisoned to death with pits of peaches—a practice referenced in hieroglyphics as “death by peach” (33, 34). Caius Plinius Secundus (Pliny the Elder) wrote in his 37-volume encyclopedia Naturalis Historia that the Romans were proud of knowing how to remove the bitterness and toxicity from bitter almond kernels (35). In Basil of Caesaria’s Hexameron from the 4th century CE, it is stated that Greek agriculturists in Cappadocia had discovered that piercing of the trunk of a highly bitter almond tree near the root, so as to introduce a fat plug of pine into the middle of the pith, resulted in production of delicious sweet almonds (36).

Today, almond breeding often introduces desirable traits from genotypes heterozygous or homozygous for the bitter sk allele, thus requiring the selection of sweet kernel individuals in segregating populations. With the identification of the polymorphisms causally related to the kernel taste, such a selection could be efficiently performed early at the seedling stage.

Besides contributing to the advancement of almond genomics and breeding, the knowledge obtained in this study offers a route toward neo-domestication of other plants (37), for controlling accumulation of not only other cyanogenic glucosides, such as linamarin and lotaustralin in cassava tubers (38), but also other specialized metabolites, such as anthocyanins in strawberry (39) saponins in the sweet/bitter quinoa lines (40) and formation of gossypol-containing glands in cotton (41). Finally, we envisage that DNA sequencing of the bHLH gene in almonds found at ancient archaeological sites may be used to delineate the roots of cultivation (42) or how ancient human populations moved across Asia, Europe (through the Mediterranean Basin), and the Americas (43).

Supplementary Materials science.sciencemag.org/content/364/6445/1095/suppl/DC1 Materials and Methods Figs. S1 to S7 Tables S1 to S9 Data S1 and S2 References (44–89)

http://www.sciencemag.org/about/science-licenses-journal-article-reuse This is an article distributed under the terms of the Science Journals Default License.

Acknowledgments: We thank T. Cremades Rosado and M. Dahl for technical help; the Biology of Stress and Plant Pathology Department and Biotechnology Fruit Trees Group at CEBAS-CSIC for use of their facilities; J. A. Egea for statistical analysis advice; and T. Sicheritz-Ponten, B. Petersen, and A. Dhingra for their initial contributions and efforts regarding the genome sequencing. L. M. De Luca, Johns Hopkins University, is acknowledged for providing the reference to Basil of Caesaria’s Hexameron. Funding: Supported by VILLUM Foundation grants to the VILLUM Center for Plant Plasticity (VKR023054) (B.L.M. and R.S.-P.) and VILLUM project 13234 (R.L.L.-M.); a University of Copenhagen Excellence Program for Interdisciplinary Research grant to the Center for Synthetic Biology (B.L.M.); European Research Council Advanced Grant ERC-2012-ADG_20120314 (B.L.M.); VILLUM Young Investigator Program grant VKR023124 (R.S.-P.); and the projects “Mejora Genética del Almendro” (MINECO-Spain), Séneca Foundation grants for “Molecular Biology of Cyanogenesis in Almond” (11937-PI-09) (R.S.-P.), and “Breeding stone fruit species assisted by molecular tools” (19879/GERM/15) (F.D., J.D.C., and R.S.-P.). Author contributions: R.S.P., J.D.C., and F.D. designed crosses, generated and evaluated plant material; R.S.-P., R.M., C.M., J.D.C., F.R., R.A.C., R.L.L.-M., and S.P. performed the research; R.S.-P., S.P., C.L., L.R., F.D., R.L.L.-M., and B.L.M. designed the research; and R.S.-P., S.P., R.A.C., R.L.L.-M., and B.L.M. wrote the manuscript. Competing interests: Authors declare no competing interests. Data and materials availability: The following sequences can be found at GenBank database with the following codes: pCYP71AN24sweet2, MK041085; pCYP71AN24sweet1, MK041086; pCYP71AN24bitter, MK041087; pCYP79D16sweet, MK041088; pCYP79D16bitter, MK041089; bHLH1sweetLauranne, MK041090; bHLH1bitterS3067, MK041091; bHLH2sweetLauranne, MK041092; bHLH2bitterS3067, MK041093; bHLH4, MK041094.The almond genome sequence is available at the DDBJ database: AP019297-AP019304. All data are available in the main text or the supplementary materials or in the databases previously named.