Tea is the world's oldest and most popular caffeine-containing beverage with immense economic, medicinal, and cultural importance. Here, we present the first high-quality nucleotide sequence of the repeat-rich (80.9%), 3.02-Gb genome of the cultivated tea tree Camellia sinensis. We show that the extraordinarily large genome size of the tea tree results from the slow, steady, and long-term amplification of a few LTR retrotransposon families. In addition to a recent whole-genome duplication event, lineage-specific expansions of genes associated with flavonoid metabolic biosynthesis were discovered, which enhance catechin production, terpene enzyme activation, and stress tolerance, important features for tea flavor and adaptation. We demonstrate an independent and rapid evolution of the tea caffeine synthesis pathway relative to cacao and coffee. A comparative study among 25 Camellia species revealed that higher expression levels of most flavonoid- and caffeine-related genes, but not theanine-related genes, contribute to the increased production of catechins and caffeine and thus enhance tea-processing suitability and tea quality. These novel findings pave the way for further metabolomic and functional genomic refinement of characteristic biosynthesis pathways and will help develop a more diversified set of tea flavors that would eventually satisfy and attract more tea drinkers worldwide.

Here, we report a high-quality genome assembly of Yunkang 10 (2n = 2x = 30 chromosomes), a diploid elite cultivar of C. sinensis var. assamica widely grown in Southwestern China, based on sequence data from whole-genome shotgun sequencing. Together with comparative transcriptomic and phytochemical analyses for the representative Camellia species, we aim to obtain new insights into the molecular basis of the biosynthesis of the three characteristic secondary metabolites with an emphasis on the suitability of tea-processing and the formation of tea flavor.

As one of the most popular beverages worldwide, tea has well-established nutritional and medicinal properties derived from the three major characteristic secondary metabolites: catechins, theanine, and caffeine. These phytochemical compounds, especially catechins, are beneficial for human health, and their contents and relative proportions in large part determine the flavor of tea. The genus Camellia, consisting of ∼119 species with differential metabolite profiles, provides a uniquely powerful system for dissecting the variation and evolution of flavonoid, theanine, and caffeine biosynthesis pathways that define tea-processing suitability. Thousands of years of continental introduction and conventional selective breeding efforts have resulted in a large number of landraces and elite cultivars that are adapted to globally diverse habitats, thus ensuring different tea productivity and quality worldwide. The rich metabolite constituents within the tea tree may play an important role in adaptations to diverse ecological niches. The genomic basis of these global adaptations remains unresolved. Although it is well recognized that the differential accumulation of the three major characteristic constituents in tea tree leaves largely determines the quality of tea, little genomic information is currently available regarding the complex transcriptional regulation of the catechin, theanine, and caffeine metabolic pathways. Sequencing of the tea tree genome would help uncover the molecular mechanisms underlying secondary metabolic biosynthesis, with the promise of improving breeding efficiency and thus developing better tea cultivars of even higher quality.

Socially and habitually consumed by more than 3 billion people across 160 countries, tea is the oldest (since 3000 BC) and most popular nonalcoholic caffeine-containing beverage in the world. Besides its attractive aroma and pleasant taste, the tea beverage has numerous healthful and medicinal benefits for humans due to many of the characteristic secondary metabolites in tea leaves, such as polyphenols, caffeine, theanine, vitamins, polysaccharides, volatile oils, and minerals. The tea plant Camellia sinensis is the source of commercially grown tea and a member of the genus Camellia in the tea family Theaceae, which also contains several other economically important species, including well-known camellias with their attractive flowers (e.g., C. japonica, C. reticulata, and C. sasanqua) and the traditional oil tree C. oleifera that produces high-quality edible seed oil. The first credible record of tea as a medicinal drink occurred during the Shang dynasty of China and dates back to the third century AD. The global expansion of tea has been long and complex, spreading across multiple cultures over thousands of years and reaching more than 100 countries. Today, tea is commercially cultivated on more than 3.80 million hectares of land on a continent-wide scale, and 5.56 million metric tons of tea were produced worldwide in 2014.

Caffeine (1,3,7-trimethylxanthine) is one of the best-known purine alkaloids in plants. It is synthesized by some eudicot plants, such as tea, coffee, cacao (Theobroma cacao), and maté (Ilex paraguariensis) from the holly family. The caffeine of the tea tree is synthesized from xanthosine via a key pathway that has three methylation steps catalyzed by SAM-dependent N-methyltransferases (NMTs) (Figure 4A). With the aid of the completely sequenced tea tree genome, we identified a total of 13 NMT genes. We found that tea tree has fewer NMT genes than cacao (21) and coffee (23) (Figure 4B; Supplemental Tables 36 and 37). The gene expression profiles of NMTs at different tea tree developmental stages showed that most NMT genes (∼77%) tended to be expressed in the leaves and flowers, two primary tissues for caffeine accumulation, while the tender shoots exhibited a slightly higher gene expression level in comparison with young leaves (Supplemental Table 38 and Figure 4D). With the three completely sequenced tea, coffee, and cacao genomes now in hand, we are able to comprehensively investigate the evolutionary landscape of caffeine biosynthesis by comparing genome-wide sampling of NMT genes from coffee, cacao, and tea tree and its wild relatives. Phylogenetic analyses show that the NMTs from tea tree and cacao apparently separate from coffee with strong bootstrap support (Figure 4C and Supplemental Figure 26), indicating an independent evolution of the caffeine synthetic pathway in tea tree and cacao relative to coffee. Notably, all NMT genes from tea tree and its relatives form a single gene clade with strong bootstrap support, and are monophyletic with the five NMT genes from cacao (Figure 4C and Supplemental Figure 26). This suggests that the caffeine synthetic pathway of the tea tree and its related Camellia species may have originated from a common tea tree–cacao ancestor but diverged later and evolved independently. We demonstrate an independent, recent, and rapid evolution of caffeine biosynthesis in the tea tree, supporting multiple origins of caffeine biosynthetic NMT activity as proposed previously.

Neighbor-joining (NJ) phylogenetic tree of NMT genes from tea tree (green solid dots), coffee (brown solid dots), and cacao (orange solid dots). The 10 NMT genes cloned in seven wild relatives of tea tree, including C. irrawadiensis (green solid squares), C. ptilophylla (green solid lower triangles), C. granthamiana (green circles), C. lutchuensis (green squares), C. chrysantha (green diamonds), C. kissi (green lower triangles), and C. japonica (green upper triangles), are also shown and listed in Supplemental Table 36. The phylogeny shows high bootstrap support for independent evolution of the 13 caffeine biosynthesis genes.

(A) The most essential and last three methylation steps for caffeine biosynthesis in plants. These methylation steps are catalyzed by a series of N-methyltransferases (NMTs), including xanthosine methyltransferase (7-NMT), theobromine synthase (7-methylxanthine methyltransferase; MXMT), and caffeine synthase (3,7-dimethylxanthine methyltransferase; TCS). SAMS represents S-adenosylmethionine synthetase, while SAH indicates S-adenosylhomocysteine.

The 24 characteristic metabolite-related genes exhibit distinct expression patterns in the eight tissues of the tea tree (Supplemental Figures 23–25). Our results showed that the majority of genes encoding enzymes involved in flavonoid biosynthesis pathways were highly expressed in tender shoots (Supplemental Figure 23), indicating that flavonoid biosynthesis actively occurs early during shoot differentiation. Genes involved in the theanine metabolic pathway were expressed in all tissues but were more highly expressed in the seedling, in agreement with previous findings (Supplemental Figure 24). We also observed that genes encoding enzymes responsible for caffeine biosynthesis were highly expressed in seeds, except for TCS, which is much more highly expressed in tender shoots and flowers than in other tissues, suggesting that caffeine may also be synthesized in seeds in addition to leaves (Supplemental Figure 25).

Nevertheless, the 24 characteristic metabolite-related genes exhibited considerably distinct expression patterns in mature leaves across the 24 examined Camellia species (Figure 3B and Supplemental Tables 33, 34, and 35). The detected genes responsible for catechin biosynthesis, in particular, and caffeine biosynthesis, rather than theanine biosynthesis, were differentially expressed between Camellia species from section Thea and non-Thea sections (Figure 3B and Supplemental Table 35). For example, four genes encoding enzymes involved in the last few steps of catechin biosynthesis, ANR (anthocyanidin reductase) (∼1.38-fold on average, P = 1.88E-01), F3′5′H (flavonoid 3′,5′-hydroxylase) (∼16.86-fold on average, P = 5.17E-02), CHI (chalcone isomerase) (∼3.79-fold on average, P = 2.19E-02), and FNS II (flavone synthase II) (∼6.64-fold on average, P = 1.27E-01), were more highly expressed in section Thea species, where a higher content of catechins was observed when compared with non-Thea sections. Similar expression patterns were also observed in three of the four examined genes encoding key enzymes involved in caffeine biosynthesis, including TCS (tea caffeine synthase) (∼12.17-fold on average, P = 2.81E-03), IMPDH (inosine-5′-monophosphate dehydrogenase) (∼3.30-fold on average, P = 4.71E-05), and AMPDA (AMP deaminase) (∼2.42-fold on average, P = 1.08E-04); they were significantly more highly expressed in section Thea species, which also contain a significantly higher content of caffeine when compared with non-Thea sections (Figure 3B and Supplemental Table 35). Notably, gene expression levels of TCS, encoding the enzyme that catalyzes the final step in caffeine biosynthesis, largely differed among species from either section Thea or non-Thea sections (P < 0.001), corresponding to their variable amounts of caffeine (Figure 3B and Supplemental Table 35). Sequence variation of these 24 characteristic metabolite-related genes correlates well with phytochemical differentiation of the three major secondary metabolic pathways among these representative Camellia species (Figure 3C).

To gain novel insights into the molecular mechanisms underlying the phytochemical characteristics of major secondary metabolites in the tea tree and other Camellia species, we performed an integrated analysis based on comparative transcriptomic and phytochemical data for the same panel of Camellia species grown under the same conditions (Supplemental Tables 30 and 33). On the basis of the annotation of genes encoding enzymes potentially involved in catalyzing the reactions of the flavonoid, theanine, and caffeine pathways in our assembled tea tree genome, we first obtained homologous genes from the respective transcriptomes of the other 23 Camellia species (Supplemental Table 32), including the 14 catechin biosynthesis-related genes (PAL, C4H, 4CL, CHS, CHI, F3′H, F3′5′H, FNS II, FLS, DFR, LCR, ANS, ANR, and F4′ST), six theanine biosynthesis-related genes (TS, GS, GDH, ADC, SAMDC, and Fe-GOGAT), and four caffeine biosynthesis-related genes (IMPDH, SAMS, AMPDA, and TCS). Our analysis revealed that species from section Thea as well as other relatives from non-Thea sections possessed all of these important genes encoding enzymes involved in the biosynthesis of catechins, theanine, and caffeine in cultivated tea tree. This suggests that the three characteristic metabolic pathways were already present in the common ancestor of Camellia and have remained well conserved for ∼6.3 million years (Supplemental Section 7.5).

(B) Expression profiles in FPKM (fragments per kilobase per million reads mapped) of key functional genes (rows) for each species (columns) related to three metabolic pathways in the tea tree. Data are plotted as log10 values. Right box plot indicates the expression correlations within section Thea (Thea; green), non-Thea sections (Non-Thea; orange), or between Thea and Non-Thea (gray).
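The FPKM unit named in this legend can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, the read counts in the example are made up, and the pseudocount added before the log10 transform is an assumption, since the legend does not say how zero values were handled.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments / (gene_length_bp / 1e3) / (total_mapped_fragments / 1e6)
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log10_fpkm(value, pseudocount=1.0):
    """log10 transform for plotting; the pseudocount is an assumption."""
    return math.log10(value + pseudocount)


# Made-up example: 500 fragments on a 2-kb gene in a 25-million-fragment library.
example = fpkm(500, 2000, 25_000_000)
print(example)              # 10.0
print(log10_fpkm(example))  # log10(10.0 + 1.0)
```

Normalizing by both gene length and library size is what makes expression values comparable across genes and across the species-specific libraries shown in the columns.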

Left panel represents the phylogenetic relationship of the 25 Camellia species constructed using whole-transcriptome sequencing data. Right panel shows the percent content of seven characteristic metabolites detected in the leaves of each Camellia species using HPLC (see Supplemental Information for abbreviation details).

Previous studies on the sequenced plant genomes have shown that polyploidy has been a prominent feature in the evolutionary history of angiosperms and that whole-genome duplication (WGD) events, in particular, have had major impacts on crop gene and genome evolution. We identified 16 520 paralogous gene pairs that spanned 47.6% of the protein-coding genes in the tea tree genome (Supplemental Table 28). On the basis of these duplicated gene pairs, we calculated an age distribution of synonymous substitution rates (Ks) that peaked around 0.36 and 1.16 (Figure 2D; Supplemental Figure 16 and Supplemental Table 29), suggesting that two rounds of WGD events occurred in the tea tree genome. We compared the tea tree genome with two other eudicot genome sequences (kiwifruit and grape) based on the distribution of Ks values of paralogous gene pairs (Figure 2D). Our results confirm that the ancient WGD (Ad-γ), referenced as γ in the literature for eudicots, was shared among tea tree, grape, and kiwifruit. The recent WGD event (referenced as Ad-β) that occurred in tea tree was also observed in four other Camellia species (C. sinensis var. sinensis, C. taliensis, C. reticulata, and C. impressinervis) based on Ks values of paralogous genes derived from their high-quality transcriptome data (Supplemental Figure 17). This WGD event, thus, occurred in the common ancestor of these investigated Camellia species. To determine whether Ad-β was a genus-specific event in the tea tree or shared with the WGD reported in kiwifruit (Figure 2D), we computed Ks values to date the speciation time based on orthologous gene pairs from syntenic blocks between tea tree and kiwifruit. However, our results could not clearly resolve whether this recent WGD event occurred before or after the tea tree-kiwifruit divergence (Figure 2D and Supplemental Figure 18) because the estimated dates for the tea tree and kiwifruit WGD events are quite close to their speciation time. We further adopted the PUG (Phylogenetic Placement of Polyploidy Using Genomes) pipeline and obtained a small proportion (∼16%) of gene trees derived from the Ad-β event of tea tree and kiwifruit, supporting that this Ad-β event occurred prior to the divergence between tea tree and kiwifruit (Supplemental Figure 19). Further efforts are needed to sequence and compare eudicot genomes between the tea tree and kiwifruit lineages to determine exactly whether they are the same or distinct WGD events.
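The link between a Ks peak and the age of a duplication can be sketched with the standard molecular-clock relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The sketch below is illustrative only: the function names are hypothetical, and r is left as a caller-supplied parameter because no rate is stated in this section. What does not depend on r is the ratio of the two peak positions reported above.

```python
def ks_to_age_years(ks, rate_per_site_per_year):
    """Molecular-clock conversion T = Ks / (2 * r).

    The factor of 2 reflects that both duplicated copies accumulate
    substitutions independently after the duplication.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


def relative_age(ks_old, ks_young):
    """Ratio of two ages under a constant rate; r cancels out."""
    return ks_old / ks_young


# Ks peaks reported in the text for the recent and the ancient WGD.
KS_RECENT = 0.36
KS_ANCIENT = 1.16

print(round(relative_age(KS_ANCIENT, KS_RECENT), 2))  # 3.22
```

Under a constant substitution rate, the ancient event would therefore be roughly three times as old as the recent one, whatever value of r is adopted. Absolute dates require a calibrated rate, which is why the placement of the recent event relative to the tea tree-kiwifruit split is sensitive to small differences in Ks.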

Among the tea tree-specific and expanded gene families, we found that defense genes were among the most highly enriched functional categories, including plant disease defense response, e.g., NB-ARC domain (PF00931; P < 0.001) and leucine-rich repeat (LRR) (PF13516, PF07725, PF12799, PF00560, PF13855; P < 0.001) (Supplemental Tables 21–26). These findings suggest that strong natural selection for enhanced disease resistance in the tea tree potentiated global adaptations to the diverse habitats of Asia, Africa, Europe, North America, South America, and Oceania. To further assess this, we thoroughly explored the disease resistance genes, including the nucleotide-binding site with leucine-rich repeat (NBS-LRR) and pattern-recognition receptor (RLK-LRR) genes, in the tea tree together with four other eudicots (kiwifruit, tomato, cacao, and A. thaliana). Results showed that the tea tree harbored a total of 313 NBS-LRR-encoding genes, more than in kiwifruit (104), A. thaliana (207), tomato (263), and cacao (297) (Figure 2C and Supplemental Table 27). NBS-LRR genes in plants are mainly responsible for recognizing specific pathogen effectors; thus, the observation of a large expansion of this type of gene implies selection pressure in response to pathogenic challenge. We also characterized a total of 272 putative RLK-LRR genes, which encode receptor-like kinases with an LRR domain, in the tea tree genome (Supplemental Table 27). This number is slightly larger than that found in kiwifruit (254), tomato (231), potato (261), cacao (238), and A. thaliana (224), suggesting that pattern-triggered immunity, another type of ancient innate immunity in plants, is more conserved in the tea tree and may play an important role in pathogen defense.

In flowering plants, the expansion or contraction of gene families is an important driver of lineage splitting and phenotypic diversification. We characterized gene families that underwent discernible changes and divergently evolved along different branches, with particular emphasis on those involved in tea tree traits and tea flavor (Figure 2B). Our results showed that, of the 13 476 gene families inferred to be present in the most recent common ancestor of the ten studied plant species, 1857 comprising 2048 genes exhibited significant expansions (P < 0.001) in the tea tree lineage (Figure 2B). Functional annotation of these genes demonstrates that they were mainly enriched in functional categories involved in flavonoid metabolic processes, including flavonoid metabolic process (GO: 0009812, P < 0.001) and flavonoid biosynthetic process (GO: 0009813, P < 0.001) (Supplemental Tables 24, 25, and 26). Notably, gene families were significantly enriched in a number of functions related to the modification of flavonoid metabolic compounds, such as quercetin 3-O-glucosyltransferase activity (GO: 0080043, P < 0.001), UDP-glucosyltransferase activity (GO: 0035251, P < 0.001; PF00201, P < 0.001), UDP-glycosyltransferase activity (GO: 0008194, P < 0.001), and flavonoid glucuronidation (GO: 0052696, P < 0.001) (Supplemental Tables 24, 25, and 26). The glucosyltransferase activities are well known to affect tea flavor and quality by controlling the content and formation of important secondary metabolites, for example, galloylated catechins and flavonol 3-O-glycosides, which largely determine the astringency of tea flavor.
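Enrichment P values such as those reported above are commonly obtained with a one-sided hypergeometric test. This section does not state which test was used, so the sketch below is an assumption offered only to make the reasoning concrete; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric P value, P(X >= k).

    N: genes in the background set
    K: background genes annotated with the term (e.g., a GO term)
    n: genes in the set of interest (e.g., an expanded family set)
    k: genes of interest annotated with the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of drawing k, k + 1, ... annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Toy example: 10 background genes, 4 annotated; 3 genes drawn, 2 annotated.
print(hypergeom_enrichment_p(2, 3, 4, 10))  # 40/120, i.e. one third
```

In practice many terms are tested at once, so raw P values of this kind are usually corrected for multiple testing before a threshold such as P < 0.001 is applied.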

Defining gene families evolving rapidly among flowering plants has been useful in identifying the genomic bases underlying species adaptation and physiological changes of metabolite constituents during evolution. We compared the predicted proteomes of the tea tree, kiwifruit, potato, tomato, coffee, A. thaliana, cacao, poplar, grape, and lotus, yielding a total of 26 024 orthologous gene families that comprised 246 457 genes (Supplemental Table 20 and Supplemental Figure 15). This revealed a core set of 113 439 genes belonging to 6730 clusters that were shared among all 10 plant species, representing ancestral gene families (Figure 2A). We found a total of 714 gene clusters containing 2170 genes unique to the tea tree, potentially related to environmental adaptation and phytochemical properties within the tea lineage (Figure 2A). Functional enrichment analyses of tea tree-specific genes by both gene ontology (GO) terms and PFAM domains together revealed functional categories related to biosynthetic processes associated with major tea characteristic secondary metabolites (e.g., catechins). The latter included flavonoid biosynthetic process (GO: 0009813, P < 0.001) and secondary metabolite catabolic process (GO: 0090487, P < 0.001) (Supplemental Tables 21 and 22). PFAM analysis further revealed that gene functions involved in flavonoid biosynthesis are enriched in the 2OG-Fe(II) oxygenase superfamily (PF03171, P < 0.001), which encodes enzymes associated with the production of anthocyanidin and flavonol (flavanone 3-hydroxylase, anthocyanidin synthase, and flavonol synthase) (Supplemental Tables 21 and 23). Terpenoids constitute a large family of natural compounds and are major components of resins, essential oils, and aromas. Remarkably, we found that the tea tree-specific gene families were also significantly enriched in functions related to terpene synthase activity (GO: 0010333, P < 0.001) that may be associated with the tea aroma, further evidenced by PFAM annotation with the enriched functional domain of terpene synthase (PF01397; P < 0.001) (Supplemental Tables 21, 22, and 23).
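The distinction drawn above between the core clusters shared by all 10 species and the clusters unique to the tea tree amounts to a simple classification of each orthologous family by the set of species it contains. The sketch below illustrates that idea on toy data; the function name, the family identifiers, and the three-species example are hypothetical and do not reproduce the clustering software or the data used in the study.

```python
def classify_families(family_to_species, all_species, focal_species):
    """Split gene families into core (present in every species) and
    focal-only (present in the focal species and nowhere else)."""
    everyone = set(all_species)
    core, focal_only = set(), set()
    for family, species in family_to_species.items():
        present = set(species)
        if present == everyone:
            core.add(family)
        elif present == {focal_species}:
            focal_only.add(family)
    return core, focal_only


# Toy data: which species contribute genes to each family.
toy_families = {
    "fam1": {"tea", "grape", "cacao"},
    "fam2": {"tea"},
    "fam3": {"grape", "cacao"},
    "fam4": {"tea", "grape"},
}
core, tea_only = classify_families(toy_families, ["tea", "grape", "cacao"], "tea")
print(sorted(core))      # ['fam1']
print(sorted(tea_only))  # ['fam2']
```

Families that fall into neither class, such as fam3 and fam4 here, correspond to the partially shared regions of a Venn diagram.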

Expansion and contraction of gene families among the 10 plant species. The phylogenetic tree was constructed based on 597 high-quality 1:1 single-copy orthologous genes using sacred lotus (Nelumbo nucifera) as the outgroup. The pie diagram on each branch of the tree represents the proportion of genes undergoing gain (red) or loss (green) events. The number at the root (13 476) denotes the total number of gene families predicted in the most recent common ancestor (MRCA) (see Supplemental Information). The numerical value beside each node shows the estimated divergence time of that node (myr).

(A) Venn diagram shows the shared and unique gene families among the tea tree and seven other plant species. Each number in parentheses represents the number of genes within corresponding families (without parentheses).

The unrooted neighbor-joining phylogenetic trees were constructed on the basis of 678 Ty1/copia (A) and 1795 Ty3/gypsy (B) aligned sequences corresponding to the RT domains without premature termination codons. LTR family names and the proportion of each are indicated.

We sequenced the tea tree genome (cultivar Yunkang 10) from Yunnan Province, China. We performed a whole-genome shotgun sequencing analysis with the Illumina next-generation sequencing platform (HiSeq 2000). This generated raw sequence data sets of ∼707.88 Gb, thus yielding approximately 159.43-fold high-quality sequence coverage (Supplemental Table 1). Using two orthogonal methods, we estimated that the genome size of Yunkang 10 is between 2.9 and 3.1 Gb (Supplemental Figures 1 and 2 and Supplemental Table 2). The tea tree genome was assembled using Platanus, followed by scaffolding of the preassembled contig sequences with paired-read next-generation sequencing data using SSPACE. This finally yielded a ∼3.02-Gb genome assembly that spans ∼98% of the estimated genome size and contains 37 618 scaffolds (N50 = 449 kb) and 258 790 contigs (N50 = 20.0 kb) (Table 1 and Supplemental Table 3). To validate the genome assembly quality, we first aligned all available DNA and expressed sequence tags of the tea tree from public databases and obtained mapping rates of 75.56% and 88.30%, respectively (Supplemental Table 5); second, we mapped all high-quality reads (∼339.49 Gb) to the assembled genome sequences, which showed good alignments with a mapping rate of 93.96% (Supplemental Table 5); and third, the transcripts we assembled also showed excellent alignments and sequence identities to the assembled genome: out of 198 175 transcripts, 76.23% were mapped (transcript coverage ≥90% and identity ≥90%; Supplemental Table 5 and Supplemental Section 1.6).
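The N50 values quoted above summarize assembly contiguity: N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of the calculation follows, using a hypothetical function name and toy lengths rather than the actual scaffold lengths of this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: total length 300, so the threshold is 150 bases.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70
```

Because scaffolding joins contigs into longer sequences, the scaffold N50 of an assembly is expected to exceed its contig N50, as it does for the values reported here.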

Discussion

We present a high-quality genome sequence for the cultivated tea tree. The tea tree offers advantages as an ideal system for functional genomics to understand the formation of a large number of secondary metabolites in many medicinal plants. This draft genome sequence thus provides the foundation for revealing the genetic basis of agronomically important traits and the characteristic physiological, medicinal, and nutritional properties of the tea tree. The availability of the first genome in the genus Camellia will facilitate in-depth fundamental comparative studies on tea tree biology, addressing a wealth of questions about the Camellia gene and genome evolution. This is particularly important for enhancing the breeding programs of the most productive oil-bearing crop C. oleifera and the horticulturally distinguished camellias comprising C. japonica, C. reticulata, and C. sasanqua.

The genome sequence obtained reveals some of the unique biology of the tea tree. For instance, the tea tree possesses an extraordinarily large genome when compared with most sequenced plant species. We show that this results from the slow, steady, and long-term amplification of a few LTR retrotransposon families. It is possible that efficient DNA removal mechanisms (i.e., unequal homologous recombination and illegitimate recombination) are less prevalent in the tea tree genome, as previously described in other seed plants (e.g., P. abies [Nystedt et al., 2013], Amborella trichopoda [Albert et al., 2013]), when compared with A. thaliana (Devos et al., 2002) and rice (Ma et al., 2004). We observe a positive relationship between expression levels of LTR retrotransposons and copy number of elements, in sharp contrast to an inverse correlation previously reported for maize (Meyers et al., 2001) and pineapple (Ming et al., 2015). DNA methylation differences may, instead, play a role in suppressing retrotransposition activation, leading to the increase of LTR retrotransposons in the tea tree genome (Mirouze et al., 2009). The detection of a relatively recent WGD that occurred in the tea tree and other relatives indicates the contribution of genome duplication to the evolution of the genus Camellia. Such a WGD event together with massive segmental duplications has potentially facilitated the expansion of gene families relevant to the activation of major secondary metabolic biosynthesis (e.g., flavonoids and terpenoids) as well as disease resistance and abiotic stress tolerance. The accumulation of abundant metabolic constituents, such as flavonoids and terpenoids, has apparently played a significant role in supporting environmental adaptations of the tea tree. The rapid expansion of disease resistance-related and abiotic stress tolerance-related genes suggests strong selection for enhanced disease resistance in the tea tree that may be attributable to adaptation to globally diverse habitats, and it provides a large number of candidate stress tolerance and disease resistance loci for further study to generate even more environmentally resilient tea tree varieties. We thus hypothesize that these genomic features enabled the tea tree to adapt widely to varied climates and become a ubiquitous worldwide beverage plant.

We have identified lineage-specific genes that likely control the quality of tea, in particular genes encoding enzymes involved in the flavonoid, theanine, and caffeine biosynthesis pathways. Tea tree-expanded genes are significantly enriched in GO terms related to flavonoid metabolic processes and terpene synthase activity, which influence tea flavor and quality. Our comparative analyses indicate that the three major characteristic metabolic pathways are extremely conserved among the tea tree and other Camellia plants. The large amounts of catechins and caffeine in the tea tree and other members of section Thea are a feature that distinguishes these species from those of non-Thea sections. Although catechins, theanine, and caffeine are typically thought to be the key characteristic metabolic compounds that determine tea-processing suitability and tea quality, tea flavor is also affected by many other known (e.g., terpenoids) and unknown secondary metabolic compounds. Extremely low contents of catechins and caffeine in the Camellia species from non-Thea sections, likely together with some other particular secondary metabolites, degrade tea quality.