Tea is the most widely consumed beverage in the world after water.1 With an annual global production of 4.72 million tons (http://faostat.fao.org/), tea represents a $40 billion-a-year industry, with significant growth expected in non-Asian regions (http://www.worldteanews.com/, 2013). The tea plant, Camellia sinensis (L.) O. Kuntze, is a woody evergreen species in the family Theaceae, placed in section Thea of the genus Camellia.2,3 Its putative center of origin is an area of South-East Asia that includes south and southwest China, Indo-China and northeastern India.2–5 Although tea is thought to have been domesticated in China, the exact region where tea first came under cultivation is unclear, and the ancestry of the cultigens has not been identified.3,6

Commercial tea products are classified into categories based on processing technique, i.e., the manner of fermentation and oxidation. The common categories include green tea, black tea, oolong tea, white tea, yellow tea and dark tea. Within each category, a large number of varieties are used in tea production, and these often differ greatly in quality. It is estimated that several thousand tea varieties are cultivated across the different regions of China. In addition, growing conditions, cultivation practices and harvesting time also significantly influence the quality and post-harvest attributes of tea.7

In spite of the significant effects of genotype on tea quality, efficient methods for varietal authentication in the tea value chain have not yet been developed. Numerous instrumental methods for authenticating tea varieties have been investigated, of which near-infrared spectroscopy has been studied the most. This rapid, non-invasive method has been employed by numerous investigators in tea authentication studies.8–12 However, although near-infrared spectroscopy can effectively evaluate many quality attributes, accurate varietal identification remains an unsolved problem when large numbers of genotypes must be examined. In addition to near-infrared spectroscopy, deoxyribonucleic acid (DNA)-based methods have been applied to identify plant species in a large array of commercial tea products.13 Microsatellite markers have been used for tea variety identification,14–22 and sequence-tagged sites and cleaved amplified polymorphic sequences have also been applied to tea varietal identification.23,24 However, to date, DNA fingerprinting in tea has been used only to differentiate varieties, which does not allow large numbers of varieties to be verified through exact genotype matching. Moreover, reconciling genotyping results from different laboratories, even with microsatellite markers, has not been straightforward. Data generated on different genotyping platforms are difficult to standardize, and comparison is further complicated because the same allele may be binned differently. Therefore, the use of simple sequence repeat (SSR)-based fingerprints for tea authentication can lead to false conclusions.
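To illustrate how binning can make the same SSR allele look different across laboratories, the following minimal sketch assumes a dinucleotide repeat and two hypothetical platforms whose sizing differs by a fraction of a base for one true allele. Each platform rounds its measured fragment size to the nearest multiple of the repeat length. The fragment sizes and the rounding rule are illustrative assumptions, not data or procedures from this study.

```python
import math


def call_allele(size_bp: float, repeat_len: int = 2) -> int:
    """Round a measured fragment size (bp) to the nearest multiple of the repeat length.

    floor(x + 0.5) is used instead of round() so that halves always round up
    (Python's round() uses banker's rounding).
    """
    return int(math.floor(size_bp / repeat_len + 0.5) * repeat_len)


# Hypothetical apparent sizes of ONE true allele on two different platforms;
# instrument-specific mobility shifts the apparent size by a fraction of a base.
lab_a_size = 151.1
lab_b_size = 150.8

lab_a_call = call_allele(lab_a_size)
lab_b_call = call_allele(lab_b_size)

# The two labs report different allele sizes for the same allele.
print(lab_a_call, lab_b_call, lab_a_call == lab_b_call)
```

A SNP call, by contrast, is a nucleotide identity rather than a binned size, so no platform-specific binning step is needed to compare calls between laboratories.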

Recent progress in plant genomics technology has led to increasing use of single nucleotide polymorphism (SNP) markers in DNA fingerprinting.25 SNPs are the most abundant class of polymorphisms in plant genomes,26,27 and they offer several practical advantages. Unlike SSR analysis, SNP analysis does not require separating DNA fragments by size, and allele identities can be determined accurately in assay-array or microchip formats. Because SNPs are biallelic, codominant markers, the allele-calling error rate is much lower than with SSRs, and quick, low-cost, multiplex genotyping techniques can be employed. These advantages have made SNPs increasingly the markers of choice for accurate genotype identification and crop improvement. Using a nanofluidic system to analyze SNP markers, Fang et al.28 generated SNP fingerprints from small quantities of DNA extracted from the seed coat of single cacao beans. Based on these SNP profiles, a suspected adulterant variety was unambiguously distinguished from the authentic beans by multilocus matching.
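Multilocus matching compares two samples locus by locus and treats them as the same genotype only if every jointly scored locus agrees. The minimal sketch below assumes genotypes are stored as unordered allele pairs keyed by hypothetical locus IDs, with failed calls stored as None, and a user-chosen minimum number of compared loci (the `min_loci` threshold is an assumption). It illustrates the general idea and is not the pipeline used by Fang et al.

```python
from typing import Dict, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # e.g. ("A", "G"); None = failed call


def normalize(gt: Genotype) -> Optional[Tuple[str, ...]]:
    """Treat a genotype as an unordered pair so that A/G and G/A compare equal."""
    return None if gt is None else tuple(sorted(gt))


def multilocus_match(ref: Dict[str, Genotype], query: Dict[str, Genotype],
                     min_loci: int = 10) -> Tuple[bool, int, int]:
    """Return (is_match, loci_compared, mismatches).

    Only loci scored in both samples are compared. A match requires zero
    mismatches across at least min_loci jointly scored loci.
    """
    compared = mismatches = 0
    for locus, ref_gt in ref.items():
        a, b = normalize(ref_gt), normalize(query.get(locus))
        if a is None or b is None:
            continue  # missing data is skipped, not counted as a mismatch
        compared += 1
        if a != b:
            mismatches += 1
    return (compared >= min_loci and mismatches == 0, compared, mismatches)


# Hypothetical reference fingerprint and two query samples.
reference = {"snp01": ("A", "G"), "snp02": ("C", "C"), "snp03": ("T", "T")}
authentic = {"snp01": ("G", "A"), "snp02": ("C", "C"), "snp03": None}
adulterant = {"snp01": ("A", "A"), "snp02": ("C", "C"), "snp03": ("T", "T")}

print(multilocus_match(reference, authentic, min_loci=2))   # (True, 2, 0)
print(multilocus_match(reference, adulterant, min_loci=2))  # (False, 3, 1)
```

Requiring a minimum number of compared loci keeps a sample with many failed calls from being declared a match on too little evidence; in practice the discriminating power of a panel also depends on allele frequencies at each locus.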

Camellia sinensis has a genome size of 4.0 Gb.29 A complete genome sequence of Camellia sinensis is not yet available; however, a substantial amount of transcriptome data and expressed sequence tags (ESTs) has been generated from different tissues, including young roots, flower buds, immature seeds and roots.19,20,30–32 Publicly accessible EST databases offer a low-cost resource for an effective first step in SNP discovery. The objectives of the present study were to develop SNP markers by mining the EST databases of tea plants and to assess their potential for tea varietal identification. This is the first study to validate EST-derived SNPs in tea, and it demonstrates the utility of EST databases as an alternative approach to de novo SNP identification in species whose genome sequences are not yet available. These SNP markers, together with the genotyping method, should be particularly useful for varietal authentication, germplasm management and tea breeding programs.
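A common strategy for mining SNPs from ESTs, which may differ from the criteria applied in this study, is to align redundant ESTs into contigs and accept a column as a candidate SNP only when the minor allele is supported by at least two reads, so that single-read sequencing errors are filtered out. The sketch below assumes the reads are already aligned to equal length, uses "-" for gaps or uncovered positions, and uses hypothetical thresholds (`min_minor_count`, `min_depth`).

```python
from collections import Counter
from typing import List, Tuple


def candidate_snps(aligned_reads: List[str], min_minor_count: int = 2,
                   min_depth: int = 4) -> List[Tuple[int, str, str, int, int]]:
    """Scan an aligned EST contig for candidate biallelic SNPs.

    Returns (position, major_allele, minor_allele, major_count, minor_count)
    for columns with at least min_depth base calls, exactly two nucleotides
    observed, and the minor allele present in >= min_minor_count reads.
    """
    if not aligned_reads:
        return []
    snps = []
    for pos in range(len(aligned_reads[0])):
        # Keep only unambiguous base calls; gaps and N's are ignored.
        column = [read[pos] for read in aligned_reads if read[pos] in "ACGT"]
        if len(column) < min_depth:
            continue
        counts = Counter(column).most_common()
        if len(counts) != 2:
            continue  # monomorphic, or more than two alleles
        (major, n_major), (minor, n_minor) = counts
        if n_minor >= min_minor_count:
            snps.append((pos, major, minor, n_major, n_minor))
    return snps


# Hypothetical contig of five aligned ESTs: position 3 carries a T/A SNP seen
# in two reads; position 6 has a C in a single read (likely a sequencing error).
reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAC",
]
print(candidate_snps(reads))  # [(3, 'T', 'A', 3, 2)]
```

Requiring redundant support trades sensitivity for specificity: true SNPs represented by only one EST are missed, but far fewer sequencing errors are carried forward into marker validation.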