WGT in Nicotiana genomes is shared with other Solanaceae species but the Gypsy retrotransposons expansions are Nicotiana specific. Nicotiana genomes share the WGT with other Solanaceae species. (A, Left) Shared WGT event between Nicotiana and Solanum as revealed by superimposing species tree on the gene tree for the triplicated gene families in Nicotiana and Solanum. Red bar represents the percentage of triplication shared between Nicotiana and Solanum. Yellow bar represents shared duplication events or paralogous loss in one species. Brown bar represents Solanum-specific duplication events. (Right) Phylogenetic tree of 11 plant species and different colored stars indicate previously characterized whole-genome multiplication events. (B) Expansion of Gypsy transposable elements contributes substantially to genome size evolution in Nicotiana. (Left) Genomic content (1C content in gigabases) of repetitive versus nonrepetitive sequences in the 11 plant genomes. Black bar indicates nonrepetitive sequences, whereas other colors indicate repetitive sequences. (Right) Visualization of the expansion history of LTR retrotransposons in four Nicotiana genomes in comparison with tomato. The x axis (number of substitutions per site) refers to the divergence of a LTR from its closest paralog in the genome, with smaller numbers indicating more recent amplification events.

To investigate the evolutionary history of the different Nicotiana genomes, we inferred 23,340 homologous groups using protein sequences from 11 published genomes ( SI Appendix, Table S1 ). A phylogenomic analysis of the identified homologous groups demonstrated that Nicotiana species share a whole-genome triplication (WGT) event with solanaceous species, such as tomato, potato, and Petunia ( 11 ), but not with Mimulus ( Fig. 1 ), which is consistent with a large number of duplication events at the Solanaceae branch ( SI Appendix, Fig. S4 ), fourfold degenerate sites (4dTV) between duplicated paralogs ( SI Appendix, Fig. S5 ), and an analysis of the evolutionary origin of threonine deaminase (TD) in Solanaceae ( SI Appendix, Fig. S6 ). At least 3,499 gene pairs originating from this WGT were retained in both Nicotiana and Solanum. Among all retained gene pairs in N. attenuata that resulted from WGT but did not further duplicate in this species, more than 53.7% showed expression divergence (fold change greater than two) in at least one tissue, indicating that these WGT-derived duplicated genes may have evolved divergent functions through neofunctionalization or subfunctionalization.

We sequenced and assembled the genome of N. attenuata, using 30× Illumina short reads, 4.5× 454 reads, and 10× PacBio single-molecule long reads. We assembled 2.37 Gb of sequences representing 92% of the expected genome size. We further generated a 50× optical map and a high-density linkage map for superscaffolding ( SI Appendix, Figs. S1 and S2 ), which anchored 825.8 Mb to 12 linkage groups and resulted in a final assembly with a N50 contig equal to 90.4 kb and a scaffold size of 524.5 kb ( SI Appendix, Fig. S3 ). Likewise, using ∼50× Illumina short reads, we assembled the Nicotiana obtusifolia genome with a 59.5-kb and 134.1-kb N50 contig and scaffold N50 size, respectively. The combined annotation pipeline integrating both hint-guided AUGUSTUS and MAKER2 gene prediction pipelines predicted 33,449 gene models in the N. attenuata genome. More than 71% of these gene models are fully supported by RNA-sequencing (RNA-seq) reads and 12,617 and 18,176 of these genes are orthologous to Arabidopsis and tomato genes, respectively.

Innovations in metabolic and signaling network architecture are thought to result from the rapid rewiring of tissue-level gene expression patterns following duplication events ( 18 ). To examine this inference, we compared the genome-wide expression divergence between gene pairs that resulted from only one round of gene duplication and analyzed the effects of DTT-NIC1 insertions into 1-kb upstream regions of each member of the gene pairs. Insertions of the DTT-NIC1 family were associated with significant divergences in expression and tissue specificity between the gene pairs ( Fig. 2D ), consistent with the hypothesis that the expansion of this TE family was a critical determinant of genome-wide rewirings of gene regulation occurring postduplication in these Nicotiana species.

Expansion of transposable elements of the family DTT-NIC1 increased genome-wide gene expression divergence among duplicated genes in Nicotiana. (A) Copy number of the six most abundant Nicotiana MITE families of transposable elements in Nicotiana and Solanum. Each bar depicts the total number of copies in each species for the six main MITE TEs. MITE families are visualized by different colors: light blue, DTA (hAT); green, DTH (PIF/Harbinger); red, DTT (Tc1/Mariner). DTT-NIC1 from the Tc1/Mariner family is the most abundant of all MITE TEs. Nicotiana species: Na, N. attenuata; No, N. obtusifolia; Ns, N. sylvestris; Nt, N. tomentosiformis. Solanum species: Sl, Solanum lycopersicum; St, Solanum tuberosum. (B) DTT-NIC1 insertions are enriched in the upstream regions of coding sequences. The line indicates the percentage of genes, among all predicted protein coding genes, that contain DTT-NIC1 insertions within a given 500-bp sliding window. (C) Expansion of the DTT-NIC1 family in Nicotiana species. Neighbor joining (NJ) tree of the DTT-NIC1 family in N. attenuata, tomato, and potato. The scale indicates the branch length. The shaded clade highlights the pronounced expansion of DTT-NIC1 in N. attenuata. (D) Insertions of DTT-NIC1 within the 1-kb upstream region of duplicated genes increased tissue-level gene expression divergence (Wilcoxon rank sum test). (Left and Right) Violin plots of the divergences between gene pairs at expression and tissue specificity levels, respectively. Gene expression divergence was calculated based on the Euclidian distance between the expression profiles of gene pairs. Tissue specificity divergence was calculated based on changes of the τ-index between gene pairs. Because tissue specificity (τ-index) and expression divergence were calculated based on log 2 transformed transcripts per million (TPM) values, a small shift in the mean represents a relatively large effect. Red bars indicate duplicated pairs, of which one copy has at least one DTT-NIC1 insertion and the other does not. Blue bars indicate duplicated pairs, both of which lack DTT-NIC1 insertions. The width of the probability density represented by the violin plots along the bars corresponds to the number of duplicate gene pairs.

In addition to LTRs, miniature inverted repeat transposable elements (MITEs), which are derived from truncated autonomous DNA transposons, may also play evolutionary roles. Although the size of MITEs is generally small, typically less than 600 bp, MITEs are often located adjacent to genes and are often transcriptionally active. As such, they have been hypothesized to contribute to the evolution of gene regulation ( 15 , 16 ). In total, we identified 13 MITE families in the genome of N. attenuata, several of them having rapidly and specifically expanded in Nicotiana species ( Fig. 2 A and B ). Among these expanded MITE families, a Solanaceae-specific subgroup of the Tc1/Mariner defined by DTT-NIC1 is the most abundant. By analyzing insertion positions of this subgroup, we found that DTT-NIC1 copies, similar to other DNA transposons, are significantly enriched within a 1-kb region upstream of the genes in N. attenuata ( Fig. 2C ). Analyses on the herbivory-induced conserved transcriptomic responses in Nicotiana further showed that insertions of DTT-NIC1 are significantly enriched within the 1-kb upstream region of herbivory-induced early defense signaling genes in N. attenuata, and may have contributed to the recruitment of genes into the induced defense signaling network by introducing WRKY transcription factor binding sites ( 17 ).

Polyploidization is often associated with a burst of TE activity as a hypothesized consequence of “genomic shock” ( 12 , 13 ). TEs, especially long terminal repeats (LTRs) are highly abundant in Nicotiana and account for 81.0% and 64.8% of the N. attenuata and N. obtusifolia genomes, representing significantly higher proportions than in other sequenced Solanaceae genomes, such as tomato and potato ( Fig. 1 ). An analysis of the history of TE insertions revealed that all Nicotiana species experienced a recent wave of Gypsy retrotransposon expansion. However, this expansion of Gypsy copies was less pronounced in N. obtusifolia compared with other Nicotiana species analyzed, which accounts for the smaller genome size of N. obtusifolia. A recent study showed that Capsicum species also experienced a large expansion of their Gypsy repertoire ( 14 ), albeit earlier than in Nicotiana, indicating that after WGT, the different Solanaceae lineages independently experienced the processes of Gypsy proliferation.

Evolution of Nicotine Biosynthesis.

To further understand the role of gene duplication and TE insertions on the evolution of Nicotiana adaptive traits, we reconstructed the evolutionary history of the nicotine biosynthesis pathway, a key defensive innovation of the Nicotiana genus. Nicotine biosynthesis is restricted to the roots and involves the synthesis of a pyridine ring and a pyrrolidine ring, which are coupled most likely via the action of genes coding for an isoflavone reductase-like protein, called A622, and berberine bridge enzyme-like (BBL) enzymes (19, 20) (Fig. 3A). Phylogenomic analyses revealed that genes involved in the biosynthesis of the pyridine and pyrrolidine rings evolved from the duplication of two primary metabolic pathways that are ancient across all major plant lineages: the nicotinamide adenine dinucleotide (NAD) cofactor and polyamine metabolism pathways, respectively (Fig. 3A).

Fig. 3. Prolific nicotine production in the genus Nicotiana evolved from stepwise duplications of two primary metabolic pathways. (A) Metabolic organization (brightly colored and dashed line outlined branches) and evolution of Nicotiana-specific nicotine biosynthesis via pathway and single gene duplications. Light green and light blue branches on the side indicate the two ancient gene modules with housekeeping functions in plants for the biosynthesis of the NAD cofactor and metabolism of polyamines. Different gene duplication types are indicated by arrows, annotated as follows: NSD, Nicotiana-specific duplications; WGT, whole-genome triplication in Solanaceae; pWGT, gene duplication occurring before WGT. Phylogenetic analyses are presented in SI Appendix, Figs. S7–S14. Nicotiana QS (quinolinate synthase) and A622 did not duplicate but experienced an increase in root expression compared with their tomato homologs (SI Appendix, Figs. S13 and S15). (B) Phylogenomic analysis of grape, tomato, and N. attenuata gene sets highlighting the gradual assembly of the nicotine biosynthetic pathway. Duplication patterns of ODC, PMT, and MPO support the ancient origin of the ornithine-derived N-methyl-Δ1-pyrrolinium, which is notably used as a building block for the biosynthesis of tropane alkaloids in Solanum. AO, aspartate oxidase; BBL, berberine bridge enzyme-like; DAO, diamine oxidase; MPO, N-methylputrescine oxidase; ODC, ornithine decarboxylase; PMT, putrescine methyltransferase; SPDS, spermidine synthase.

However, the timing and mode of duplications of these two pathways differ and reflect the expansion and recruitment of gene sets required for the diversification of alkaloid metabolism in the Solanaceae (Fig. 3B). Duplications that gave rise to the branch extension of the polyamine pathway required for the biosynthesis of the signature alkaloids of Solanaceae and Convolvulaceae (e.g., tropane alkaloids in many genera and nicotine in Nicotiana) are shared among Nicotiana, Solanum, and Petunia with individual gene members recruited from the Solanaceae WGT or earlier duplication events. Genes encoding ornithine decarboxylase (ODC2) and N-putrescine methyltransferase (PMT) duplicated before the shared Solanaceae WGT from their ancestral copies in polyamine metabolism, ODC1 and spermidine synthase (SPDS), respectively (SI Appendix, Figs. S7 and S8). Whereas ODC2 likely retained its ancestral enzymatic function, PMT (derived from SPDS) acquired the capacity to methylate putrescine to form N-methylputrescine through neofunctionalization (21). The N-methylputrescine oxidase (MPO) from the polyamine metabolism pathway evolved from diamine oxidase (DAO) (22) through whole-genome multiplication. Both copies were retained in Nicotiana, Solanum, and Petunia (SI Appendix, Fig. S9), presumably to sustain the flux of N-methyl-Δ1-pyrrolinium required for alkaloid biosynthesis. Duplication patterns of ODC, PMT, and MPO therefore support the ancient origin of the ornithine-derived N-methyl-Δ1-pyrrolinium, which is used as a common building block for the biosynthesis of most alkaloid groups in the Solanaceae and Convolvulaceae (Fig. 3B).

In contrast to the relatively ancient origin of pyrrolidine ring biosynthesis, duplications of the NAD pathway genes, encoding aspartate oxidase (AO) and quinolinic acid phosphoribosyl transferase (QPT), responsible for pyridine ring biosynthesis are Nicotiana specific and likely occurred through local duplication events (SI Appendix, Figs. S10 and S11). BBLs are thought to be involved in the late oxidation step in nicotine biosynthesis that couples the pyridine and pyrrolidine rings, and therefore constitute a key innovation in the Nicotiana-specific synthesis of pyridine alkaloids. BBLs exhibit clear root expression specificity and likely evolved through neofunctionalization after gene duplications (SI Appendix, Fig. S12).

Tissue-level RNA-seq transcriptome analyses in N. attenuata confirmed that, whereas ancestral copies exhibit diverse expression patterns among different tissues, all of the duplicated gene copies recruited for nicotine biosynthesis are specifically expressed in roots (Fig. 4A) as well as transcriptionally up-regulated in response to herbivory via the jasmonate signaling pathway (23). Experimental work has shown that the transcription factors of the ethylene response factor (ERF189) subfamily IX and MYC2, play central roles in the up-regulation of nicotine genes (5). Analyzing the evolutionary history of MYC2 revealed that this gene duplicated at the base of the Solanaceae via genome-wide multiplication or segmental duplications, and two duplicated copies were retained in the genomes of Nicotiana and several Solanaceae species (SI Appendix, Fig. S16). Interestingly, ERF189 is located within the ERF IX cluster (6, 24) as a result of ancestral tandem duplications shared among Nicotiana, Solanum, Capsicum, and Petunia after WGT (SI Appendix, Fig. S17). After the split between Nicotiana and Solanum, ERF189 underwent independent tandem duplications in each linage, but root-specific expression for these duplicated genes is only found in diploid Nicotiana and not in tomato (SI Appendix, Fig. S17). Due to its essential role in regulating the expression of all nicotine genes (6), the acquisition of a root-specific expression by ERF189, likely in the ancestor of Nicotiana species, might have played a critical role for the coordinated root expression of nicotine biosynthesis genes.

Fig. 4. Coordinated transcriptional regulation of nicotine biosynthetic genes in roots was likely facilitated by transposon-derived transcription factor binding site insertions. (A) Acquisition of transcription factor binding motifs and root-specific expression evolution of nicotine biosynthesis genes (5, 23). Heatmaps depict the scaled expression of nicotine biosynthetic genes and their ancestral copies or closest paralogs across six distinct tissues. Light to dark violet coloration denotes low to high tissue-level expression (TPM, transcripts per million). A622, which likely neofunctionalized without being duplicated in solanaceous species, was not included in this analysis. Gene color categorizations are as used in Fig. 3: light colors correspond to NAD and polyamine primary metabolic pathways and brighter colors to subbranches of the nicotine pathway. The root-specific expression of nicotine biosynthetic genes and dramatic transcriptional up-regulation during insect herbivory is coordinated by the action of MYC2 and ERF IX transcription factors, which respectively target G- and GCC-type boxes in the promoters. Numbers of GCC and G-box motifs detected within 2-kb upstream region of nicotine biosynthetic genes and their ancestral copies are represented using specific color gradients (from light to dark green for increasing number of G boxes and light to dark orange for GCC boxes). (B) Average numbers of G and GCC boxes for the “ancestral” closest paralogs and “nicotine” copies of gene sets. P-values were calculated based on Wilcoxon rank sum test. (C) Many GCC and G-box motifs from nicotine biosynthesis genes are likely derived from TE insertions. Each row depicts the motif and TE annotation of the 2-kb upstream region of an individual nicotine biosynthesis gene. The predicted GCC and G-box motifs are shown in dark orange and dark green small boxes, respectively. The regions that were annotated as TEs from RepeatMasker are shown as rectangles with two different colors. Light blue, LTR; green, non-LTR. Motif sequences and their 150-bp flanking region showed significant homology (E-value less than 1e-5) to annotated TE sequences in N. attenuata are shown in dashed lines. In the case of PMT1.2 and MPO1 (highlighted by black arrows), almost all G and GCC boxes are apparently derived from TE insertions.

Regulation of nicotine genes by MYC2 and ERF189 relies in part on the presence of two transcription factor binding sites, the GCC and G-box elements in their promotor regions (5, 6, 25). Nicotine biosynthetic genes harbor significantly more GCC and G-box elements in their 2-kb upstream regions than do their ancestral copies (Fig. 4B), consistent with the hypothesis that the accumulation of GCC and G-box elements in promoter regions contributed to the evolution of the coordinated transcriptional regulation required for high-flux nicotine biosynthesis. Investigating the origin of GCC and G-box motifs in upstream regions of 10 nicotine biosynthesis genes showed that at least 43% and 29% of GCC and G-box motifs, respectively (Fig. 4C), are likely derived from TE insertions. Whereas it is unclear whether all of these predicted TE-derived GCC and G-box motifs are involved in regulating the expression of nicotine genes, some likely are. For example, in the case of PMT1, a previous experimental study revealed that the 650-bp upstream region, which specifically contained additional TE-derived GCC motifs, had a much larger capacity to drive the expression of a reporter gene in Nicotiana roots than did the 111-bp upstream region that lacked these motifs (26). Furthermore, all GCC and G-box motifs within the 2-kb 5′ region of MPO1 that is likely under control of ERF189 (22) are derived from TEs (Fig. 4C).

The mechanisms of genome organizational evolution, such as genome-wide multiplications and TE expansions, facilitated the evolution of several aspects of the antiherbivore defense arsenal, including a key metabolic innovation in Nicotiana species. These results are consistent with the hypothesis that TEs, which have often been considered as “junk” DNA, can be important orchestrators of the gene expression remodeling that is required for the evolution of adaptive traits. Because native Nicotiana species do not survive in nature without the ability to produce large quantities of nicotine to ward off attackers (27), it might be that this junk has facilitated innovations essential for their survival.