Characterization and unusual organization of TRIMs

Identification and abundance of plant TRIMs

To annotate TRIMs in plants, we first analyzed 48 plant genomes available as of 1 April 2013 (Additional file 1: Table S1) [7, 20–65] using LTR_FINDER [66]. A total of 29,779 potential TRIM sequences were found in the 48 genomes, with an average of 620 predicted sequences per genome. The minimum number of sequences predicted for a single genome was 16 in Thellungiella parvula [43], and the maximum was 3,300 in Ricinus communis [35]. The 29,779 sequences were then manually inspected for structures using BLASTN and BLASTX. From this, 3,549 sequences were determined to be TRIMs and the other 26,230 sequences were discarded. The primary constituents of the discarded fraction were tandem repeats and incomplete elements: 59 % in maize and 95 % in soybean (Additional file 1: Figure S1). The conservation of TRIM elements across species has previously been reported [14, 15, 17]. Thus, TRIM elements identified by LTR_FINDER in each genome were grouped into TRIM subfamilies rather than families. The 3,549 sequences were grouped into 217 TRIM subfamilies that included Wukong and Br4, originally identified by sequence alignments of homologous regions [13, 16]. Among the 48 plant genomes, de novo annotation identified TRIMs in 40 genomes; no TRIMs were annotated in the other eight, including Arabidopsis thaliana, for which five TRIMs, Katydid-At1, At2, At3, At4, and Cassandra, had been previously annotated by sequence alignments [14, 16, 17]. This indicates that de novo annotation does not identify all TRIMs. Therefore, all 217 identified TRIM subfamilies were used to conduct homology searches and an additional 72 subfamilies were found, including three new subfamilies in A. thaliana. A total of 289 TRIM subfamilies were identified in 43 genomes, including all 30 eudicots and nine monocots. Notably, TRIMs were found in the lycophyte Selaginella moellendorffii and in three algal genomes, Chlamydomonas reinhardtii, Volvox carteri, and Chondrus crispus (Table 1). To our knowledge, this is the first time that TRIMs have been reported in lycophytes and non-vascular plants. However, TRIM elements were not found in Physcomitrella patens or in four other algal genomes, Chlorella variabilis, Ostreococcus lucimarinus, O. tauri, and Cyanidioschyzon merolae.

Table 1 Summary of terminal repeat retrotransposons in miniature in 43 sequenced plant genomes

The average size of the 289 subfamilies was 685 base pairs (bp), much smaller than typical plant LTR retroelements (4–10 kb on average) [67]. Among the 289 subfamilies, 225 (77.9 %) were smaller than 1,000 bp and 197 (68.1 %) had LTRs smaller than 250 bp (Additional file 1: Figure S2A, B).

The copy numbers of TRIMs were highly variable between genomes. The majority (65 %, 28/43) of the plant genomes harbored more than 2,000 complete or fragmented TRIMs; only six (14 %) had fewer than 1,000 TRIMs (Table 1). Most subfamilies (174 of 289, 60 %) had copy numbers below 500, and about one-quarter (70/289) had copy numbers greater than 1,000 (Additional file 1: Figure S2C).

Conservation and comparison of TRIMs

To determine the phylogenetic distribution and group the TRIM elements, the 289 TRIM subfamilies were used to search GenBank and conduct all-by-all BLASTN searches. We found 159 subfamilies in more than two plant taxonomic families; 78 subfamilies in multiple genomes from the same plant family, termed “family-specific TRIMs”; and 52 subfamilies in only a single genome, termed “species-specific TRIMs.” Species-specific TRIMs may have homologs that were lost, have diverged in other genomes, or are not represented in GenBank (Table 1).
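The three distribution categories can be expressed as a simple decision rule. The Python sketch below is illustrative only: the function name, the input layout, and the toy genome-to-family mapping are assumptions, not the pipeline used in this study. For simplicity it treats any subfamily detected in more than one plant family as shared, which may be looser than the criterion applied above.

```python
def classify_subfamily(genomes_with_hits, family_of):
    """Assign a TRIM subfamily to a distribution category.

    genomes_with_hits: genome names in which the subfamily was detected.
    family_of: dict mapping genome name -> plant taxonomic family
               (hypothetical input, built by the user).
    """
    genomes = set(genomes_with_hits)
    if not genomes:
        raise ValueError("subfamily has no hits")
    if len(genomes) == 1:
        return "species-specific"
    families = {family_of[g] for g in genomes}
    if len(families) == 1:
        return "family-specific"
    # Assumption: presence in more than one plant family counts as shared.
    return "shared between plant families"


# Toy mapping for illustration only.
family_of = {
    "Glycine max": "Fabaceae",
    "Medicago truncatula": "Fabaceae",
    "Zea mays": "Poaceae",
    "Oryza sativa": "Poaceae",
}

print(classify_subfamily(["Glycine max"], family_of))
print(classify_subfamily(["Glycine max", "Medicago truncatula"], family_of))
print(classify_subfamily(["Glycine max", "Zea mays"], family_of))
```

In practice the list of genomes with hits would come from the BLASTN results after applying whatever identity and coverage cutoffs are appropriate; those cutoffs are not modeled here.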

The TRIMs from the 43 plants were then grouped into families based on sequence similarity. A total of 156 TRIM families were identified, 60 of which were shared between plant families, 44 were specific to a single plant family, and 52 were species-specific. Of these 156 families, 145 were identified for the first time. We also found new members of the previously reported TRIM families [14–17], such as complete Cassandra transposons in Cucumis sativus and other plants.

The TRIMs from three plant taxonomic families, the Legumes (Fabaceae), Cruciferae (Brassicaceae), and Grasses (Poaceae), are detailed in Fig. 1. These three families were chosen as each contains more than five sequenced genomes, represents both dicots and monocots, and has ~140–150 million years (My) of evolution [68]. They provide a resource to analyze the conservation and evolution of plant TRIMs.

Fig. 1 Comparison of terminal repeat retrotransposons in miniature (TRIMs) in three plant taxonomic families. Black squares and triangles represent complete and fragmented TRIMs, respectively, shared within and between plant genomes. Black stars indicate TRIMs present in a single genome. TRIMs grouped into a single family are linked by dashed lines. TRIMs in pink, blue, and green boxes are present only in legumes, Cruciferae, and grasses, respectively.

Within the Cruciferae, Arabidopsis lyrata and Brassica rapa shared a common ancestor with the model plant A. thaliana about 13 and 43 million years ago (Mya), respectively [69]. Nine TRIM families were previously reported in this plant family, including At1–4 and Cassandra in A. thaliana [14, 16, 17] and Br1–4 in B. rapa [16]. We found an additional 13 new TRIM families. Among the 22 TRIM families, two, Cassandra and At4, have complete or fragmented homologs in legumes and grasses, 11 were shared between the Cruciferae and other dicots, and nine families were found only within the Cruciferae (Fig. 1).

We found 36 TRIM families in the five legume genomes, including Cassandra and At4. Among these, 15 were shared between legumes and other plant families. Two families, GmaRetroS4 (abbreviated as Gm4) and GmaRetroS11 (Gm11) from Glycine max, were absent in the other four legumes but homologs were found in other plants. Eight family-specific TRIMs—LjaRetroS12 and 15, CarRetroS1 and 2, MtrRetroS2, CcaRetroS8 and 9, and GmaRetroS13—were found in subsets of the five sequenced legumes that last shared a common ancestor about 50 Mya [70].

In addition to the three previously described TRIM families (SMART [15], Cassandra [17], and Wukong [13]), we identified 22 new families within the grasses. Family OsaRetroS10 (Os10) had complete elements in Oryza sativa and O. brachyantha, and homologs were found in Solanum lycopersicum (AC243477:1845–1967, E value = 7e−8) and S. pimpinellifolium (AGFK01075962:4312–4434, E value = 7e−11). Ten TRIM families identified in O. sativa and O. brachyantha have complete and/or fragmented copies in Zea mays and/or Sorghum bicolor, which diverged from the Oryza genus ~50–80 Mya [71].

Tandemly arrayed TRIMs

A typical LTR retrotransposon contains 5′ and 3′ LTRs flanking an internal region that often encodes proteins required for retrotransposition. We refer to this structure as L2I1, where L2 refers to two LTRs and I1 to one internal sequence. In addition to the typical TRIM elements (L2I1), some TRIMs were tandemly arranged and contained three or more LTRs and two or more internal regions, hereafter referred to as tandemly arrayed (TA)-TRIMs. So far, this peculiar structure has only been reported for the Cassandra TRIM, whose LTRs contain sequences similar to cellular 5S rRNA, which is also tandemly arranged [17, 18]. No 5S rRNA sequences were found in any of the other TRIM families.

We found that TA-TRIMs are common in plant genomes, with 129 subfamilies having TA-TRIM structures in 35 of the 43 genomes (Additional file 1: Table S2). To gain more insight into TA-TRIMs, we focused on maize, where there were 93 tandem arrays from four TRIM subfamilies. These arrays varied in organization and contained varying numbers of LTRs and internal sequences, such as three LTRs and two internal regions (L3I2), and five LTRs and four internal regions (L5I4) (Fig. 2, Additional file 1: Table S3). Among all the TA-TRIMs identified in maize, L3I2 was the most frequent, accounting for more than 67 % (63/93) of all TA-TRIMs. To validate TA-TRIMs in maize, we conducted polymerase chain reaction (PCR) analysis using primers that targeted regions flanking TA-TRIMs from the Zma-SMART subfamily (Fig. 2), and further confirmed these structures by DNA sequencing. This validated the structure and organization of the TA-TRIMs, confirming that they were not artifacts of genome assembly errors.
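The structural notation simply counts LTRs and internal regions in an element. As a minimal sketch, assuming an element has already been decomposed into an ordered list of components (the labels "LTR" and "I" and both function names are hypothetical), the label and the tandem-array test could be written as:

```python
def array_notation(components):
    """Return the L<n>I<m> label for an ordered list of 'LTR' / 'I' components."""
    n_ltr = components.count("LTR")
    n_int = components.count("I")
    if n_ltr + n_int != len(components):
        raise ValueError("components must be 'LTR' or 'I'")
    return f"L{n_ltr}I{n_int}"


def is_tandem_array(components):
    """Treat an element with at least three LTRs and two internal regions as a TA-TRIM."""
    return components.count("LTR") >= 3 and components.count("I") >= 2


typical = ["LTR", "I", "LTR"]
tandem = ["LTR", "I", "LTR", "I", "LTR"]

print(array_notation(typical), is_tandem_array(typical))
print(array_notation(tandem), is_tandem_array(tandem))

# Share of maize tandem arrays with three LTRs and two internal regions (63 of 93).
print(f"{63 / 93:.1%}")
```

This counting scheme ignores orientation, so arrays that include inverted TRIMs would need an additional strand field.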

Fig. 2 Tandemly arrayed terminal repeat retrotransposons in miniature (TA-TRIMs) of Zma-SMART in the maize genome. Boxes containing black triangles indicate the long terminal repeats (LTRs) of TRIMs and gray boxes denote the internal regions of TRIMs. The gray pentagons are target site duplications (TSDs) that flank TRIMs and arrows indicate the polymerase chain reaction primers used to validate the TRIM sequences. M indicates a 100 base pair DNA ladder; A indicates a typical Zma-SMART TRIM with two LTRs and one internal region (AC186328:154584–154863; TSD: AACAT); B indicates a TA-TRIM with three LTRs and two internal regions (AC210283:61391–61889; TSD: GGGTT); C indicates a TA-TRIM with two inverted TRIMs (AC220956:117725–118283; TSD: CTTCA); and D indicates a TA-TRIM with five LTRs and four internal regions (AC185340:80554–81415; TSD: ATAAT).

TRIM-mediated gene evolution

Enrichment of TRIMs in genic regions

TRIMs have been postulated to be involved in gene divergence and regulation [14, 15, 17]. However, these studies focused on only one or a few TRIM families and did not provide a genome-wide and cross-species view of the impact of TRIMs on gene evolution and function. Therefore, we examined the distribution of TRIMs with respect to genes in 14 of the plant genomes. Our data indicate that TRIMs are enriched in genic regions: 18.8–49.4 % were located in or near (1.5 kb upstream of) genes (Additional file 1: Table S4). Interestingly, an average of 2.7 % of the TRIMs within a genome have been recruited as exons, based on an analysis of annotated genes, including coding DNA sequences and untranslated regions (UTRs). In the red harvester ant, ~45 % of the TRIMs were present within or near predicted genes [19]. These results indicate that TRIMs may exhibit preferential insertion/retention in or near genes, in both plants and animals.

We further analyzed Ty1-copia and Ty3-gypsy LTR retrotransposons and miniature inverted–repeat transposable elements (MITEs) in G. max and Z. mays and compared their distributions with the annotated genes. We found that 4.1 % of Ty3 and 6.3 % of Ty1 retrotransposons were located in genic regions in Z. mays, and 11.7 % of Ty3 and 16.5 % of Ty1 retrotransposons were located in genic regions in G. max (Additional file 1: Table S5). These percentages were significantly lower than those of TRIMs (Pearson’s Chi-squared test, p-value < 2.2e−16). MITEs are small DNA transposons that have insertion preferences in or near genes [72, 73]. We detected 37.1 % of MITEs in Z. mays and 37.4 % in G. max in and near genes; TRIMs were present in genic regions at significantly higher frequencies than MITEs in G. max but at lower frequencies in Z. mays (Pearson’s Chi-squared test, p-value < 2.2e−16).

Insertion/maintenance in larger genes

We compared gene structures of TRIM-related genes (TRGs), genes that contain TRIM sequences, and non-TRIM-related genes (NTRGs) in G. max and Z. mays. In both genomes, TRGs had more exons and were larger than NTRGs (Additional file 1: Figure S3, Table S6). For example, in G. max the average exon number of TRGs was 12.2 versus 5.9 for NTRGs. Differences in exon number, exon size, and intron size between TRGs and NTRGs were statistically significant for both species: p-values from two-sample t-tests after log transformation were less than 2.2e−16.

Because larger genes have more space to harbor transposable elements (TEs), we compared the density of TRIMs between larger and smaller genes to determine whether the large size of TRGs was simply an artifact of there being more space for a TRIM to insert. All annotated genes in G. max and Z. mays were ranked from smallest to largest, and the top and bottom 20 % were defined as “small” and “large” genes, respectively. In G. max, we found 21 TRIMs in the 9,273 small genes (covering 7,621 kb) and 1,554 TRIMs in the 9,273 large genes (covering 84,971 kb). The TRIM density in large genes was 0.17 insertions/gene, ~73 times higher than in small genes; on a per kb basis, large genes were 6.5 times more likely to have TRIM insertions (0.0183 for large versus 0.0028 for small). In Z. mays, large genes also had a significantly higher density of TRIMs at 0.17 insertions/gene, ~53 times more than small genes (~2 times more on a per kb basis) (Additional file 1: Table S7).
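The two density measures for G. max follow directly from the counts given above. The short Python sketch below reproduces the arithmetic; the function name is hypothetical, and ratios computed from unrounded densities can differ slightly from those quoted, which appear to be derived from rounded values.

```python
def trim_density(n_trims, n_genes, total_kb):
    """Return (insertions per gene, insertions per kb) for a gene set."""
    return n_trims / n_genes, n_trims / total_kb


# G. max counts taken from the text.
small_per_gene, small_per_kb = trim_density(21, 9273, 7621)
large_per_gene, large_per_kb = trim_density(1554, 9273, 84971)

print(f"per gene: small={small_per_gene:.4f}, large={large_per_gene:.2f}")
print(f"per kb:   small={small_per_kb:.4f}, large={large_per_kb:.4f}")
print(f"fold difference per gene: {large_per_gene / small_per_gene:.1f}")
print(f"fold difference per kb:   {large_per_kb / small_per_kb:.1f}")
```

Normalizing by total sequence length rather than by gene count is what separates a genuine insertion or retention bias from the trivial effect of large genes offering more target sequence.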

Because TRIMs are small, we expected relatively little contribution to the expansion of genes. Thus, the large differences in exon number and gene size may reflect an accumulation bias of TRIMs into larger genes. To test this hypothesis, TRGs and NTRGs in the two genomes were used to find orthologous genes in their closest relatives: Cajanus cajan and Phaseolus vulgaris for G. max, which diverged ~20 and 15 Mya, respectively [70]; and S. bicolor and O. sativa for Z. mays, which diverged ~10 and 50–80 Mya, respectively [71]. Results from all four genomes indicated that homologs of TRGs also have higher exon numbers and are larger than orthologs of NTRGs. The exon number and sizes of TRGs and NTRGs were similar to their orthologous genes (Additional file 1: Table S8). However, the introns of both TRGs and NTRGs in Z. mays were larger than their orthologs from S. bicolor and O. sativa, likely due to the higher transposon density in Z. mays [7].

To gain more insight into the distribution of TRIMs, we analyzed 30,853 genes in G. max and 23,670 genes in Z. mays that have defined syntenic orthologs in P. vulgaris and S. bicolor, respectively [74, 75]. In addition, we compared the distributions of TRIMs with those of Ty1 and Ty3 LTR retrotransposons and MITEs. TRIMs were significantly more frequent in genic regions than other TEs in both G. max and Z. mays, but at a lower percentage than MITEs in Z. mays (Additional file 1: Table S9). These results are similar to those from all annotated genes (Additional file 1: Table S5) and further support the observation that TRIMs are enriched in genic regions. We further investigated the structure of genes containing TRIMs or other TEs and found that the syntenic genes in which TRIMs served as exons or introns were significantly larger and had more exons than the genes without TRIMs in both genomes (t-test, p-value < 2.23e−180). In addition, genes containing TRIMs were significantly larger than the genes with MITEs in both genomes (Additional file 1: Table S10). Significant length differences were detected between the syntenic genes containing TRIMs and those containing other LTR retrotransposons in G. max, but not in Z. mays (Additional file 1: Table S10). Given that the average sizes of Ty1 and Ty3 retrotransposons located in syntenic genes in Z. mays were 930.8 and 1,211.9 bp, four to five times larger than TRIMs (219.9 bp), we assume that Ty1 and Ty3 retrotransposons enlarged the related genes. Taken together, these results indicate that TRIMs either preferentially insert into or are retained in large genes.

Purifying selection of TRIM-related genes

To explore the selective pressures that may have acted on TRGs, we calculated the ratio of the number of non-synonymous substitutions per non-synonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks) of the genes from G. max and Z. mays by conducting genome-wide pairwise comparisons with their homologous genes in P. vulgaris and S. bicolor using gKaKs [76]. In G. max, the average Ka value of TRGs was similar to that of NTRGs, but the average Ks value of TRGs was significantly lower than that of NTRGs (p-value < 2.2e−16; Wilcoxon rank-sum test). In Z. mays, the average values for both Ka and Ks of TRGs were significantly lower than for NTRGs and indicated lower evolutionary rates for TRGs, consistent with our observation that TRGs are more conserved than NTRGs. Notably, the average Ka/Ks value of TRGs was 0.19 in G. max and 0.25 in Z. mays, much lower than 1.0 and significantly lower than that of NTRGs (p-value = 5.3e−07 for G. max, p-value < 2.2e−16 for Z. mays; Wilcoxon rank-sum test) (Additional file 1: Table S11). These results indicate that TRGs have likely undergone strong purifying selection.
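The interpretation of Ka/Ks follows the usual convention: values well below 1 suggest purifying selection, values near 1 suggest neutral evolution, and values above 1 suggest positive selection. The sketch below only illustrates that convention; the function names and the tolerance around 1 are assumptions, and it does not reproduce the gKaKs calculation itself.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, rejecting pairs with no synonymous substitutions."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    return ka / ks


def selection_regime(ratio, tol=0.05):
    """Map a Ka/Ks ratio to a selection regime (tol is an arbitrary illustrative band)."""
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


# Average Ka/Ks values for TRGs reported in the text.
for species, ratio in [("G. max", 0.19), ("Z. mays", 0.25)]:
    print(species, ratio, selection_regime(ratio))
```

Average ratios summarize many gene pairs, so a low mean does not exclude individual TRGs that evolve neutrally or under positive selection.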

Gene acquisitions related to TRIMs

Transposon-based gene capture is an important mechanism for gene evolution [77, 78]. Only one TRIM-mediated gene acquisition event has been reported to date, in A. thaliana [14]. To assess the incidence of TRIM-based gene capture, the 289 TRIM subfamilies were used for BLASTN and BLASTX searches to detect significant alignments (E value < 1e−10) to expressed genes. From this, 30 TRIM elements from seven subfamilies contained putative gene fragments, including one subfamily in Medicago truncatula and six in G. max (Additional file 1: Table S12). The sizes of the TRIMs ranged from 1,172 to 1,449 bp, similar to PACK-MULEs in rice (~1.5 kb) [37], and their internal regions had more than 70 % sequence identity to the host genes. These TRIMs contained only transcribed exon fragments and no introns. Two TRIMs carried exons from more than two genes. For instance, the internal region of GmaRetroS15 contained 217-bp and 160-bp sequences highly identical to the 5′UTR of LOC10081263 and an exon of LOC100820519, respectively. It also carried a 346-bp fragment with 76 % sequence identity to the fifth to ninth exons, but no introns, of LOC100798768, annotated as casein kinase I isoform delta-like protein (Fig. 3). These data suggest that TRIM-mediated gene acquisition may differ from that of DNA transposons, such as PACK-MULEs, that contain both exons and introns of cellular genes [78, 79], and is more similar to that of an LTR retrotransposon, for example, Bs1 in maize, which captured exons only [80–82], and the non-LTR retrotransposon L1 in human [83].

Fig. 3 Gene acquisitions related to terminal repeat retrotransposon in miniature (TRIM) GmaRetroS15 in Glycine max. Black triangles and arrows denote TRIM long terminal repeats and target site duplications, respectively. Solid boxes and lines are exons and introns of three genes marked with different colors. The pentagons are the last exons of the genes and indicate transcription orientation. I, II, and III indicate the fragments from three host genes. The cDNA sequence for each gene model is shown in parentheses.

Among the 30 elements carrying gene fragments, all contained both LTRs and were flanked by 5-bp TSDs, and all had two or more copies except GmaRetroS1 and GmaRetroS28 (Additional file 1: Table S12). One complete copy each was found for GmaRetroS1 and GmaRetroS28 in G. max, although other nearly complete copies were also found. This suggests that additional transposition events occurred after gene acquisition, resulting in increased copy numbers.

Epigenetic pathways of TRIM elements

Methylation and targeting of TRIMs by sRNAs

Plants have evolved multiple pathways to epigenetically regulate TEs, including DNA methylation, posttranslational histone modification, and sRNA-mediated gene silencing [84, 85]. We investigated methylation patterns and sRNA abundance of TRIMs in G. max and Z. mays. We found that TRIMs in both genomes were methylated in all three cytosine contexts (CG, CHG, and CHH, where H is A, C, or T) (Fig. 4a), and that overall methylation patterns of TRIMs were similar to those of Ty1 and Ty3 LTR retrotransposons in G. max. In contrast, in Z. mays, no boundaries were found between TE bodies and flanking regions (Additional file 1: Figure S4), likely due to the extremely high TE content (85 %) in Z. mays [7] and the nested organization of retrotransposons, in which many LTR retroelements are inserted into other LTR retrotransposons [8]. However, the methylation patterns of TRIMs were distinct from those of MITEs in both G. max and Z. mays (Additional file 1: Figure S4). TRIM body methylation was similar between the two genomes, but the flanking regions in Z. mays showed higher methylation levels than those in G. max. Because TRIMs were enriched in genic regions (Additional file 1: Table S4), we further investigated the methylation of TRIMs within genes, adjacent to genes (within 1 kb), and in other non-genic regions. TRIMs in genes were generally less methylated in non-CG contexts than those in intergenic regions (Fig. 4a).

Fig. 4 Epigenetic analyses of terminal repeat retrotransposons in miniature (TRIMs) in Glycine max and Zea mays. a Methylation patterns of TRIMs based on insertion position. Red: CG methylation, blue: CHG methylation, green: CHH methylation. b Example of TRIMs that were highly methylated and targeted by 24-nucleotide small interfering RNA. c Example of TRIMs that were highly methylated at only the CG context and not targeted by 24-nucleotide small interfering RNA. d Methylation patterns of TRIM-related genes (TRGs) and non-TRIM-related genes (NTRGs). TSS transcription start site, TTS transcription termination site.

Methylation marks on TEs in plants are maintained by DNA methyltransferases and the RNA-directed DNA methylation pathway guided by 24-nucleotide small interfering RNAs (24 nt siRNAs) [86, 87]. To calculate the abundance of sRNAs targeting TRIMs, sRNA data from G. max [88] and Z. mays [89] were mapped to the respective genomes; most TRIMs were targeted by 24 nt and/or 21 nt sRNAs (e.g., Fig. 4b, Additional file 1: Table S7). However, we also found some TRIMs located in expressed genes that were not targeted by sRNAs (Fig. 4c). Moreover, sRNA abundance varied among the different TRIM families (Additional file 1: Table S7). TRIM families were classified into three types based on DNA methylation and sRNA profiles (Additional file 1: Figure S5, Table S13): Type I: abundant 24 nt siRNAs in the TE body, methylation in the TE body, and relatively lower methylation in the flanking regions than in the TE body, showing clear borders of TRIMs; Type II: low 24 nt siRNA abundance, and CG and CHG methylation in both TE and flanking regions without clear borders; and Type III: low 24 nt siRNA abundance and high methylation only in the CG context without clear borders. On this basis, five, eight, and four TRIM families in G. max were assigned to Types I, II, and III, respectively. Among six TRIM families in Z. mays, three were grouped into Type I and three into Type II; Type III was not found in Z. mays. Families with high CHH methylation (Type I) were more frequently targeted by 24 nt siRNAs; the correlation between CHH methylation and sRNAs was previously reported for both G. max and Z. mays [89, 90].

Higher CG body methylation in TRIM-related genes

We further compared methylation levels between TRGs and NTRGs. In both G. max and Z. mays, TRGs were more methylated than NTRGs (Fig. 4d). To gain better insight into gene methylation as related to TRIM insertions, genes were categorized into three groups: (1) CG body-methylated genes, (2) C-methylated genes (possible RNA-directed DNA methylation target loci or heterochromatic marks), and (3) unmethylated genes (Additional file 1: Table S14). TRGs had a significantly higher proportion of C-methylated genes (27.4 % in G. max and 64.3 % in Z. mays) than NTRGs (11.0 % in G. max and 35.2 % in Z. mays; p-value < 2.2e−16, two-sample test of proportion using the “prop.test” function in R). This was expected given that TRIMs were methylated in all three contexts (Fig. 4a). Interestingly, TRGs also had a significantly higher proportion of CG body-methylated genes (48.5 % in G. max and 19.5 % in Z. mays) than NTRGs (19.8 % in G. max and 9.1 % in Z. mays; p-value < 2.2e−16, two-sample test of proportion).

The proportion of CG body-methylated and C-methylated genes within TRGs varied among TRIM families (Additional file 1: Table S15). TRIM families with a higher proportion of CG body-methylated genes also had higher proportions of TRIMs inserted into genic regions, with positive correlations in both G. max (R = 0.937) and Z. mays (R = 0.438). In addition, negative correlations (G. max, R = −0.898; Z. mays, R = −0.329) were found between the proportion of C-methylated genes and rates of TRIM insertion into genic regions.

Origin and activity of TRIMs

Putative autonomous retrotransposons of TRIMs

TRIMs are small elements with no coding capacity and are non-autonomous; thus, their mobilization depends on transposases encoded by other autonomous transposons. However, no autonomous transposon for any TRIM has been reported in plants or the red harvester ant. To identify potential autonomous elements, all 289 TRIM subfamilies were used as queries to search the 48 plant genomes and GenBank for related but longer elements. For most subfamilies (278), no retrotransposase-encoding element was found, but for 11 subfamilies we identified larger, complete elements ranging in size from 3,367 to 8,504 bp and encoding proteins of 384–1,577 amino acids in length (Additional file 1: Table S16). The retroelements could be classified as either Ty1-copia or Ty3-gypsy LTR retrotransposons based on sequence similarity to other retrotransposons. The LTRs of the large retroelements exhibited 79–98 % sequence identity with the related TRIMs, and the LTR sizes of the TRIMs and their larger retrotransposons were similar (Additional file 1: Table S16).

Sequence similarity between the large elements and the TRIMs was not restricted to LTR regions. We identified an 8,504-bp Ty1-copia retrotransposon, OsajLTRA10, in Nipponbare (Oryza sativa L. ssp. japonica) using the 408-bp TRIM OsajRetroS10 as a query. The LTRs of both elements were 115 bp and shared 97 % sequence identity. OsajRetroS10 also showed 98 % and 94 % sequence identity with OsajLTRA10 at positions 1–130 and 131–408, respectively, which covers all of OsajRetroS10 (Fig. 5a). From this, we deduced that OsajRetroS10 is a derivative of OsajLTRA10 via internal deletions, with a breakpoint near the 130th nucleotide of OsajLTRA10. There were three complete OsajLTRA10 elements in Nipponbare, including OsajLTRA10 on chromosome 1 and two other copies [OsajLTRA10-1 (9,948 bp, on chromosome 9) and OsajLTRA10-2 (5,124 bp, on chromosome 12)]. Sequence alignment of OsajLTRA10 elements and OsajRetroS10 TRIMs revealed that the complete elements contained a 25-bp sequence (CGATCCTA(C/T)AA(G/T)TGGTATCAGAGCC) immediately 5′ of the breakpoint site, and the three OsajLTRA10 elements contained another nearly identical 25-bp sequence immediately 3′ of the breakpoint site. We refer to this as the “duplicated internal sequence.” The 25-bp duplicated internal sequence was also found in OsaiLTRA10 in 93–11 (Oryza sativa L. ssp. indica), a close relative of Nipponbare.
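Locating copies of the 25-bp duplicated internal sequence in an element can be sketched with a regular expression built from the degenerate motif given above. This is an illustrative Python example, not the alignment procedure used in this study; the function names and the toy sequence are assumptions, and the search finds only exact matches to the motif, so more divergent copies would require approximate matching.

```python
import re

# Degenerate 25-bp motif as written in the text.
MOTIF = "CGATCCTA(C/T)AA(G/T)TGGTATCAGAGCC"


def motif_to_regex(motif):
    """Convert '(C/T)'-style alternatives into regex character classes."""
    return re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )


def find_motif(seq, motif=MOTIF):
    """Return 0-based start positions of non-overlapping motif matches on the given strand."""
    pattern = re.compile(motif_to_regex(motif))
    return [m.start() for m in pattern.finditer(seq.upper())]


# Toy sequence with two motif copies separated by 8 bp of filler (hypothetical data).
copy1 = "CGATCCTACAAGTGGTATCAGAGCC"
copy2 = "CGATCCTATAATTGGTATCAGAGCC"
toy = "AAAA" + copy1 + "GGGGTTTT" + copy2 + "CC"

print(motif_to_regex(MOTIF))
print(find_motif(toy))
```

Two hits on either side of a candidate breakpoint would be consistent with the duplicated arrangement described for the complete elements; the reverse strand is not searched in this sketch.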

Fig. 5 a OsajRetroS10 and a putative autonomous LTR retrotransposon. OsajRetroS10 is 408 bp and shares high sequence identity with the 8,504-bp Ty1-copia retrotransposon OsajLTRA10 in both the LTR and internal regions. OsajLTRA10 contains a duplicated 25-bp sequence, indicated by black lines. Primers targeting the conserved domain of the reverse transcriptase (RT) are indicated by arrows. b RT-PCR analysis of OsajLTRA10.

Among the 11 large LTR retrotransposons, SlyLTRA4, PtrLTRA2, VviLTRA5, PbrLTRA6, CarLTRA1, CarLTRA2, and GmaLTRA2 are likely unable to mobilize TRIMs because their retrotransposon proteins are either short or truncated. The remaining four elements encode retrotransposases that contain all functional domains for retrotransposition: SitLTRA5 has a 1,409 amino acid sequence, OsajLTRA10 a 1,577 amino acid sequence, OsiLTRA10 a 1,431 amino acid sequence, and SmoLTRA4 a 1,218 amino acid sequence. Thus, these four LTR retrotransposons are putative autonomous elements that can mobilize their related TRIMs. Furthermore, multiple expressed sequence tags (ESTs) showing sequence similarity with these four retrotransposons were identified, confirming the transcriptional activity of these LTR retrotransposons. We performed reverse transcriptase (RT) PCR analysis to validate the expression of OsajLTRA10 using primers complementary to the RT domain (Fig. 5a). Significant amplification was detected using cDNA from leaf, sheath, and flower of Nipponbare and confirmed the transcriptional activity of the OsajLTRA10 transposon (Fig. 5b).

Recent transpositions of a TRIM family

To gain more insight into the activity of TRIMs, we compared TRIMs from the reference genomes for two rice subspecies, japonica and indica, that diverged ~0.2–0.4 Mya from either O. nivara or O. rufipogon [91], and identified 41 and 31 polymorphic TRIMs in Nipponbare and 93–11, respectively. All polymorphic elements were flanked by 5-bp TSDs and absent in the orthologous regions. This suggests that these are newly inserted TRIMs and that transposition of TRIMs may be similar to that of LTR retrotransposons, as both create 5-bp TSDs.
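The TSD criterion can be illustrated with a small check that compares the 5 bp immediately upstream of an element with the 5 bp immediately downstream. This is a minimal sketch under stated assumptions: the function name, the coordinate convention (0-based, end-exclusive), and the toy sequence are hypothetical, and an exact match is required.

```python
def has_tsd(seq, start, end, tsd_len=5):
    """Check for identical direct repeats of length tsd_len flanking seq[start:end].

    start and end are 0-based, end-exclusive coordinates of the element.
    """
    if start < tsd_len or end + tsd_len > len(seq):
        return False  # not enough flanking sequence to evaluate
    upstream = seq[start - tsd_len:start]
    downstream = seq[end:end + tsd_len]
    return upstream == downstream


# Toy locus: flank + TSD + element + TSD + flank (hypothetical sequences).
element = "TGTTAGGCATTCGATAACA"
with_tsd = "GGAC" + "AACAT" + element + "AACAT" + "CCTG"
without_tsd = "GGAC" + "AACAT" + element + "GGTCA" + "CCTG"

start = 9
end = start + len(element)
print(has_tsd(with_tsd, start, end))
print(has_tsd(without_tsd, start, end))
```

For a polymorphic insertion, the orthologous empty site would be expected to carry a single copy of the 5-bp target sequence, which can be checked in the same way against the other genome.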

We next conducted PCR to validate the new insertions of OsaRetroS10, for which a putative autonomous retrotransposon was found in both Nipponbare and 93–11 (Fig. 5a, Additional file 1: Table S16). We used three pairs of primers targeted to the flanking regions of new insertion sites (Additional file 1: Figure S6A) to amplify DNA from seven rice varieties, four japonica (Nipponbare, Kitaaki, Azucena, and Moroberkan) and three indica (93–11, IR36, and IR64), and from two AA-genome wild relatives, O. nivara and O. rufipogon. All three primer pairs yielded the expected PCR product sizes in both Nipponbare and 93–11 and the two wild rice species (Additional file 1: Figure S6B), indicating that these TRIMs were mobilized after the divergence of these two rice subspecies. Interestingly, smaller bands were found in Kitaaki with P1 primers and in IR64 with P2 primers. Sequence analysis did not show a deletion in either Kitaaki or IR64; rather, an extra complete element and a 5-bp sequence were found at the insertion sites in Nipponbare and 93–11, respectively. This indicates that OsaRetroS10 may still be active in rice.