TE insertions accumulate in epiRILs

We previously produced a large population of A. thaliana epiRILs from an initial cross between two isogenic individuals, one of which carried a mutant allele of the epigenetic silencer gene DDM1, which is required for maintaining DNA methylation and silencing of TEs14. A single F1 individual was backcrossed to the wild-type parental line and homozygous F2 DDM1/DDM1 progeny were propagated by repeated selfing and single seed descent through another six generations13 (Fig. 1a). Genome-wide analysis of DNA methylation revealed that approximately one third of the differentially methylated regions (DMRs) between the ddm1 and wild-type parents are transmitted as such in the epiRILs13,15, which have each inherited on average 25% of their genome from the ddm1 parent (Fig. 1a) and have therefore mosaic epigenomes. Thus, the epiRIL population constitutes an ideal system to study TE mobilization in an essentially wild-type context and to compare insertion patterns in chromosomal intervals derived from the ddm1 and wild-type parents. Moreover, in this setting, heritable TE insertions should accumulate neutrally through random segregation, i.e., in the near absence of natural selection, which makes the epiRILs operationally similar to MA lines10,16,17,18, except for the initial TE reactivation induced by ddm1.

Fig. 1 TE insertions accumulate in the epiRILs. a Crossing scheme used to generate the epiRIL population. b Sequencing alignment tracks for epiRIL 394 and Col-0. Concordant and discordant mate-pair reads, as well as the sense or antisense orientation for the latter, are indicated in gray, brown and cyan, respectively c Number and identity of insertions accumulated in wild-type, ddm1 and the epiRILs. Number of hemizygous insertions are indicated in brackets. Pie charts show the proportion of insertions matching different full-length reference TE sequences. d (epi)QTL mapping of the number of insertions produced by ATCOPIA93, VANDAL21, and ATENSPM3. The full-length reference TE sequence located within the single (epi)QTL interval in each case is indicated. Box-plots indicate the variation in the number of insertions in the epiRILs in relation to the parental origin of the relevant (epi)QTL interval. For each boxplot, the lower and upper bounds of the box indicate the first and third quartiles, respectively, and the center line indicates the median. Source data of Figs. 1b and 1c are provided as a Source Data file Full size image

We obtained genome sequencing data for 107 epiRILs at generation F8 together with close relatives of the two founder plants, using Illumina mate-pair libraries, and the sequence reads were analyzed to identify non-reference (i.e., de novo) TE insertions. The large physical distance (~5 kb) between mate-pair reads enabled us to determine the complete sequence of the insertions, and thus of the donor TEs (Fig. 1b and Supplementary Fig. 1a). No de novo TE insertions were detected in the two wild-type siblings sequenced, nor in five A. thaliana MA lines that were derived from wild-type plants10, consistent with the low frequency of spontaneous TE mobilization12. In contrast, two non-reference TE insertions were detected in the sequenced ddm1 individual and 95% of the 107 epiRILs harbored at least one de novo insertion (Supplementary Fig. 1b). The number of new insertions varied greatly between epiRILs, with a maximal value of 97 new insertions in one line. Almost all new insertions (98.7%) were private (i.e., present in a single epiRIL), indicating that they occurred during the propagation of the epiRILs rather than in the ddm1 parent line (Fig. 1a). In addition, 1107 of the 1670 private insertions detected in total were found in the hemizygous state (Fig. 1c), indicating that transposition is likely still ongoing in the epiRILs at generation F8.

The LTR-retrotransposon family ATCOPIA93 and the two DNA transposon families ATENSPM3 and VANDAL21, which are among the most active in nature12, contributed over 95% of all private insertions (64.4%, 22.5%, and 11.2%, respectively), and another eight TE families contributed the rest (Fig. 1c). Moreover, 99% of the ATCOPIA93 and VANDAL21 insertions, as well as 58% of the ATENSPM3 insertions were identical to or best-matched just one of the several cognate full-length sequences present in the reference genome (Fig. 1c). In the case of ATCOPIA93, this mobile copy is Évadé (EVD) consistent with previous findings19,20. Conversely, most of the remaining ATENSPM3 insertions derived from two composite elements that do not encode a recognizable transposase (Supplementary Fig. 1c) and which therefore were presumably mobilized in trans by the single mobile full-length ATENSPM3.

To test whether most private insertions were generated from a single mobile reference sequence, we performed (epi)QTL mapping based on the hundreds of parental DMRs segregating in the epiRILs and which can therefore serve as bona-fide genetic markers21,22. For each of the three TE families, a single (epi)QTL interval was associated with the variation in the number of insertions between epiRILs (Fig. 1d) and this interval contained the single full-length reference TE sequence identified above. Incidentally, the lack of additional (epi)QTL intervals indicated that no other regions affected by ddm1 contributed appreciably to TE mobilization, thus revealing a simple genetic architecture for TE mobilization in each case.

Triggering of TE mobilization leads to exponential spread

To investigate the dynamics of TE mobilization, we carried out mathematical simulations based on a transposition-genetic drift scenario, assuming either that only the reference copy is mobile (master gene model) or that the reference and daughter copies are equally mobile (transposon model)23. In addition, our simulations incorporated three key parameters (Supplementary Fig. 2a), namely the rates of transposition and excision, as well as copy number-dependent inhibition of transposition. For all three TE families, the best fit to the observed data at generation F8 was obtained under a transposon model (Supplementary Fig. 2b). In this situation, insertion rates ranged between 0.15–0.81 per donor per generation, depending on the TE family (Fig. 2a and Supplementary Fig. 2c). These values are close to the rate of spontaneous point and small indel mutations per genome per generation in classical MA lines10. Thus, our results indicate that transposition is followed by an exponential spread of novel TE copies throughout the genome at rates that rapidly exceed those of small-size mutations (Fig. 2b).

Fig. 2 Transposition follows a chain reaction in the epiRILs. a Dynamics of insertion accumulation for the three TE families. Observed data obtained at F8 and F9 for ten epiRILs is shown for epiRILs harboring new private insertions. b Average rate across generations of TE-induced (black line) and small size mutations (red line) obtained using the epiRILs and MA lines10, respectively (bottom panels). Gray area represents 95% C.I. Transposition and excision rates per copy per generation, as well as TE copy number (CN) required for triggering concerted epigenetic silencing are indicated Full size image

To obtain direct evidence that new TE insertions as well as the initial donor TE could transpose in the epiRILs, we took advantage of the fact that TEs can acquire mutations during the transposition process24 to look for mutations that are shared by two or more private TE copies within a single epiRIL, as this pattern of sharing likely reflects consecutive mobilization events. While all VANDAL21 private insertions were identical to the mobile reference element and were therefore not informative in this respect, approximately 5% of those produced by ATENSMP3 and ATCOPIA93 contained large internal deletions and small size mutations, respectively. Moreover, nine of these mutations were shared by two or more private insertions per epiRIL and sequencing of ten epiRILs advanced to generation F16 revealed in several cases an increase in the number of ATENSPM3 private insertions sharing the same mutation (Supplementary Fig. 3a). These observations thus confirm that new TE copies are also mobile.

Although mobilization of DNA transposons involves excision, double-stranded gap repair can be used to replace the excised copy with that present on the homolog or the sister chromatid in case of post-replicative mobilization25. Such repair mechanism would thus lead to an apparent lack of excision, as well as a net increase in copy number and indeed our modeling predicted a low rate of excision (0.05 excision per transposition event) for VANDAL21. In agreement with this prediction, and consistent with observations in maize for the related Mu TE family26, the mobile VANDAL21 reference copy was still present at its original position in most of the epiRILs. In contrast, our modeling predicted a high rate of excision (0.92 excision per transposition event) for ATENSPM3 and we found that the mobile ATENSPM3 reference copy was systematically lost in the epiRILs that inherited the relevant genomic region from the ddm1 parent (Supplementary Fig. 2d). Moreover, numerous small indels compatible with ATENSPM3 excision footprints were detected in the epiRILs, revealing that once mobilized, ATENSMP3 rapidly becomes a major source of small indels in the A. thaliana genome (Supplementary Fig. 2e, f).

Our mathematical simulations indicated also that by F12, transposition should stop for ATCOPIA93 in most epiRILs. Specifically, the number of new ATCOPIA93 copies was expected to plateau at around 50 copies per diploid genome after 6–14 generations (Fig. 2a), which would be consistent with previous experimental data showing that for this TE, siRNA-mediated epigenetic silencing is initiated beyond a threshold of about 40 new copies20. To test this aspect of the modeling, we analyzed the genome sequence obtained for the ten epiRILs advanced to generations F16. ATCOPIA93 insertions continued to accumulate between F8 and F16 in the five epiRILs that showed ATCOPIA93 mobilization at F8 but not in the others. All five epiRILs with mobile ATCOPIA93 harbored around 50 copies at F16 and in four of these epiRILs, all copies were methylated and silenced (Supplementary Fig. 3b). Furthermore, finer grain analysis for one epiRIL (epiRIL394) indicates that all ATCOPIA93 copies underwent within one generation an ‘all-or-nothing’ gain of DNA methylation, thus resulting in their concomitant silencing (Supplementary Fig. 3c), as was reported before for two other epiRILs20. Taken together, these findings reveal that once active, TEs can rapidly become the major source of heritable mutations in the genome, until epigenetic silencing sets in to prevent further transposition, which in the case of ATCOPIA93 happens at once in a copy-dependent manner after a few generations.

Active TEs preferentially target genes

Overall, private insertions for VANDAL21, ATENSPM3, and ATCOPIA93 were distributed evenly across the chromosomes (Supplementary Fig. 4a). Nonetheless, because the epiRILs have mosaic epigenomes22, potential differences in insertion frequency could exist between chromosomal intervals of different parental origin. Indeed, for all three TE families, the percentage of insertions in pericentromeric regions was significantly higher when these were derived from the ddm1 parent (Supplementary Fig. 4a). As pericentromeres lose some of their heterochromatic features in an heritable manner in ddm114,22,27, we conclude that euchromatin is the preferred substrate for the integration of VANDAL21, ATENSPM3, and ATCOPIA93.

With few exceptions, DNA methylation and transcription do not differ detectably between the wild-type derived intervals of the epiRILs and the corresponding intervals of wild-type plants22,28. We therefore only considered the wild-type derived intervals of the epiRILs in our subsequent analyses, as they enabled us to obtain a genome-wide view of TE integration patterns that is unaffected by the epigenetic changes caused by ddm1. Most insertions were located preferentially along the chromosome arms (Fig. 3a) and they were strongly enriched near or within genes in the case of VANDAL21 and ATCOPIA93 (Fig. 3b). In addition, while known or predicted essential genes29 were targeted as frequently as other protein coding genes by VANDAL21, they were underrepresented among ATENSPM3 and ATCOPIA93 gene targets (Fig. 3c), suggesting that these last two TEs either avoid non-essential genes or tend to produce strong deleterious effects, or both. Consistent with the first possibility, hemizygous insertions within essential genes were also underrepresented, a result not expected if these insertions created recessive, loss-of-function alleles. For ATCOPIA93, strong deleterious effects were also manifest, as almost none of the insertions within essential genes were in the homozygous state. The situation was different for ATENSPM3 insertions within essential genes, which were more often in the homozygous than the hemizygous state (Supplementary Fig. 4b). This last observation suggested that ATENSPM3, which excised frequently in the epiRILs, produced stronger deleterious effects upon excision than upon integration. Indeed, none of 98 homozygous ATENSPM3 excision footprints in the epiRILs were located within essential genes (Supplementary Fig. 4b). Why ATENSPM3 should be more deleterious upon excision remains to be determined.

Fig. 3 TEs exhibit strong and diverse chromatin-associated insertion biases towards genes. a Circos representation of private TE insertions detected for VANDAL21, ATENSPM3, and ATCOPIA93 within wild-type intervals. Centromere positions are indicated by black dots. b Fraction of TE insertions in genes, TEs and intergenic regions. c Fraction of essential genes among genes targeted by VANDAL21, ATENSPM3 or ATCOPIA93 in the epiRILs. Statistical significance for each comparison was obtained using the Chi-square test. d Metagene analysis showing the distribution of new insertions. e Relative abundance of insertion sites in relation to the nine chromatin states defined in A. thaliana. f Levels of DNAse hypersensitivity (DH; top panel), H3K7me3 (middle panel), and H2A.Z (bottom panel) around VANDAL21, ATENSPM3, and ATCOPIA93 insertion sites, respectively. Source data are provided as a Source Data file Full size image

TEs targets have specific chromatin signatures

Although VANDAL21, ATENSPM3, and ATCOPIA93 inserted preferentially near or within genes, their integration patterns differed markedly in relation to several features. Thus, VANDAL21 mainly targeted the promoters and 5’ UTRs of broadly active genes, which are associated with two of the nine chromatin states defined in A. thaliana30 (states 1 and 2, Fig. 3d, e). These two states are enriched in the active marks H3K4me3 and H3K36me3. In contrast, ATENSPM3 and ATCOPIA93 preferentially targeted genes that tend to be repressed and enriched throughout their body in the histone mark H3K27me3, as well as the histone variant H2A.Z (chromatin state 5), which confers reduced stability to nucleosomes31. Furthermore, ATCOPIA93 insertions were also overrepresented in genes whose body is solely enriched in H2A.Z (chromatin state 6, Fig. 3e). Meta-analyses confirmed these findings and indicated that VANDAL21 insertion sites coincided with local peaks of DNase I hypersensitivity (Fig. 3f) that correspond to the transcriptional start site (TSS) of genes (Supplementary Fig. 4c), in keeping with previous observations32. In contrast, ATENSPM3 and ATCOPIA93 insertion sites were more broadly distributed over genes (Fig. 3d) with a marked preference for exons in the case of ATCOPIA93. Insertion sites for this TE were also characterized by local maxima of nucleosomal occupancy and H2A.Z enrichment (Supplementary Fig. 4d-f). We conclude that the integration patterns for VANDAL21, ATENSPM3, and ATCOPIA93 are highly skewed towards distinct sets of genes and associated with specific chromatin states.

H2A.Z guides integration of Ty1/copia

To gain further mechanistic insights into TE insertion preferences, we focused on the association between H2A.Z and ATCOPIA93 insertion sites. In A. thaliana, when DNA methylation is compromised, hypomethylated TE sequences tend to acquire arrays of H2A.Z-containing nucleosomes33 and in the epiRILs, annotated TE sequences were more often hit by ATCOPIA93 when inherited from the ddm1 parent (Supplementary Fig. 4g). The relocalization of ATCOPIA93 insertions towards hypomethylated TE sequences provided a first indication that chromatin enriched in H2A.Z may play a role in the integration preference of ATCOPIA93.

To test causality directly, we used one epiRIL to introduce through crosses several active ATCOPIA93 copies into mutant plants that lack most H2A.Z34 (hta9 hta11 genotype) as well as into wild-type plants (Fig. 4a). Heritable insertions produced by the introgressed ATCOPIA93 copies were identified by TE-sequence capture12 in 1000 seedlings derived from two individuals of each genotype. Consistent with the transposition rate estimated in the epiRILs and with heritable insertions being produced late during flower development20, a total of 2354 and 2589 insertions were identified in the two pools of wild-type seedlings. In contrast, we recovered only 902 and 1264 insertions in pools of mutant seedlings (Fig. 4b), indicating that H2A.Z is required for ATCOPIA93 mobilization. Furthermore, the strong ATCOPIA93 integration preferences observed in the wild-type intervals of the epiRILs, as well as in the wild-type seedlings, were fully abolished in the mutant seedlings (Fig. 4c and Supplementary Fig. 5a). As a result of the more uniform transposition landscape obtained in mutant seedlings, there were almost three times as many essential genes targeted by ATCOPIA93 in these plants compared to the wild-type (Fig. 4b). Together, these results reveal that by targeting H2A.Z-containing nucleosome arrays, ATCOPIA93 avoids essential genes and thus minimizes the risks of being lost together with its host.

Fig. 4 H2A.Z guides the integration of Ty1/Copia retrotransposons. a Experimental strategy for determining the role of H2A.Z in the integration of ATCOPIA93. b Number of new ATCOPIA93 insertions detected in HTA9 HTA11 and hta9 hta11 F3 seedlings (top). Fraction of essential genes among those targeted by ATCOPIA93 (bottom). Statistically significant differences were calculated using the chi-square test. c Meta-analysis of levels of H2A.Z levels around ATCOPIA93 insertion sites. d Density of ATCOPIA93 insertions over well positioned nucleosomes. e Phylogenetic analysis of Tos17, Ty1 and 110A. thaliana Ty1/Copia LTR-retrotransposons. f Experimental strategy for studying transposition of the heat-responsive ATCOPIA78 LTR-retrotransposon in A. thaliana. g Number of new ATCOPIA78 insertions detected in F1 seedlings of nrpd1 plants grown under control or heat-stress conditions. h Meta-analysis of A. thaliana, rice (O. sativa) and yeast (S. cerevisiae) H2A.Z levels around ATCOPIA78, Tos17, and Ty1 insertion sites, respectively. For A. thaliana, experimental and natural insertions are indicated by solid and dotted lines, respectively. Source data of Figure panels. b, d, g, and h are provided as a Source Data file Full size image

The large number of ATCOPIA93 insertions detected in wild-type seedlings enabled us to examine potential integration preferences at the sub-nucleosomal level. Using well-positioned nucleosomes35, a major peak of integration was observed ~55 bp away from the nucleosome dyad (Fig. 4d). This position is one of the main points of contact between DNA and H2A or H2A.Z and it is also where these two histones diverge by several key amino acids36, thus providing further support for a direct role of H2A.Z in guiding ATCOPIA93 integration.

To determine if H2A.Z plays a similar guiding role for other TEs of the Ty1/copia superfamily, we examined ATCOPIA78 (also known as ONSEN37), which is distantly related to ATCOPIA93 (Fig. 4e). There were no private ATCOPIA78 insertions in the epiRILs, but previous work showed that ATCOPIA78 can be transcriptionally reactivated by heat stress and also mobilized in stressed plants defective in so-called RNA-directed DNA methylation, such as in the mutant nrpd137. We therefore assessed the mobilization of ATCOPIA78 by TE sequence capture, using genomic DNA extracted from the progeny of pools of nrpd1 plants that were subjected to heat stress at the seedling stage (Fig. 4f). A total of 279 new insertions were recovered in this way (Fig. 4g), which revealed targeting preferences similar to those of ATCOPIA93 (Fig. 4h and Supplementary Fig. 5b-d). Furthermore, we previously identified 147 recent (i.e., low frequency, non-reference) ATCOPIA78 insertions in the genome sequence of 211 natural A. thaliana accessions12 and the distribution of these natural insertions resembled that determined experimentally for ATCOPIA78, including the preference for H2A.Z containing chromatin (Fig. 4h). We also examined the integration patterns of Ty1/Copia retrotransposons in relation to H2A.Z in the distantly related plant species O. sativa (rice)38, as well as in the yeast S. cerevisiae39 and found that in these two species, experimentally induced insertions were also located at sites enriched in H2A.Z (Fig. 4h). Taken together, these results suggest that the role of H2A.Z in the integration of Ty1/Copia retrotransposons has been evolutionarily conserved since the last common ancestor of plants and fungi.

ATCOPIA preferentially target environmentally responsive genes

The deposition of H2A.Z within gene bodies is specific to plants and predominantly concerns environmentally responsive genes40, which were substantially overrepresented among ATCOPIA93 and ATCOPIA78 targets (Fig. 5a). To investigate the phenotypic impact of such targeting, we performed RNA-seq on pools of seedlings at generation F8 and F16 for one epiRIL (epiRIL 394) that contained five homozygous ATCOPIA93 insertions at generation F8 within genes, including four that are environmentally responsive. Specifically, one insertion was within the second exon of the gene AT5G38940 involved in salt stress and a second insertion was within the 5’UTR of the gene ADR1 (AT1G33560), which encodes an NBS-LRR disease resistance protein. The remaining two insertions were intronic and affected genes FPS2 (AT4G17190) and SPS1 (AT5G20280), which are implicated in defense against aphids41 and nectar secretion42, respectively. RNA-seq data were compared with those obtained for wild-type as well as ddm1 seedlings and we found that all four genic insertions were associated with reduced transcript levels of their target at F8 (Fig. 5b). However, results differed for generation F16, when epiRIL 394 contained a total of 40 methylated copies of ATCOPIA93 per diploid genome (i.e., 19 distinct new insertions in the homozygous state in addition to EVD; Supplementary Fig. 3b). While expression levels were reduced further for the 5’UTR insertion within ADR1, the transcript truncations associated with the two intronic insertions were fully abolished and both genes regained normal expression (Fig. 5b). These observations demonstrate that the effects of genic ATCOPIA93 insertions can differ substantially depending on their precise location and epigenetic status. To gain further insight into potential phenotypic consequences, we examined nectar secretion, an adaptive trait that can be readily assessed in the laboratory and that could influence the degree of outcrossing in natural A. thaliana populations43. In agreement with the molecular data, nectar droplets were more conspicuous in flowers of epiRIL394 at generation F17 than at generation F9 (Fig. 5c).

Fig. 5 ATCOPIA preferentially targets environmentally responsive genes. a GO term analysis of genes with ATCOPIA93 or ATCOPIA78 insertions in the epiRILs or other experimental settings, or in nature. b Genome browser view of RNA-seq coverage of three genes hit by ATCOPIA93 in epiRIL394. c. Nectar secretion in flowers of epiRIL394 at F9 and F17. A nectar droplet is only observed at F17 (circle). d Structure of the FLC locus with the position of the ATCOPIA78 insertion in accession Ag-0 (top). Relative expression level of FLC at 10 and 60 days after germination (DAG) in plants containing (red) or lacking (blue) the ATCOPIA78 insertion and grown under control conditions (ctl.) or subjected to heat stress (HS) at the seedling stage (bottom). Data are mean ± s.d. (n > 9 independent samples, one biological experiment) and statistical significance for differences was obtained using the MWU test. e. Flowering time (mean ± s.d.; n > 9 independent samples, two independent experiments) for the same plants as in d. f Monthly mean temperature at collection sites for Ag-0 and 25 accessions sharing the same FLC haplotype but lacking the ATCOPIA78 insertion. Source data of Figure panels d and e are provided as a Source Data file Full size image

We last investigated a natural ATCOPIA78 insertion located within the first intron of FLC (Fig. 5d), which encodes a key repressor of flowering and is the main genetic determinant of flowering time variation between accessions44. This insertion is present in Ag-0 but absent from most other accessions examined12 (Supplementary Fig. 6), suggesting that it was generated in the recent past. We grew Ag-0 along with two other accessions (Yo-0 and Gre-0) with the same FLC haplotype, which contains a common polymorphism associated with vernalization response45, but lacking the ATCOPIA78 insertion. All three accessions were late flowering (>80 days after germination, DAG) and had similar FLC expression under standard growth conditions. However, when subjected to heat stress at the seedling stage, Ag-0 flowered much earlier (49 DAG), unlike Yo-0 and Gre-0, which remained late flowering (Fig. 5e). Consistent with these observations, FLC expression remained unchanged in these two accessions following heat stress but was markedly reduced in Ag-0 (Fig. 5d), presumably as a result of the transient epigenetic reactivation of ATCOPIA78. Whether the stable silencing of FLC relies on the PRC2-based memory system implicated in the vernalization response46 or on other mechanisms remains to be determined. This notwithstanding, the fact that Ag-0 was collected from a site (Southwest of France) with much warmer winters than the other accessions that share the same FLC haplotype (Fig. 5f), suggests a role for the ATCOPIA78 insertion in the local adaptation of Ag-0 to its environment, by enabling it to flower early even in the absence of vernalization.