The origins of Embryophyta (land plants) and Streptophyta (land plants and their closest algal relatives, Charophyta) are arguably the most dramatic transitions in the history of plants. These events have previously been linked with the expansion of many processes and developmental traits, including embryogenesis [], plant hormones [], and symbiotic interactions with arbuscular mycorrhizae and rhizobacteria []. Our analyses revealed a substantial increase in the number of highly retained gene novelties in the last common ancestor (LCA) of Streptophyta and in the LCA of Embryophyta, with 50 and 103 novel core HGs identified, respectively (Figure 1). Gene Ontology (GO) analyses based on Arabidopsis thaliana, which has comprehensive GO annotations, were used to explore the modern functions of descendants of genes from novel core HGs (Data S4; Figure 2). The protein class category was used because this classification is less prone to false assignments and biases []. Results for all other GO categories, including molecular function, biological process, and pathway, are also provided (Data S4). HGs present in the LCA of embryophytes are abundant in protein classes involved in protein modification (e.g., transferase, oxidoreductase, and ligase) and protein transport (e.g., transporter proteins and membrane traffic proteins), whereas HGs present in the LCA of streptophytes are abundant in classes involved in gene regulation (e.g., transcription factor) and in cell structure, movement, and division (e.g., cytoskeletal proteins). The origins of Streptophyta were accompanied by the evolution of many plant-specific transcription factors (e.g., HD-ZIP) and an increasingly complex cell wall, consistent with the high number of hits for these protein classes among the Streptophyta novel core (NC) HGs [].

Using Arabidopsis thaliana genes as extant representatives, protein classes were assigned for all novel core HGs. All other GO annotations (e.g., molecular function, biological process, cellular component, and pathways) were also produced. See also Data S4.

6. Leebens-Mack J.H., Barker M.S., Carpenter E.J., Deyholos M.K., Gitzendanner M.A., Graham S.W., Grosse I., Li Z., Melkonian M., Mirarab S., et al.; One Thousand Plant Transcriptomes Initiative. One thousand plant transcriptomes and the phylogenomics of green plants.

18. Van de Peer Y., Mizrachi E., Marchal K. The evolutionary significance of polyploidy.

19. Zwaenepoel A., Van de Peer Y. Inference of ancient whole-genome duplications and the evolution of gene duplication and loss rates.

20. Lutzoni F., Nowak M.D., Alfaro M.E., Reeb V., Miadlikowska J., Krug M., Arnold A.E., Lewis L.A., Swofford D.L., Hibbett D., et al. Contemporaneous radiations of fungi and plants linked to symbiosis.

21. Yue J., Hu X., Sun H., Yang Y., Huang J. Widespread impact of horizontal gene transfer on plant colonization of land.

22. Margulis L., Chapman M., Guerrero R., Hall J. The last eukaryotic common ancestor (LECA): acquisition of cytoskeletal motility from aerotolerant spirochetes in the Proterozoic Eon.

23. Wickell D.A., Li F. On the evolutionary significance of horizontal gene transfers in plants.

It is possible that the bursts of conserved genomic novelty could be explained by one or more whole-genome duplications (WGDs). Inferring WGDs at these ancestral nodes is difficult, and no events have so far been identified in the LCA of either group []. Analysis of over 1,000 transcriptomes has identified 244 WGDs across the green plant phylogeny []. These mostly occur after the origin of vascular plants and do not appear to coincide with the bursts of novelty seen in this study. This supports the hypothesis that there was a change in strategy from gene family birth and expansion to WGD along the backbone of the plant phylogeny. Another factor that might explain the origins of some novel core HGs is horizontal gene transfer (HGT). BLAST searches against the Swiss-Prot database confirmed the absence of all novel core HGs in outgroup taxa, validating the outputs of the pipeline approach (BLAST outputs on GitHub: https://github.com/AlexanderBowles/Plant-Evomics/tree/master/Extended%20Data ). Queries using the pipeline approach revealed that 323 HGs were present in fungal and land plant genomes but absent from all other taxa in this study's dataset (Data S1), suggesting widespread HGT in plants []. The last eukaryotic common ancestor (LECA) is the ancestor that connects all eukaryotes, including plants and fungi. Either these HGs were present in LECA and were lost from all eukaryotic representatives aside from fungi and land plants, or they are the product of HGT []. GO analysis of 25 of the HGs that contained at least 100 embryophyte taxa revealed that they were associated with gene regulation and protein modification (Data S5). Other possible HGT events that could explain the marked distribution of these novel core HGs include parasitism by other plants, symbiosis with other plants (e.g., transfer of a photoreceptor gene from bryophytes to ferns), and symbiosis with rhizobacteria [].
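
The query for HGs shared only by fungi and land plants can be illustrated with a minimal sketch. It is not the pipeline used in this study. The function names (`hgs_restricted_to`, `count_in_clade`), the clade labels, and the toy data are all hypothetical and stand in for a table of which taxa are present in each HG. The sketch keeps HGs whose members fall in exactly the requested clades, and then counts embryophyte taxa per HG so that a sampling threshold (100 taxa in the text, 1 in the toy data) can be applied.

```python
from typing import Dict, List, Set


def hgs_restricted_to(
    hg_taxa: Dict[str, Set[str]],
    clade_of: Dict[str, str],
    clades: Set[str],
) -> List[str]:
    """Return IDs of HGs present in every clade in `clades` and in no other clade."""
    selected = []
    for hg_id, taxa in hg_taxa.items():
        # Collapse the taxa in this HG to the set of clades they represent.
        observed = {clade_of[taxon] for taxon in taxa}
        if observed == clades:
            selected.append(hg_id)
    return sorted(selected)


def count_in_clade(taxa: Set[str], clade_of: Dict[str, str], clade: str) -> int:
    """Count how many taxa in one HG belong to the given clade."""
    return sum(1 for taxon in taxa if clade_of[taxon] == clade)


# Toy data (hypothetical): taxon -> clade label, and HG -> taxa containing it.
clade_of = {
    "Arabidopsis_thaliana": "embryophyte",
    "Physcomitrium_patens": "embryophyte",
    "Saccharomyces_cerevisiae": "fungus",
    "Homo_sapiens": "other",
}
hg_taxa = {
    "HG_a": {"Arabidopsis_thaliana", "Saccharomyces_cerevisiae"},
    "HG_b": {"Arabidopsis_thaliana", "Homo_sapiens", "Saccharomyces_cerevisiae"},
    "HG_c": {"Arabidopsis_thaliana", "Physcomitrium_patens"},
}

# HGs found in fungi and land plants but in no other clade.
shared = hgs_restricted_to(hg_taxa, clade_of, {"embryophyte", "fungus"})

# Keep only HGs with enough embryophyte taxa (threshold of 1 for the toy data).
min_embryophytes = 1
well_sampled = [
    hg for hg in shared
    if count_in_clade(hg_taxa[hg], clade_of, "embryophyte") >= min_embryophytes
]
print(shared, well_sampled)
```

A strict presence/absence filter of this kind cannot by itself distinguish HGT from ancestral presence followed by loss, so it only identifies candidate HGs for further analysis.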