In the present paper, we ask a simple but very controversial question: Are the many highly publicized claims for LGT from prokaryotes to eukaryotes real, or are they artifacts stemming from some combination of (1) genome sequencing contamination, (2) annotation practice, (3) phylogenetic reconstruction, and (4) the underappreciated role of differential gene loss in eukaryote genome evolution? Microbiologists have long known about the existence of LGT among prokaryotes [13] and furthermore anticipated the existence of pangenomes in that they built up to 30 % difference in gene content into the prokaryote species definition [70]. Genome sequences, however, have uncovered an extent of LGT among prokaryotes that no one really anticipated. For example, the current estimates for the pangenome size of a single species, Escherichia coli, based on 2085 sequenced strains, are now at 90,000 genes and still climbing linearly [71]. No mechanism other than LGT will produce pangenomes of that size, and the basic concept of LGT among prokaryotes has never been controversial, because it is a natural process and meshes well with what we know about prokaryote biology.

So if we look back to 1998, when the first evidence for substantial LGT from genome sequence analyses was emerging [59], we can now be certain that LGT in prokaryotes is real, that it is ongoing, and that it reflects a very important aspect of prokaryote biology: natural variation through recombination. At the same time, endosymbiotic theory has always stated that many genes entered the eukaryotic lineage via the endosymbiotic ancestors of mitochondria and chloroplasts; of this we can also be certain [42, 50, 66, 72]. The basic concept of endosymbiotic gene transfer [73] has also never been controversial, because it is a natural process and meshes well with what we know about eukaryote biology.

The aspect of LGT that has been controversial, though perhaps not controversial enough in our view, concerns claims for outright LGT from prokaryotes to eukaryotes outside the context of endosymbiosis. Such claims were put forth in the human genome sequence [4], and they were promptly refuted as artifacts [5, 6]. New claims for prokaryote-to-eukaryote LGT soon emerged and were popularized by LGT proponents [58], and soon thereafter many or most eukaryotic genome sequences published in high-profile journals contained reports (or claims) of more LGT [7, 54, 55]. Claims for LGT from chlamydiae to the plant lineage [47, 54, 74, 75] have been repeatedly published, but also repeatedly tested and rejected [50, 76–80]. The same claims have been advanced again recently [81], ignoring the many tests [50, 76–80] that refuted them, as if LGT claims were somehow immune to scientific testing. Patchy gene distributions in eukaryotes are also often interpreted as evidence for LGT [45], without even considering the alternative: differential loss [50]. The high tide of prokaryote-to-eukaryote LGT claims might have been reached with the tardigrade showdown, where one group reported that 16.1 % of all tardigrade nuclear genes are recent LGTs from prokaryotes [7], while a separate study found almost none at all [8].

If the claims from individual genome sequences for prokaryote-to-eukaryote LGT are real, then eukaryotes have indeed been continuously acquiring genes from prokaryotes over evolutionary time. That in turn predicts two fundamental patterns in investigations of eukaryotic genome sequences. First, different lineages of eukaryotes should possess fundamentally different collections of prokaryote-derived genes, just as we see in prokaryotes [11, 12, 30]. Second, eukaryotic genomes should harbor evidence for recently acquired prokaryotic genes, in addition to the anciently acquired genes that entered eukaryote genomes at the origin of mitochondria and plastids.

Few tests of either prediction have been reported. The obvious test for the first prediction (lineage-specific gene acquisitions) is simple: If we investigate gene presence and absence across many different eukaryotic lineages, then genes that eukaryotes share with prokaryotes should reveal patterns of lineage-specific acquisition. But the converse is observed: The only evidence for lineage-specific gene acquisition in eukaryotes is the mass introduction of bacterial genes in the plant lineage corresponding to the origin of plastids and their subsequent spread during secondary symbiosis [50]. Lineage-specific gene losses in eukaryotes are, by contrast, very common [50].

The 70 % rule

A thorough test of the second prediction (evidence for recent and ancient gene acquisitions) has been lacking. If eukaryotes are acquiring genes from prokaryotes continuously during evolution, then eukaryotic genomes should reveal evidence for recent acquisitions. Here we sought such evidence. We find that prokaryotes do indeed acquire genes from outside their phylum continuously during evolution, while eukaryotes do not. Prokaryotic phyla show a typical pattern of recent acquisitions with up to 100 % amino acid sequence identity to their sister-group homologs (Fig. 2). The only examples of such high amino acid sequence identity between prokaryotic and eukaryotic genes are restricted to singleton clades, such as E2190_B358_A1066_1 and E2268_B77_0 from Nematostella (Additional file 9: Table S6), whose genome sequence is known to harbor many contaminations [68, 82]. There are a few proteins in plastid-bearing eukaryotes that exhibit >80 % amino acid sequence identity to prokaryotic homologs, but these are mostly involved in photosynthetic functions; they are acquisitions that correspond to the origin of plastids (Additional file 6: Table S4).

If we look among the 2386 clades of non-plastid origin and consider proteins present in more than one eukaryotic genome, only very few, such as mitochondrial ATPase, an acquisition corresponding to mitochondrial origin, have ≥70 % amino acid sequence identity to their prokaryotic homologs. All other eukaryotic protein sequences showing ≥70 % amino acid sequence identity to prokaryotic homologs are either (1) acquisitions from the plastid ancestor or (2) contaminations. Genes shared by prokaryotes and only one eukaryotic genome are suspect as contaminations in any case. In the present study, we have queried 2386 sequence comparisons, such that the paucity or absence of pairwise identity ≥70 % between clades of eukaryotic proteins present in more than one genome and homologs from prokaryotic sister group clades might be rather general. We call it the 70 % rule.
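
The logic of the 70 % rule can be written out as a small decision procedure. The following Python sketch is only an illustration of the reasoning in the preceding paragraphs and is not the pipeline used in this study. The `Clade` record, its field names, the category labels, and the toy values are all hypothetical and chosen for readability. Testing plastid origin before singleton status is likewise an assumption of the sketch.

```python
from dataclasses import dataclass

# All names, fields, and values below are hypothetical illustrations,
# not the data structures or results of the study.
IDENTITY_THRESHOLD = 70.0  # percent amino acid identity (the "70 % rule")


@dataclass(frozen=True)
class Clade:
    clade_id: str
    n_eukaryote_genomes: int   # eukaryotic genomes in which the clade occurs
    identity_to_sister: float  # % identity to the prokaryotic sister group
    plastid_origin: bool       # True if the clade traces to the plastid ancestor


def classify(clade, threshold=IDENTITY_THRESHOLD):
    """Sort one clade into the categories discussed in the text."""
    if clade.identity_to_sister < threshold:
        return "conforms to 70 % rule"
    if clade.plastid_origin:
        return "plastid acquisition"
    if clade.n_eukaryote_genomes == 1:
        return "suspected contamination"
    # High identity, several genomes, not plastid-derived: a rare case
    # (mitochondrial ATPase is the example named in the text).
    return "exception (inspect manually)"


# Made-up toy records, one per category.
toy_clades = [
    Clade("toy_A", 12, 41.5, False),
    Clade("toy_B", 8, 83.0, True),
    Clade("toy_C", 1, 99.2, False),
    Clade("toy_D", 30, 72.0, False),
]

for c in toy_clades:
    print(c.clade_id, "->", classify(c))
```

In this sketch, a clade is flagged as a candidate exception only if it passes all three filters, which mirrors the order in which the alternatives are excluded above.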

Sampling and rates?

Critics might wonder about possible effects of uneven sampling in our present investigation. The prokaryotic groups examined have many dozens of species in each case (ranging from 31 to 135; Additional file 4: Table S2), and there are several dozen eukaryotes, too (55 species). Recalling that Fig. 2 shows the results for the comparison of sequences from a given prokaryotic group to the sister group sequence(s) from other taxa, we see a continuum reaching up to >90 % and sometimes 100 % average identity, reflecting continuous recent acquisitions. Compared to the same prokaryotic groups, the 55 eukaryotes top out at 70 %; the corresponding evidence for recent LGTs does not exist. Thus, the nature of the comparisons takes the somewhat uneven sampling into account. Critics might also wonder whether genes are constantly flowing from prokaryotes into eukaryotic genomes, but undergoing rapid evolution once they arrive so as to conform to the 70 % rule. That is special pleading, but we can exclude it nonetheless. Were that true, then different groups of eukaryotes would have fundamentally different collections of prokaryotic genes, but that possibility has already been tested and it is not the case: Eukaryotes possess different subsets of one and the same set of prokaryotic genes, which was present in the eukaryote common ancestor [50]. Critics might also offer that the eukaryotic genes are so divergent from their prokaryotic sisters because we do not know (or have not sampled) prokaryotic lineages closely related to the donors. But Fig. 2 shows that for the same sample of genes, we do see the donors in prokaryotes; that is, we find many sequences having >70 % identity to sisters from outside the phylum. Hence the prokaryotic sample cannot be the problem.

The last one out…

If lineage-specific acquisitions are extremely rare in eukaryotes, as the present data indicate, how can one explain the lineage-specific genes that are present in more than one genome? There are two ways to explain sparse gene distribution patterns: lineage-specific acquisition or differential loss. If a gene is lost in one lineage, that means that it cannot be essential, hence it is possible for it to be lost in other lineages as well. Furthermore, loss is an irreversible process: genes lost in one lineage will be missing in all descendants. If genes are indeed undergoing widespread loss in eukaryotes, as recent studies indicate [50, 83], it follows that some genes will have been lost in all lineages but one. Such genes (present only in one group) will have typical eukaryotic attributes, such as normal promoters and introns, and like other eukaryotic genes of prokaryotic origin they will be distantly related to their prokaryotic homologs. They will be lineage-specific, but not genome-specific like singleton contaminations.
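
The claim that widespread loss leaves some genes in a single lineage can be made concrete with a toy calculation. The Python sketch below is not an analysis from this study. It assumes, purely for illustration, that every gene was present in the common ancestor, that the lineages are independent (a star phylogeny), and that each lineage loses a given gene with the same probability `p_loss`. The function names and parameter values are hypothetical. Under these assumptions, the chance that a gene survives in exactly one of n lineages is n(1 - p)p^(n-1).

```python
import random
from collections import Counter


def expected_single_lineage_fraction(p_loss, n_lineages):
    """Probability that a gene survives in exactly one of n independent lineages."""
    return n_lineages * (1.0 - p_loss) * p_loss ** (n_lineages - 1)


def simulate_retention(n_genes, n_lineages, p_loss, seed=0):
    """Count, for each ancestral gene, how many lineages still retain it."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    counts = Counter()
    for _ in range(n_genes):
        # A lineage keeps the gene when its random draw is >= p_loss.
        retained = sum(rng.random() >= p_loss for _ in range(n_lineages))
        counts[retained] += 1
    return counts


# Illustrative parameter values only; they are not estimates from data.
counts = simulate_retention(n_genes=10_000, n_lineages=6, p_loss=0.8)
print("genes retained in exactly one lineage:", counts[1])
print("expected fraction:", expected_single_lineage_fraction(0.8, 6))
```

No acquisition step appears anywhere in the sketch, yet a substantial share of the simulated genes end up restricted to one lineage, which is the distribution pattern that is often read as evidence for LGT.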

This is exactly what is observed for genes that were interpreted as evidence for LGT in the Galdieria sulphuraria genome [55], a genome with claims for abundant LGT [84]. Whereas Richards and Monier [84] remain receptive to the claim for an LGT origin of 5 % of the genes in Galdieria [55], they do not mention the possibility of differential loss to explain this curious gene presence pattern. We consider it likely that those Galdieria genes are the result of differential loss in other genomes. After all, if a gene can be lost in one lineage, it can be lost in other lineages as well, and in the last lineage to retain the gene it will look, in terms of gene distribution, for all the world like an LGT, but it will conform to the 70 % rule. In differential loss, the last one out looks like an LGT.