Significance Tardigrades, also known as moss piglets or water bears, are renowned for their ability to withstand extreme environmental challenges. A recently published analysis of the genome of the tardigrade Hypsibius dujardini by Boothby et al. concluded that horizontal acquisition of genes from bacterial and other sources might be key to cryptobiosis in tardigrades. We independently sequenced the genome of H. dujardini and detected a low level of horizontal gene transfer. We show that the extensive horizontal transfer proposed by Boothby et al. was an artifact of a failure to eliminate contaminants from sequence data before assembly.

Abstract Tardigrades are meiofaunal ecdysozoans that are key to understanding the origins of Arthropoda. Many species of Tardigrada can survive extreme conditions through cryptobiosis. In a recent paper [Boothby TC, et al. (2015) Proc Natl Acad Sci USA 112(52):15976–15981], the authors concluded that the tardigrade Hypsibius dujardini had an unprecedented proportion (17%) of genes originating through functional horizontal gene transfer (fHGT) and speculated that fHGT was likely formative in the evolution of cryptobiosis. We independently sequenced the genome of H. dujardini. As expected from whole-organism DNA sampling, our raw data contained reads from nontarget genomes. Filtering using metagenomics approaches generated a draft H. dujardini genome assembly of 135 Mb with superior assembly metrics to the previously published assembly. Additional microbial contamination likely remains. We found no support for extensive fHGT. Among 23,021 gene predictions we identified 0.2% strong candidates for fHGT from bacteria and 0.2% strong candidates for fHGT from nonmetazoan eukaryotes. Cross-comparison of assemblies showed that the overwhelming majority of HGT candidates in the Boothby et al. genome derived from contaminants. We conclude that fHGT into H. dujardini accounts for at most 1–2% of genes and that the proposal that one-sixth of tardigrade genes originate from functional HGT events is an artifact of undetected contamination.

Tardigrades are a neglected phylum of endearing animals, also known as water bears or moss piglets (1). They are members of the superphylum Ecdysozoa (2) and sisters to Onychophora and Arthropoda (3, 4). There are about 800 described species (1), although many more are likely to be as yet undescribed (5). All are small (tardigrades are usually classified in the meiofauna) and are found in sediments and on vegetation from the Antarctic to the Arctic, from mountain ranges to the deep sea, and in marine and fresh water environments. Their dispersal may be associated with the ability of many (but not all) species to enter cryptobiosis, losing almost all body water, and resisting extremes of temperature, pressure, and desiccation (6–9), deep space vacuum (10), and irradiation (11). Interest in tardigrades focuses on their utility as environmental and biogeographic markers, the insight their cryptobiotic mechanisms may yield for biotechnology and medicine, and exploration of their development compared with other Ecdysozoa, especially Nematoda and Arthropoda.

Hypsibius dujardini (Doyère, 1840) is a limnetic tardigrade that is an emerging model for evolutionary developmental biology (4, 12–21). It is easily cultured in the laboratory, is largely see-through (aiding analyses of development and anatomy; SI Appendix, Fig. S1), and has a rapid life cycle. H. dujardini is a parthenogen, with first division restitution of ploidy (22) and therefore is intractable for traditional genetic analysis, although reverse genetic approaches are being developed (17). H. dujardini has become a genomic model system, revealing the pattern of ecdysozoan phylogeny (3, 4) and the evolution of small RNA pathways (23). H. dujardini is poorly cryptobiotic (24), but serves as a useful comparator for good cryptobiotic species (9).

Animal genomes can accrete horizontally transferred DNA, especially from germ line-transmitted symbionts (25), but the majority of transfers are nonfunctional and subsequently evolve neutrally and can be characterized as dead-on-arrival horizontal gene transfer (doaHGT) (25–27). Functional horizontal gene transfer (fHGT) can bring to a recipient genome new biochemical capacities and contrasts with gradualist evolution of endogenous genes to new function. The bdelloid rotifers Adineta vaga (28) and Adineta ricciae (29) have high levels of fHGT (∼8%), and this has been associated with both their survival as phylogenetically ancient asexuals and their ability to undergo cryptobiosis (28–32). Different kinds of evidence are required to support claims of doaHGT compared with fHGT. Both are supported by phylogenetic proof of foreignness, linkage to known host genome-resident genes, in situ proof of presence on nuclear chromosomes (33), Mendelian inheritance (34), and phylogenetic perdurance (presence in all, or many individuals of a species, and presence in related taxa). Functional integration of a foreign gene into an animal genome requires adaptation to the new transcriptional environment including acquisition of spliceosomal introns, acclimatization to host base composition and codon use bias, and evidence of active transcription (e.g., in mRNA sequencing data) (35, 36).

Another source of foreign sequence in genome assemblies is contamination, which is easy to generate and difficult to separate. Genomic sequencing of small target organisms requires the pooling of many individuals, and thus also of their associated microbiota, including gut, adherent, and infectious organisms. Contaminants negatively affect assembly in a number of ways (37) and generate scaffolds that compromise downstream analyses. Cleaned datasets result in better assemblies (38, 39), but care must be taken not to accidentally eliminate true HGT fragments.
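
As an illustration of why contaminant scaffolds can often be separated from target scaffolds, the following minimal Python sketch partitions scaffolds by GC content and mean read coverage, the two axes explored in blobtools-style analyses. It is not the blobtools API and not the procedure used for the assembly reported here: the `Scaffold` class, the `flag_candidates` function, and the GC and coverage windows are all hypothetical and chosen only for the toy data. Scaffolds falling outside the window are flagged for inspection rather than deleted, because true HGT fragments could otherwise be removed by mistake.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    sequence: str
    mean_coverage: float  # mean read depth across the scaffold


def gc_fraction(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def flag_candidates(scaffolds, gc_range, cov_range):
    """Split scaffold names into those inside and outside a GC/coverage window.

    The window is an assumption supplied by the caller; names in `suspect`
    still need taxonomic annotation (e.g., by sequence similarity search)
    before any decision to remove them.
    """
    keep, suspect = [], []
    for s in scaffolds:
        gc = gc_fraction(s.sequence)
        in_gc = gc_range[0] <= gc <= gc_range[1]
        in_cov = cov_range[0] <= s.mean_coverage <= cov_range[1]
        (keep if in_gc and in_cov else suspect).append(s.name)
    return keep, suspect


# Toy data: one scaffold resembling the target, one with high GC and low coverage.
toy = [
    Scaffold("target_like", "ATATGCATAT" * 10, 100.0),
    Scaffold("high_gc_low_cov", "GCGCGCGCAT" * 10, 5.0),
]
# Hypothetical window; real thresholds must be read off the data.
keep, suspect = flag_candidates(toy, gc_range=(0.1, 0.5), cov_range=(50.0, 200.0))
print(keep, suspect)
```

In practice, fixed rectangular thresholds are a simplification: contaminants with GC content and coverage similar to the target will not separate on these two axes alone.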

A recent study based on de novo genome sequencing of H. dujardini came to the startling conclusion that 17% of this species’ genes arose by fHGT from nonmetazoan taxa (13). Surveys of published genomes have revealed many cases of HGT (40), but the degree of fHGT claimed for H. dujardini would challenge accepted notions of the phylogenetic independence of animal genomes and general assumptions that animal evolution is a tree-like process. The reported H. dujardini fHGT gene set included functions associated with stress resistance and a link to cryptobiosis was proposed (13). Given the potential challenge to accepted notions of the integrity and phylogenetic independence of animal genomes, this claim (13) requires strong experimental support. Here we present analyses of the evidence presented, including comparison with an independently generated assembly from the same H. dujardini strain, using approaches designed for low-complexity metagenomic and meiofaunal genome projects (38, 39). We found no evidence for extensive functional horizontal gene transfer into the genome of H. dujardini.

Conclusions We generated a good draft genome for the model tardigrade H. dujardini. We identified areas for improvement of our assembly, particularly removal of remaining contaminant-derived sequences. We approached the data as a low-complexity metagenomic project, and this methodology will become ever more important as genomics is applied to systems that are difficult to culture and isolate. The blobtools package (38, 39) and related toolkits such as Anvi’o (48) promise to ease the significant technical problem of separating target genomes from those of other species. Analyses of gene content and the phylogenetic position of H. dujardini, and by inference Tardigrada, are at an early stage but are already yielding useful insights. Early, open release of the data has been key. The H. dujardini ESTs have been used for deep phylogeny analyses that place Tardigrada in Panarthropoda (3, 4), for identification of a P2X receptor with an intriguing mix of electrophysiological properties (16), and for exploration of cryptobiosis in other tardigrade species (7, 8). The nHd.2.3 assembly was used for identification of opsin loci in H. dujardini (12).

Our assembly of the H. dujardini genome conflicts with the published UNC draft genome (13), despite both deriving from the same original stock culture of H. dujardini. Our assembly had superior assembly and biological quality statistics but was ∼120 Mb shorter than the UNC assembly. About 70 Mb of the UNC assembly most likely derived from the genomes of several bacterial contaminants. The disparity between the noncontaminant span of the UNC assembly (∼180 Mb), our estimate of the genome (∼130 Mb), and direct densitometry estimates (80–110 Mb) may result from the presence of uncollapsed haploid segments. Resolution of this issue awaits careful reassembly. We predict a greatly reduced impact of functional HGT: 0.2–0.9% of genes from nHd.2.3 had signatures of fHGT from bacteria, a relatively unsurprising figure. fHGT from nonmetazoan eukaryotes into H. dujardini was less easily validated, but likely comprised a maximum of 0.2%. In Caenorhabditis elegans, Drosophila melanogaster, and primates, validated bacterial fHGT loci comprise 0.8%, 0.3%, and 0.5% of genes, respectively (40). These mature estimates, from well-assembled genomes, are reduced compared with early guesses, such as the proposal that 1% of human genes originated through fHGT (49, 50). mRNA-Seq mapping shows that filtering did not compromise the assembly by eliminating bona fide tardigrade sequence. Although some UNC fHGT candidates were confirmed, our analyses show that the UNC assembly is heavily compromised by sequences that derive from bacterial and other contaminants and that the vast majority of the proposed fHGT candidates are artifactual.
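
To make the scale of these figures concrete, the short Python sketch below converts the reported percentages into approximate gene counts and checks the assembly-span arithmetic. It uses only numbers stated in the text (23,021 gene predictions, a 135-Mb assembly, a ∼120-Mb difference in span, and ∼70 Mb of likely contaminant sequence); the function name `genes_for_percent` is hypothetical, and because the inputs are rounded approximations the outputs are indicative only.

```python
GENE_PREDICTIONS = 23_021  # nHd.2.3 gene predictions, as stated in the abstract


def genes_for_percent(percent: float, total: int = GENE_PREDICTIONS) -> int:
    """Approximate number of genes corresponding to a percentage of the gene set."""
    return round(total * percent / 100)


# Bacterial fHGT candidates: 0.2-0.9% of gene predictions.
low, high = genes_for_percent(0.2), genes_for_percent(0.9)
print(f"bacterial fHGT candidates: about {low} to {high} genes")

# Span arithmetic, in Mb, from the approximate values quoted in the text.
our_span = 135            # our draft assembly
span_difference = 120     # our assembly is about this much shorter
contaminant_span = 70     # likely bacterial contaminant sequence in UNC

unc_span = our_span + span_difference              # implied UNC assembly span
unc_noncontaminant = unc_span - contaminant_span   # implied noncontaminant span
print(f"implied UNC span: ~{unc_span} Mb; noncontaminant: ~{unc_noncontaminant} Mb")
```

Under these rounded inputs the implied noncontaminant span is of the same order as the ∼180 Mb quoted above, and the bacterial fHGT range corresponds to tens to a few hundred genes.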

Experimental Procedures

Genome Assembly and Comparison with UNC Assembly of H. dujardini. The H. dujardini nHd.2.3 genome was assembled from Illumina short-insert and mate-pair data. We compared our assembly and that of Boothby et al. (13) by mapping raw read data and exploring patterns of coverage and GC% in blobtools (drl.github.io/blobtools/) (38, 39) and exploring sequence similarity with BLAST and diamond. Details can be found in SI Appendix.

Availability of Supporting Data. Raw sequence read data have been deposited in the Short Read Archive, database of Genome Survey Sequences, and database of Expressed Sequence Tags (SI Appendix, Table S4). Edinburgh genome assemblies have not been deposited in ENA, as we have no wish to contaminate the public databases with foreign genes mistakenly labeled as “tardigrade.” Assemblies (including GFF files and transcript and protein predictions) are available at www.tardigrades.org and dx.doi.org/10.5281/zenodo.45436. Code used in the analyses is available from https://github.com/drl/tardigrade and https://github.com/sujaikumar/tardigrade.

Note Added in Proof. T. Delmont and M. Eren have also reanalyzed the UNC (and our) data using Anvi'o and come to similar conclusions concerning contamination (51).

Acknowledgments We thank Bob McNuff (Sciento) for inspired culturing of H. dujardini and both reviewers who proposed changes that made this manuscript clearer. We especially thank a wide community of colleagues on Twitter, blogs, and email for discussion of the results presented here (which were posted on bioRxiv for discussion: dx.doi.org/10.1101/033464) in the weeks since the publication of the University of North Carolina genome. The Edinburgh tardigrade project was funded by Biotechnology and Biological Sciences Research Council (BBSRC) Grant 15/COD17089. G.K. was funded by a BBSRC PhD studentship. D.R.L. is funded by a James Hutton Institute/School of Biological Sciences University of Edinburgh studentship. S.K. was funded by an international studentship and is currently funded by BBSRC Award BB/K020161/1. L.S. is funded by a Baillie Gifford Studentship, University of Edinburgh.

Footnotes Author contributions: G.K., S.K., D.R.L., J.D., C.C., H.M., F.T., A.A.A., and M.B. designed research; G.K., S.K., D.R.L., L.S., J.D., C.C., H.M., F.T., A.A.A., and M.B. performed research; G.K., S.K., D.R.L., L.S., J.D., C.C., H.M., F.T., A.A.A., and M.B. analyzed data; and G.K., S.K., D.R.L., L.S., A.A.A., and M.B. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. CD449043–CD449952, CF075629–CF076100, CF544107–CF544792, CK325778–CK326974, CO501844–CO508720, CO741093–CO742088, CZ257545–CZ258607, and ERR1147177) and the European Nucleotide Archive (accession no. ERR1147178).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600338113/-/DCSupplemental.