About five years ago, biologists were surprised by the first discovery of an extremely large virus. Viruses are generally stripped-down, efficient predators, carrying only as much DNA or RNA as is necessary to hijack their host and make extra copies of themselves. The newly discovered virus, called Mimivirus, was anything but stripped down; it carried a genome nearly the size of those of some bacterial species. And, instead of simply hijacking its host, the viral genome carried a lot of genes that replaced basic cellular functions, including some involved in DNA repair and the manufacturing of proteins.

The unusual size and gene content of the virus led one scientist to suggest that viruses could explain the origin of DNA-based life. If viruses carried all these genes, then it's possible to imagine that one could set up shop in a cell and simply never leave, gradually taking over the remaining functions once performed by its host's genetic material. This would explain the origin of DNA, which would distinguish the virus from its host's genetic material, a holdover from the RNA world. It could also explain the existence of a distinct nucleus within eukaryotic cells.

A paper is being released today, however, that argues that this scenario has things exactly backwards. Giant viruses, its authors argue, have all these genes normally associated with cells because, in their distant evolutionary past, they were once cells.

Mimivirus was discovered in an amoeba, so the authors of the new paper used a simple technique to look for its relatives: take three different species of amoeba, expose them to a variety of environmental samples, and see if anything big starts growing in them. They hit pay dirt with a sample obtained from an ocean monitoring station just off the coast of Chile. Despite the oceanic source, the virus grew nicely in freshwater amoebae. The site also gave the virus its name: Megavirus chilensis.

The authors followed its lifestyle, showing that it behaved much like Mimivirus, forming similar structures within its host cell that could only be distinguished using electron microscopy. They also sequenced its entire genome, which turned out to be the largest virus genome yet completed: 1.26 million base pairs of DNA (1.26 megabases). Based on this sequence, Megavirus is a distant cousin of Mimivirus. Of its 1,120 protein-coding genes, over 250 have no equivalent in Mimivirus. But, of the genes that are shared, the sequences average about 50 percent identity on the protein level. This means that Megavirus is similar enough that it can be compared to Mimivirus, but different enough that it's possible to make some inferences about the viruses' evolutionary history.
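For readers unfamiliar with the term, "percent identity" is simply the share of aligned positions at which two protein sequences carry the same amino acid. The sketch below shows one common way to compute it, assuming the two sequences have already been aligned and that gaps are written as "-". The function name, the gap convention, and the two toy sequences are made up for illustration; the paper's own alignment method and scoring choices may well differ.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns where both residues match.

    Assumes the inputs are two rows of an existing pairwise alignment,
    so they must have equal length. Columns with a gap in either row
    are skipped (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns where both rows hold a residue
    matches = 0   # compared columns where the residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy, invented sequences (not real viral proteins).
toy_a = "MKT-LLVAGS"
toy_b = "MRTALLIA-S"
print(percent_identity(toy_a, toy_b))  # 6 matches over 8 compared columns
```

An average near 50 percent across the shared genes, as reported here, would mean that roughly every second aligned amino acid differs between the two viruses.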

And what they find supports the view that the virus started out with a much larger complement of genes. For example, Mimivirus has a suite of genes that can help repair DNA. Megavirus has those plus one other that is specialized for the repair of DNA damaged by UV light. The additional gene appears to be functional: Megavirus was able to grow following an exposure to UV that was sufficient to disable Mimivirus.

Both viruses share an identical set of genes involved in transcribing their DNA into RNA, and use an identical set of signals to indicate where the transcripts should start and stop. Mimivirus also contains a number of genes used in the translation of RNA into protein. Megavirus has those plus a few more, including additional genes that attach amino acids (components of proteins) onto RNAs for use in translation.

Clearly, the common genes suggest that the viruses share a common ancestor. This leaves two possibilities for the novel ones: either the ancestral virus had a larger collection and its descendants have lost different ones, or each virus picked up different genes from its hosts through a process called horizontal gene transfer. The authors favor the former explanation, because most of the genes specific to one of the two viruses don't look like any gene present in their hosts (or any other gene we've ever seen, for that matter). This implies that horizontal gene transfer hasn't done much to shape the viruses' genomes.
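The logic of the "loss" scenario can be laid out with simple set operations. In the sketch below, the gene labels are hypothetical placeholders rather than real gene names, and the function names are invented for illustration. It assumes the strictest version of the scenario, in which genes are only ever lost and never gained, so the common ancestor must have carried at least every gene seen in either descendant.

```python
def partition_genes(genes_a: set, genes_b: set):
    """Split two gene sets into shared, only-in-a, and only-in-b."""
    shared = genes_a & genes_b
    only_a = genes_a - genes_b
    only_b = genes_b - genes_a
    return shared, only_a, only_b


def minimal_ancestor_under_loss(genes_a: set, genes_b: set) -> set:
    """Smallest ancestral gene set consistent with loss-only evolution.

    Assumption: no gene was ever gained by transfer, so every gene in
    either descendant must already have been present in the ancestor.
    """
    return genes_a | genes_b


# Hypothetical labels, loosely echoing the categories in the text.
mimivirus = {"dna_repair_core", "transcription_core", "translation_core",
             "mimi_specific"}
megavirus = {"dna_repair_core", "transcription_core", "translation_core",
             "uv_repair", "extra_trna_charging"}

shared, mimi_only, mega_only = partition_genes(mimivirus, megavirus)
ancestor = minimal_ancestor_under_loss(mimivirus, megavirus)

print(sorted(shared))
print(sorted(mimi_only), sorted(mega_only))
print(len(ancestor))
```

Under horizontal gene transfer, by contrast, the ancestor would only need the shared set, and the virus-specific genes would be expected to resemble genes from a host or some other donor.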

So, when did the common ancestor exist? The authors line up a few of the conserved megavirus genes (including those of a more distantly related giant virus, CroV) with the equivalents in other eukaryotic species, and find that they branch off right at the base of the eukaryotic lineage. In other words, the viruses seem to have had a common ancestor with eukaryotes, but it split off right after the eukaryotes diverged from bacteria and archaea. (This also argues against the horizontal gene transfer idea, since there doesn't seem to be a species out there that the genes could have been transferred from.)

To the authors, this suggests that the viruses are the evolutionary descendants of an ancient, free-living eukaryotic cell. Various genes and structures from that organism have gradually been lost over its long history as a parasite, leaving something that propagates like a virus, but belongs to a distinct lineage from all other viruses that we're aware of.

The authors make a reasonably compelling case against the megaviruses getting their complex genomes via horizontal gene transfer, although it would be good to see a similar analysis for a lot more of the shared genes. What they don't do, however, is rule out the initial alternative: it's still technically possible that the megaviruses and eukaryotes share an ancient common ancestor because all eukaryotes are descendants of the virus' genome. At the moment, I'm not sure it's possible to distinguish between these alternative explanations.

PNAS, 2011. DOI: 10.1073/pnas.1110889108.