Reviewer's report 1

Eugene V. Koonin, National Center for Biotechnology Information, NIH, Bethesda Maryland, USA

What determines the total size of genomes and their effective complexity (sensu Adami), and how genome size evolved throughout life's evolution, are genuinely exciting and fundamental biological issues. Potentially, a lot of information can be extracted from comparative analysis of genome size and complexity. This paper is an attempt to cast this analysis in the simplest possible terms, i.e., to back-extrapolate the maximum genome size attained on earth at different times (I believe this is what is being used to produce the plot in Fig. 1; the corresponding language in the paper is not very precise) to the origin of the first organisms. The inferred dates for the origin of life are very early and, under a straightforward interpretation favored by the author, suggest that life did not begin on earth but rather elsewhere in the Universe some 10 billion years ago, after which it spread by panspermia.

I am not at all a priori prejudiced against the panspermia hypothesis and actually agree with the author's concluding sentence in that panspermia should be considered "on equal basis with alternative hypotheses of de-novo life origin on earth". However, I think that the approach used in this work provides no support for an early date of life's origin. The main problem, as I see it, lies with the fact that the key plot in Fig. 1 combines two worlds with very different evolutionary trends, the prokaryotes and the eukaryotes (especially, complex, multicellular eukaryotes). The exponential law very well might hold for the portion of the curve that corresponds to complex eukaryotes (or, possibly, eukaryotes in general), and the reasons why this is so would be interesting to discuss in some depth (more data points would be required, though). The problem is, however, that, for the first 1.5–2 billion years of life's evolution on this planet, all existing life forms were prokaryotes. There is just one point corresponding to prokaryotes in Fig. 1, and there is, indeed, an excellent reason for that: we have no evidence whatsoever that the maximum genome size of prokaryotes increased during that enormous time span or in the time elapsed since.

Author's reply (1)

I have addressed this problem in the Discussion by estimating the average rate of increase in genome complexity in Archaea and Eubacteria, which appears lower than the rate of complexity increase in eukaryotes. I then discuss two possible scenarios: (a) initial rates of complexity increase in prokaryotes were similar to those observed in eukaryotes and then slowed down due to organizational constraints, or (b) rates of complexity increase in prokaryotes were always slower than in eukaryotes. With scenario (a), the expected origin of life is ca. 10 billion years ago according to the regression (Fig. 1), and with scenario (b), life originated even earlier than that. Thus, separate handling of prokaryotes and eukaryotes does not bring the predicted date of life origin closer to the present.
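The arithmetic behind this kind of back-extrapolation can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the function names and units are hypothetical, and the points are synthetic values placed exactly on a line, not the values plotted in Fig. 1.

```python
import math

def fit_log_linear(times_gyr, sizes_bp):
    """Ordinary least-squares fit of log10(size) = intercept + slope * time."""
    ys = [math.log10(s) for s in sizes_bp]
    n = len(times_gyr)
    mean_t = sum(times_gyr) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_gyr)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_gyr, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept

def extrapolated_origin(slope, intercept, start_log10_size=0.0):
    """Time at which the fitted line reaches the assumed starting size."""
    return (start_log10_size - intercept) / slope

# Synthetic, hypothetical points: time in Gyr (negative = past), size in bp.
times = [-3.0, -2.0, -1.0, 0.0]
sizes = [1e6, 1e7, 1e8, 1e9]
slope, intercept = fit_log_linear(times, sizes)
origin = extrapolated_origin(slope, intercept)
```

With the intercept held fixed, halving the slope doubles the extrapolated age. That is the arithmetic behind scenario (b): a slower assumed rate of increase pushes the inferred origin further back, not closer to the present.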

For all we know, the characteristic complexity of the prokaryotic genomes had been reached very early on during life's evolution (considering the geochemical and paleontological evidence of more or less modern-like microbiota ~3.5 billion years ago) and remained in equilibrium ever since. Thus, to the best of our understanding, there was an early explosive phase of evolution of complexity, which was followed by stasis (the prokaryotic phase of life's history) and then by another burst associated with eukaryogenesis. The author dismisses, very lightly, the notion of punctuated equilibrium. This is not the place to assess the validity of the specific theory of Gould and Eldredge (it might indeed have its problems); however, I believe that, in general, major non-uniformity of the tempo of life's evolution cannot be denied.

Author's reply (2)

If the rate of evolution is measured by numerical expansion of some taxonomic groups and numerical decline of other groups, then it is definitely non-uniform. However, in the paper I discuss the rate of increase in genome complexity which is an entirely different process. So far there is no evidence that the rate of complexity increase fluctuated considerably over time. In particular, there is no evidence of "early explosive phase of evolution" of prokaryotes and "another burst associated with eukaryogenesis". Genome complexity can increase even if direct adaptations to the environment remain stable (due to increasing reliability, modularity, and adaptability).

In the general epistemological sense, the approach to back-extrapolation of life's history taken in this paper can be characterized as ultra-uniformitarianism, a worldview championed by the great geologists Hutton and Lyell and strongly embraced by Darwin (this work even might be considered something of an extension of this view but the spirit is definitely the same). In that vein, I believe that what is done here is an interesting exercise because it showcases the kind of conclusions to which ultra-uniformitarianism can lead. If the entire discussion and conclusions were rewritten along these lines, this could turn into a sound piece.

There are two issues in this paper that are not as germane to its main conclusions as the above but are important and deserve comment because they are not, I believe, adequately addressed. The first issue is the nature of the constraints that affect the evolution of genome complexity/size. The author dismisses Lynch and Conery's population-genetic concept of genome complexity evolution (his ref. [12]) by citing the comment of Charlesworth and Barton [13]. This is, I think, disingenuous because Charlesworth and Barton's note (regardless of whether or not their arguments are compelling) does not even seek to invalidate Lynch's theory as a whole but rather addresses specific issues of mobile element propagation. I strongly believe that Lynch's concept has a lot going for it and explains an important, if not the central, aspect of these constraints.

Author's reply (3)

I have removed most of my criticism of the Lynch and Conery paper because I agree that their data are valid. However, I disagree with their evolutionary interpretation and suggest another one: that a large Ne was one of the constraints on the evolution of prokaryotes.

Another, complementary source of these constraints that is not at all covered is the faster than linear scaling of the number of regulatory genes with genome size (van Nimwegen E. Trends Genet. 2003 Sep;19(9):479–84; Konstantinidis KT, Tiedje JM. Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):3160–5).

Author's reply (4)

I agree that the proportion of regulatory genes may change in evolution. However I don't think that this can substantially affect the regression line which I discuss in the paper.

Another issue is that of the "minimal genome": equating minimal genomes reconstructed by comparative-genomic approaches with ancestral life forms is incorrect and does not reflect the original view of the authors of the minimal genome notion (of which ref. 27 in the present manuscript is a proper reflection).

Author's reply (5)

I have removed the reference to the "minimal genome" paper in the paragraphs where I discuss the complexity of ancestral life forms and the possibility of spontaneous self-assembly of complex systems.

Again, all this is not to claim, with confidence, that the only form of life we are aware of evolved on earth rather than elsewhere in the universe. The latter is quite a possibility. The only claim I am making is that the data analyzed in this paper and, for that matter, any comparative-genomic data I can think of do not provide any evidence in support of an early, extraterrestrial origin of life. Accordingly, I believe that terrestrial origin around 4 billion years ago should be taken as the null hypothesis.

Author's reply (6)

I do not claim to have a proof for the exponential hypothesis, but offer the available supporting evidence. In addition, I suggest (a) mechanisms of positive feedback that can cause the exponential increase in genome complexity and (b) a possible test for panspermia if life is found on any planets or satellites in the solar system. Testing multiple null hypotheses may appear more productive than testing a single one.

Reviewer's report 2

Chris Adami, Keck Graduate Institute, California Institute of Technology, Pasadena, USA

In this contribution, the author attempts to characterize the functional form of the relationship between the sizes of the functional genome of organisms and their appearance in the fossil record. Using five data points (prokaryotes, eukaryotes, worms, fish, and mammals), the author deduces an exponential increase in functional size with time. He then uses this functional relationship to hypothesize an origin of life that exceeds the age of the Earth by a factor of two. From this he concludes that the origin of life cannot have taken place on Earth, but points towards hypotheses of the panspermia type.

This paper is an example of how not to analyze data. First, there is no doubt that a much more sophisticated analysis of whole genome data can be performed. For example, the author claims that 1/3 of the Fugu rubripes genome is functional (this is one of his datapoints), but the original publication only states that "gene loci occupy about one-third of the genome". There is some evidence that non-coding but functional (likely regulatory) DNA increases with the complexity of the organism (see, e.g., [1]), so that taking just the gene loci into account is very likely to be misleading, more so for complex metazoans.

Author's reply (7)

I believe that my estimate of the functional genome size of Fugu rubripes as 1/3 of the genome is realistic. Gene loci contain more than coding sequence; they also include introns and untranslated regions. Although I did not explicitly include promoter sequences, they may be similar in size to the non-functional portion of introns. This analysis is not sensitive to small variation in functional non-redundant genome size (± 20–30%). This level of uncertainty is inevitable because we do not have an exact quantitative measure of genome complexity.
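The insensitivity claim can be checked with a line of arithmetic: on a logarithmic axis, a relative change in genome size moves a point by the logarithm of one plus that change. The sketch below is only an illustration; the function name is hypothetical and no values from the paper are used.

```python
import math

def log10_shift(relative_change):
    """Displacement of a point on a log10 axis for a given relative change.

    relative_change = 0.3 means the size estimate is 30% larger,
    relative_change = -0.3 means it is 30% smaller.
    """
    return math.log10(1.0 + relative_change)

shift_up = log10_shift(0.3)     # about +0.11 log10 units
shift_down = log10_shift(-0.3)  # about -0.15 log10 units
```

A displacement of this size is small whenever the plotted values span several orders of magnitude, although it does not by itself say how much a fitted line would move.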

Even were we to accept the five data points at face value, they would not allow us to reach any conclusion about the origin of life. This is a classical case of "allowing the data to suggest a model". For example, I have a time series of personal Marathon finishing times versus date that very much suggests a linear (decreasing) relationship (with four, rather than five, data points). But I am not so foolish as to predict from these data points the date when I will break the world record (or the speed of sound, or light, for that matter). The authors advance some arguments for their exponential model, but many more arguments speak against it. For example, while an approximately exponential growth could be argued for in any particular period, major changes in organization (for example from unicellular to multicellular) are likely to affect the rate of growth, so that a piecewise exponential would be a more reasonable assumption.
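The difference between a single exponential and a piecewise one can be made concrete with a short sketch. The function name and all phase durations and doubling times below are hypothetical, in arbitrary time units, and chosen only to show that the same total increase is compatible with very different elapsed times.

```python
import math

def total_doublings(phases):
    """Number of doublings accumulated over a sequence of growth phases.

    Each phase is (duration, doubling_time). A phase with an infinite
    doubling time represents stasis and contributes no growth.
    """
    return sum(duration / doubling_time for duration, doubling_time in phases)

# One uniform exponential: 10 doublings take 10 time units.
uniform = [(10.0, 1.0)]

# Piecewise: a fast burst, a long stasis, then a second fast burst.
piecewise = [(1.0, 0.25), (2.0, math.inf), (1.5, 0.25)]

uniform_doublings = total_doublings(uniform)
piecewise_doublings = total_doublings(piecewise)
piecewise_elapsed = sum(duration for duration, _ in piecewise)
```

Both histories end at the same size (ten doublings), but the piecewise one takes 4.5 time units instead of 10. Present-day sizes alone therefore cannot distinguish the two.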

Author's reply (8)

see reply #1 to Eugene Koonin

Even more dramatically, it is inconceivable that life began with just a few nucleotides. Instead, there must have been an initial step, from zero to finite, in the complexity of organisms (as measured by their functional genome). The size of this step will then be crucial in determining the point of origin.

Author's reply (9)

I have added more discussion on why it is more likely that the genome evolved gradually from single coding elements (paragraphs 5–7 of Discussion).

But as we have no information about the minimal genome size of living organisms, an extrapolation with a pure exponential simply makes no sense. Thus, while a thorough analysis of the evolution of functional genome size would certainly be welcome, the data presented here do not warrant any conclusion, except perhaps that the size of functional DNA has been increasing in evolution, something we should not be terribly surprised to learn.
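How strongly the extrapolated origin depends on the assumed starting size can be sketched under a pure exponential model. The function name and all numbers below are hypothetical, in arbitrary units, and are not taken from the paper.

```python
import math

def extrapolated_age(size_now, start_size, doubling_time):
    """Time needed to grow from start_size to size_now at a constant doubling time."""
    return doubling_time * math.log2(size_now / start_size)

# Same present-day size and doubling time, two different assumed starting sizes.
age_from_one_unit = extrapolated_age(1024.0, 1.0, 1.0)    # 10 doublings
age_from_32_units = extrapolated_age(1024.0, 32.0, 1.0)   # 5 doublings
```

Every doubling of the assumed starting size removes one doubling time from the inferred age, so the extrapolated date is only as good as the estimate of the initial step.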

Reviewer's report 3

Arcady Mushegian, Stowers Institute, Kansas City, USA

I agree with the Author on the following:

1. If there is evidence supporting panspermy, it should be considered seriously.

2. Panspermy, if it occurred, should not prevent us from attempting to reconstruct ancestral genomes, using comparative genomics and the knowledge of planetary chemistry.

3. Early stages of evolution of Life seem to have been overloaded with evolutionary innovation, which asks for explanations. Panspermy may be one such explanation; periods of accelerated evolution, prompted in part by Lynch-Conery considerations of Ne, is another.

Having said that, I do not see any striking arguments for panspermy in this work. The "genome size as a clock" approach is, in my opinion, qualitatively correct, and it shows what we already knew, i.e., that the earliest stages of life appear to have had precious little time to progress to what are currently our best estimates of genome size and the number of protein-coding genes (on the latter, see also below). Whether the dependency is of the exponential form, however, remains to be seen.

Author's reply (10)

see reply #6 to Eugene Koonin

Discussion of minimal genome in this regard is a red herring. First, the Author misreads what is in the minimal-genome literature (e.g. Mushegian and Koonin, 1997; later reviews both by myself and by Koonin; and experimental work of Hutchison, Smith and Venter, most recently Glass et al., 2006; Pubmed 16407165). Minimal genome is a construct of biochemical engineering, predicted or directly manipulated to sustain life in a rich medium with the smallest number of genes. It is not purported to model the ancestor, even though it, same as the ancestral genomes, may be constructed using methods of comparative genomics, and even though minimal genome may be enriched in ancestral genes. Second, no one ever said that the minimal or ancestral genomes have evolved by spurious assembly of 300 genes – any paper, including our own, that speculates about origins of Life, understands the problem of earlier stages clearly.

Author's reply (11)

see reply #5 to Eugene Koonin

Third and most important, all this is not relevant to the Author's own argument: the genome to discuss is not the minimal one, but that of LUCA (last universal common ancestor). The latest reconstructions of LUCA gene content, notably Pubmed 12515582 and 16431085, come up with 600–1000 genes, which is in fact even better for the early-overload argument, so why not stick to these estimates?

Author's reply (12)

In this paper I used existing genomes, and LUCA is only mentioned for discussion purposes. Also I tried to make my estimates for predicted life origin as conservative as possible.

Ultimately, the question is not whether "early genomes were way too complex", but, in the likely case that they were, whether panspermy better explains these observations than other hypotheses. I find it counterproductive to dismiss the Lynch-Conery theory in one sentence – at least in the sentence that directs to the Charlesworth-Barton paper, as if it is the last word on the subject. In fact, said paper is rather supportive of many observations and explanations presented by Lynch and Conery, arguing mostly with the idea of subfunctionalization (where Charlesworth's argument is an overly general one, which is understandable: coming up with any specifics here will require a lot of quite subtle analysis of the data that are not there yet) and, in a technically involved way, with the ideas of transposon dynamics (which, I think, are addressed in part by M.Lynch in Pubmed 16280547). If the author has a substantive disagreement with Lynch-Conery, let us hear it, but we haven't yet.

Author's reply (13)

see reply #3 to Eugene Koonin

The "viral hypothesis", in the meantime, exists in many modifications, not all of which require modern-type viruses: see for example, Woese (series of essays in 1998–2002) and Koonin-Martin (Pubmed 16223546). With regard to the absolute time scale, however, these theories may not be even that helpful, because the step from these general hypotheses to constant vs variable evolutionary rate would not be trivial.

Author's reply (14)

Even if early viruses were different (e.g., non-parasitic) there is no evidence that their rate of complexity increase was higher than in eukaryotes.