Introduction

In 1977 Carl Woese and George Fox published a 2.5-page paper1 in the Proceedings of the National Academy of Sciences. Their paper had no figures and only a single table, but it would ultimately set the tone for the next 35 years of research on bioinformatics and the evolution of life. The table offered the first gene sequence-based assessment of the relationships between the major kinds of living organisms.2 Woese and Fox proposed that the tree of life is tripartite – i.e. that there are three major groups of organisms: the eubacteria, the archaea and the eukaryotes.

This view of the tree blatantly contradicted contemporary dogma on two fronts. First, existing taxonomies of life assigned organisms either to the eukaryotes – organisms whose cells are subdivided, containing the organelles you learned about in high school like the endoplasmic reticulum, mitochondria and nucleus – or to the prokaryotes. Not all eukaryotes are multicellular, but all multicellular organisms are eukaryotes (with minor caveats, of course – there are usually caveats associated with blanket statements in biology). Prokaryotes were then defined as the negation of the eukaryotes, i.e. single-celled organisms whose cells lack apparent subdivision. Not only did Woese and Fox provide a positive definition of prokaryotes by showing that prokaryotic gene sequences are readily distinguished from eukaryotic ones, they also showed that the prokaryotes were themselves subdivided into the eubacteria and the archaea. As was evident from their table, the newly discovered archaea are as genetically distinct from the eubacteria as the eubacteria are from the eukaryotes.

Second, Woese was not really interested in taxonomy at all. For him, taxonomic questions were proxies for evolutionary ones. That is: his goal was not to classify organisms, but rather to investigate the relationships between organisms in order to learn about evolution.

Carl Woese was not the first person to map evolutionary relationships using sequences of biological molecules, but he was the first to do it with nucleic acid sequences. Some previous efforts used the protein cytochrome C, for example. But cytochrome C is not universal (i.e. some organisms don’t have it) and many organisms have different versions of cytochrome C, making it technically difficult to isolate the protein from diverse samples. Woese and Fox focused instead on the ribosome,3 the central machine of the central dogma of molecular biology, which is, conveniently, 50-70% RNA by mass (BioNumbers ID 109646). Because of their more uniform chemistry, nucleic acids are much simpler to isolate and sequence than proteins. The ribosome reads out the genetic code from messenger RNA and converts the information encoded there into protein chains, proteins being the main effectors of the present-day biological world. Proteins carry out nearly every bit of work, coordination and information processing that cells and organisms perform. It’s almost impossible to imagine life (as we know it) before proteins. And indeed all life that we know of has proteins and all life we know of also has ribosomes with which to make those proteins.

Woese’s big idea, which he expressed in a letter to Francis Crick in 1969, was to use the RNA component of the ribosome – the so-called 16S RNA of the small subunit of the ribosome.4 This “one weird trick” allowed Woese and Fox to isolate and sequence a universal molecule and thereby discover the archaea. So Woese’s ideological revolution anticipated a technological one, where every gene is sequenced and every homology is mapped. Today bacterial genomes and cancer mutations are sequenced routinely, both subjected to analyses whose roots trace back to that seminal 1977 paper by Woese and Fox. The archaea are now firmly established as a distinct kingdom of life, so much so that it is beginning to seem clear that the eukaryotes (e.g. us, dogs, plants, yeasts, amoebas) arose in significant part from the archaea. In other words, we are now beginning to trace our own evolutionary roots to these little bugs with no apparent subdivisions.

I was born into a scientific universe where Carl Woese and George Fox’s methods and results are presumed. There is no escaping gene sequencing, homology comparison and evolutionary analysis in present-day molecular biology. Realizing all that, I contacted George Fox over e-mail (Carl sadly died in 2012) and asked if we could have a conversation about his work, about Carl Woese and about the history of life on Earth. George, who is now a professor of biology and biochemistry at the University of Houston, graciously agreed.

Interview

Avi Flamholz: I've been reading about you and about Carl a bunch. I was curious how you went from originally studying chemical engineering to studying the phylogeny and the origin of prokaryotic life.

George Fox: (laughs) You mean how I got to Carl's lab.

AF: Yeah, exactly.

GF: See. That's kind of a complicated story. I was a chemical engineer and I liked chemical engineering a whole lot as an undergraduate, so I made the decision to go to graduate school and learn more about it. I found out in graduate school it was a very different science. What they mostly did in graduate school was tensors and vectors and solving differential equations ... it was basically early 20th century physics and it wasn't to my liking. It wasn’t very much fun so I started looking at other things. Eventually I took a couple courses in the biology department and liked it.

Also there was an influence – a Professor by the name of Dan Jackson in the Civil Engineering department. He was the token biologist. If you look at environmental or civil engineering programs, there was always one biologist, because they used biological treatment methods for wastewater and stuff. He was their biologist. I took a class with him and he took us on field trips. Anyway I got interested in biology.

When I was nearly done and looking for a postdoc, I decided by then that I wanted to get into theoretical biology, mathematical biology. So I was looking around for a postdoc and some curious things happened. I was also on the verge of getting married. I had my wife-to-be agree to go ... you know in those days you had to type everything. Basically you had to send a whole bunch of letters looking for a postdoc or some kind of position. So she went to the library and she dug up a whole bunch of addresses and we sent 100 letters or so. I got one response from somebody who wanted me to do something in gastroenterology (laughs).

I didn't know what that was. So I did the obvious thing. I went to the library to find out. After 10 seconds I knew it was of no interest to me. As I was leaving the library there was a display of new books and I picked up this book ... I think it was called “Frontiers of Biology.” I took out this book and there was this article by Carl on his reciprocating ratchet model5 for how the ribosome works and I loved it. That was exactly the kind of stuff I wanted to do. And there was a secondary aspect. His name was Carl Woese. I was actually in an engineering fraternity called Theta Tau. It turned out that one of the 3 founding members [of the local chapter] was Carl Woese. This connection was interesting.

So anyway I wrote to Woese. In the meantime I escaped for the summer to the Marine Biological Laboratory to take their microbial ecology course, which was taught by Ralph Wolfe and a couple of other people whose names momentarily escape me. I was in Woods Hole in the summer and I get a letter from Carl and it turned out that Carl had a summer home in Martha's Vineyard. He told me he was coming over sometime to the mainland and maybe we could meet and talk about it. So he did and we met and he offered me a position in his lab as a postdoc. As it turned out the other Carl Woese was actually his father - he was an engineer. And he [the younger Carl Woese] was enticed by the fact that I was from Syracuse and he was from Syracuse.

So he offered me the postdoc and I got married and we got a car and we drove to Illinois and went over to the building where his lab was and went up to the 3rd floor ... and there he was. I was sort of hoping that I actually had a job because my wife would have been really mad! You see there was no exchange of letters so it was just this offer and then I never heard from him again until I got there. He never even told me what the salary would be. You can see we were both kind of strange people. Everything started from there. Next question!

AF: Was it evident to you at the time that you were going to be working on the history of early life? What did you think you were getting into when you got into Carl's lab?

GF: I guess I'm not sure. I'd done a lot of reading, you know, reading books like Lawrence and Oppenheimer, Watson, Crick, just all the books about science. Which was a strange thing because Woese had lived with these guys when the genetic code was being worked out. I read all their articles. I read the books as part of my transition from chemical engineering. So even though I didn't actually know these people personally, I knew their ideas. Carl knew them personally and he knew their ideas so we had a lot in common. But I would say when I went there I was just interested in the ratchet and theoretical things and I didn't really think about the big questions yet.

AF: Recently I read this article by Dan Koshland from ‘58 where he's positing, I think, roughly the ratchet model of the ribosome.6 It was amazing to me, because you realize looking backwards how little was known at various points in time, right?7 The stuff we take for granted now was not known, a lot of it pretty recently. So could you tell me: what was the state of affairs in ’73-’74 when you went to Illinois? What was known about the central dogma?8

GF: Probably not so much. I mean for the ribosome there was basically an old-fashioned model called the A-site P-site model, which was actually proposed by Watson. That was the standard model. It was more of an order-of-events type model, and Carl's ratchet idea was really a molecular model without any molecular data. And it was very enticing, a very beautiful model. Unfortunately not technically correct in the end, but it was the way he would think about these things. I guess very little was known about the details of how the ribosome worked. We knew that there were certain proteins involved. There was a lot of interest in trying to figure out which protein was the peptidyl transferase9 and all that. Of course it turns out ultimately it's the RNA, so that made those studies not very useful (laughs).

AF: So on that background of not knowing that the RNA was the active component: how is it that Carl came around to choosing the 16S [ribosomal RNA]10 as the substrate on which you would do all this evolutionary analysis?

GF: He actually started with the 5S.11 There were several factors, I suppose. He knew that the ribosome was important. He had been primarily interested in the genetic code and after the genetic code was solved he transferred his interest to "where did it come from" – what's the evolutionary origin of the genetic code. He felt that that related somehow to the ribosome. There's a rather famous letter, which he wrote to Francis Crick.

In that letter then he basically explains where he came from and what he wants to do. From a purely experimental point of view, the ribosomal RNAs are ideal in several ways. Number one, you could find them. When you run a gel you get a 16S RNA or an 18S12 RNA; it's basically the same molecule, and every organism has it. If you are working with something like cytochrome C, which early on was common, you have different versions of cytochrome C so they are not strictly homologous. They're not universal in their distribution. So the ribosomal RNA had this wonderful property of being universal and also has this wonderful property of being easy to get at. You can isolate it very easily [and unambiguously] from any organism.

AF: Sort of a confluence of technological limitations and luck?

GF: Well, yeah. There was also the fact that he acquired the electrophoresis equipment from Sol Spiegelman and that was a kind of backhanded luck because he was friends with Spiegelman. Carl obviously wasn't happy Spiegelman was leaving, but he inherited all this equipment, which he would have never acquired on his own. He wasn't the big lab science guy. Really, we were bioinformatic people in the early days. I typed all the IBM cards.

AF: I read that was you. Did you also implement the average linkage algorithm13 and all that stuff?

GF: Yeah, sure!
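Average linkage clustering, also known as UPGMA (unweighted pair group method with arithmetic mean), repeatedly merges the two most similar clusters and then scores the new cluster against every other cluster as the size-weighted mean of the old scores. The Python sketch below is a modern illustration of that general technique under stated assumptions, not a reconstruction of the 1970s punch-card program; the function name `upgma`, the single-letter taxa and the similarity values are all hypothetical.

```python
from itertools import combinations


def upgma(sim: dict, names: list) -> str:
    """Average-linkage (UPGMA) clustering on pairwise similarities.

    sim maps frozenset({x, y}) -> similarity (higher means more alike).
    Returns the tree as a nested, Newick-like string.
    """
    size = {n: 1 for n in names}  # current clusters and how many taxa each holds
    s = dict(sim)                 # working copy; merged clusters get new entries
    while len(size) > 1:
        # Merge the most similar pair of current clusters.
        a, b = max(combinations(size, 2), key=lambda p: s[frozenset(p)])
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Average linkage: size-weighted mean of the two old similarities,
                # i.e. the mean over all taxon pairs across the two clusters.
                s[frozenset((merged, c))] = (
                    size[a] * s[frozenset((a, c))] + size[b] * s[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical similarity values for four made-up taxa (not real data).
pairs = {
    ("A", "B"): 0.60, ("A", "C"): 0.12, ("A", "D"): 0.08,
    ("B", "C"): 0.10, ("B", "D"): 0.14, ("C", "D"): 0.50,
}
sim = {frozenset(k): v for k, v in pairs.items()}
print(upgma(sim, ["A", "B", "C", "D"]))  # ((A,B),(C,D))
```

The order of merges is the tree. Recording the similarity at each merge would also give branch heights for a dendrogram, which this sketch leaves out for brevity.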

AF: My dad studied physics and computer science in the ’70s. He tells me "you have it so easy. I used to write punch card systems in 32 K."

GF: Absolutely.

AF: It's amazing. I was thinking about how much time it must have taken. Every step of what you guys did for that paper was years, I assume. I can do the alignment and I can use a much more sophisticated algorithm and do the whole thing in a day.

GF: Well, we didn't have to do the alignment because we didn't have the sequence! (laughs)

You know the big genome centers have a pipeline for what they’re doing. I imagine you have a pipeline for your group. We basically had a pipeline. We had several people who worked in the lab, grew the cells, labeled the cells with radioactivity. We had a technician who did certain experimental things that were routine. Once we analyzed the sequences, we went back and did more experiments to clarify the sequences. In the end we had all the sequences, so we put them into computer files and catalogued them, computerized them and figured out how to compare them. It was, in a way, very non-creative. We produced those sequences relatively quickly and got faster and faster over time.14 The real nightmare was the eukaryotic sequences.
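To make "catalogued them ... and figured out how to compare them" concrete, here is a minimal Python sketch of oligonucleotide cataloging. It assumes the digestion enzyme is RNase T1, which cuts on the 3' side of every G, and scores two catalogs with the association coefficient commonly cited for this kind of work, S_AB = 2N_AB/(N_A + N_B), where N_A and N_B count the nucleotides in each catalog's fragments of length six or more and N_AB counts the nucleotides in fragments the two catalogs share. The function names and toy sequences are hypothetical, and the sketch ignores everything that made the real work hard, such as identifying fragments by electrophoretic mobility and coping with modified nucleotides.

```python
from collections import Counter


def t1_digest(rna: str) -> list:
    """Cut an RNA string after every G, mimicking RNase T1 specificity."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):
        fragments.append(rna[start:])  # 3'-terminal fragment that lacks a G
    return fragments


def catalog(rna: str, min_len: int = 6) -> Counter:
    """Multiset of digestion fragments long enough to be informative."""
    return Counter(f for f in t1_digest(rna) if len(f) >= min_len)


def nucleotides(cat: Counter) -> int:
    """Total residues represented in a catalog (fragment length x copies)."""
    return sum(len(frag) * n for frag, n in cat.items())


def s_ab(cat_a: Counter, cat_b: Counter) -> float:
    """Association coefficient S_AB = 2 N_AB / (N_A + N_B)."""
    total = nucleotides(cat_a) + nucleotides(cat_b)
    shared = cat_a & cat_b  # multiset intersection keeps the smaller count
    return 2 * nucleotides(shared) / total if total else 0.0


# Toy sequences, invented for illustration: they share two of three long fragments.
seq_a = "AUCAUCG" + "CCUUAAG" + "UUUCAACG" + "AG"
seq_b = "AUCAUCG" + "CCUAAAG" + "UUUCAACG" + "U"
print(round(s_ab(catalog(seq_a), catalog(seq_b)), 3))  # 0.682
```

Identical catalogs score 1.0 and catalogs with no long fragments in common score 0.0; a matrix of such pairwise scores is the kind of input an average-linkage clustering step consumes.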

AF: Those were nightmarish for what reason?

GF: First of all, the RNA apparently was labelled somewhere else. But the big problem was modified nucleotides. The eukaryotic RNAs have a lot of methylations and pseudo-Us as it turns out and those were a nightmare because we're comparing15 mobility. We're breaking the RNA into pieces and we're breaking the pieces into pieces and we're ultimately trying to come up with a characteristic mobility on electrophoresis so that we know what it is. The modified nucleotides totally screw that up because they don't run in the same way and we didn't really know enough about them to know what we were dealing with, so that was a mess. Reading through those modifications, because there were a lot of them, made it very difficult.

If you look at the original paper, we never showed the data. The data was never published. They let us get away with that. We proved that the sequences were not the same as opposed to figuring out what they were. There are certain sequences that are universal - archaeal sequences that are universal [among the archaea] and bacterial sequences that are universal [among the bacteria]. We were able to show that none of those sequences were present in the eukaryotic samples.

AF: Is that sort of like saying that your error bars on the comparisons to the eukaryotic samples were much larger?

GF: What we're basically saying is that the quality of the bacterial and archaeal catalogs was very high and the quality of the eukaryotic catalogs was not at the same standard.

AF: So your paper with Carl in 1977 is widely credited as being the discovery of the Archaea. But it seems that it's also responsible for some other discoveries. The most obvious being that you can use sequence information from DNA or RNA to establish evolutionary relationships. There was another statement in the secondary literature that it was the first proof that all life was related. I was surprised that there was debate about that, given that all life has ribosomes to begin with.

GF: (laughs) I don't remember that point…

I think at the time there was general feeling that the genetic code, to the extent it had been examined, was universal and therefore all life would be related. Luckily, they had actually looked at the genetic code in a halophile.16 So they actually had examined the genetic code in a quite wide variety of organisms. I think that the 16S would support that view very strongly, but I don't know that it was a big controversy at the time.

AF: It did seem that the paper itself was received with an enormous amount of controversy. And again, that was surprising from a look backwards. Do you have a sense of what that controversy was really about?

GF: Well, sometimes I don’t (laughs). To be honest, the controversy was strange. Well, I'll give you a simple example. There was an organism called Sporosarcina ureae. And the reason it was called Sporosarcina ureae was that it had the shape of a sarcina and made spores like a bacillus. So there was this controversy among the small number of people who studied this organism: is it related to Bacillus or is it related to Sarcina? So we sequenced the 16S RNA of this guy and found that it was more closely related to Bacillus! OK? So the microbiologists were very happy: we had this new tool that answered their long-standing questions. Another similar example was bacterial photosynthesis. There was a controversy about whether or not all the various photosynthetic bacteria17 were related and all in one group or whether they were in multiple [groups]. So again we had catalogs of the organisms from the various groups and we determined that they were not all in one group. That was accepted and people loved it.

So there were multiple cases where traditional microbiologists had controversy, you know: theory A and theory B and we were able to say which was correct. So in that sense they loved the work. But on the other hand there was controversy by an entirely different group of people. I tended to not try to get too involved in all that. I had my problems: finishing up with Carl and starting a lab, trying to teach courses and lectures and write grant proposals. There were things to do other than to argue with some random person. So Carl did most of the screaming back and forth. But yeah, it struck me as a little strange that there was real controversy, because to me it was obvious.

AF: The controversy that was described in the papers you sent me had two forms. The first one was this story with Ernst Mayr, which is very strange, because it seems like Mayr is basically saying that the purpose of this whole endeavor is to produce a taxonomy that is useful for research. In other words, to be a Dewey Decimal System of molecular biology. Carl appears to be saying, "that's not my purpose at all, my purpose is to figure out how life evolved."

GF: Right.

AF: That's an axiomatic difference, which I'm happy to just leave on the table.

GF: I think that's true. Carl wanted to know the relationships among the bacteria; he thought that was going to tell him something more about the origin of life. And it does, ultimately, because it allows you to infer a last universal common ancestor (LUCA). In the Crick letter he discussed that. He discussed the reasons for looking at bacterial relationships as being clues to early evolution. He wasn't interested in revising Bergey's18 manual taxonomy (laughs).

The controversy with Lake was about something entirely different. The Lake controversy was about the structure of the tree and how you make the tree, more or less. He was basically proposing that the halophiles were related to the bacillus19 rather than to the methanogens.20 We didn't like that very much. But you know I haven't kept up with it because, like I said before, the controversies didn't really matter to me. I mean, you do the work and you publish it and if people don't like it then they can prove it's wrong! (laughs) I mean that's their job as scientists: to prove it's wrong, not scream about it.

AF: So the second controversy, which struck me as a lot more material, was the role of horizontal gene transfer in obscuring evolutionary relationships. My impression is that it has largely been resolved. Is my impression right?

GF: I mean, I think it just gets resolved by the accumulation of the data, right? Clearly there's endosymbiosis,21 clearly there's lateral transfer and clearly the overall tree based on 16S is a good sound tree of ribosome relationship. But remember the tree is based basically on ribosomal RNA, so that speaks to the translation system. It's more the evolutionary history of the translation system than the organisms. So parts of the organism may have come from somewhere else by lateral transfer.

But there's the residual question of whether or not there's any part of the eukaryotic genome that's actually descended from an independent protobacterium or "progenote."22 So is there a core eukaryotic line? Most people don't think so. But Hyman Hartman still does.

AF: Just to be clear, the concern here is these characteristically eukaryotic genes that don't appear to be present in archaea or bacteria.

GF: Right! Because they might have existed before the symbiosis that led to the cell. Obviously part of the eukaryote is bacterial in terms of the mitochondria and chloroplast, right? And part of it is archaeal, particularly in terms of the translation apparatus. [The real question is:] is there stuff in there that is unique to the eukaryotic line of descent? And not recently evolved, right? There are certainly things that are unique to the eukaryotes but may just be extensions. For example, the snoRNAs,23 you know, that big system is in the archaea as well as the eukaryotes, and they're a lot more important in the eukaryotes. But they are basically there in the archaea. Histones [proteins packaging DNA in eukaryotes] are found in a much more primitive form in other organisms. So there are obviously things in the eukaryotes that are unique, but the real question is: are they aboriginal? That is, do they relate back to the pre-LUCA time period? My impression is that the general opinion is that there is no such relationship.

AF: That they are not aboriginal.

GF: Not everybody agrees on it, but the new tree [models] suggest that the eukaryotes came from archaea. Lake would say now that there's only two lines of descent.

AF: Depending on your perspective. I mean if you start earlier there's only one line of descent.

GF: (laughs) Right, ultimately there's only one. I think any time you have three it has to boil down to two and one.

AF: How robust do you think our observations thus far are to new evidence? I don't know if you saw this paper on the Lokiarchaeota that came out in Nature this year.24 They're doing this metagenomic survey of a deep ocean vent, I think off the coast of Greenland, where they found an archaeon that has a lot of the characteristic eukaryotic proteins in it. And then they've reconstructed these genomes through some sort of black magic that I don't yet understand (laughs).

GF: There's a computer program somewhere that does it. You trust those computer guys, right? Having been one...25

AF: Do you think that this discovery of the Lokiarchaeota really solidly places the eukaryotes inside the archaea? Is there still room for debate there?

GF: Apparently there's still room for debate. I don't worry about it. I guess I'm supposed to read all those papers, but we work on the evolution of the ribosome and we have a little side project where we're looking at the microorganisms at the space station and basically, it doesn't impact what I'm doing (laughs). And you know you have to learn, as you probably have already worked out in your career, you have certain capabilities available to you and you use those capabilities. I don't have any ability to go out and collect samples from the bottom of the ocean and sequence the genomes. But I have no problem with the archaea being related to the eukaryotes26 (laughs). Certainly there's a lot of evidence that the translation apparatus is, so is it more extensive than that? Possibly.

AF: In the past couple of years there have been some discoveries of whole new clades of eubacteria. One example is the Melainabacteria,27 a sister clade28 to the cyanobacteria that is not photosynthetic.

GF: Cool!

AF: Do you think it's possible that our picture29 could be overturned by new evidence?

GF: Is there a fourth Domain? A lot of people want to make viruses a fourth thing. I don't like that because I think the viruses don't count because they don't have the central dogma. They are basically pieces that fall off in various times in history. Then there's this story of those super big viruses that get people real excited because they have genomes as large as or larger than some of the bacterial genomes. But in the end, they have to have the host. They are not independent living organisms. They don't have the translation machinery. So I don't know: are you going to find a fourth independent lineage while we have a hard time keeping three? I mean, it's not all that bad - the archaea have been moved up from three to two! (laughs)

AF: So it gets simplified. Okay so you're focused on the ribosome, so I'll focus on the ribosome so that we can get into the weeds a little bit. So LUCA obviously had to have a ribosome because everything that we know has a ribosome. Can you talk a little bit about how we could get insight into the pre-LUCA state from the evolution of the ribosome?

GF: As you already said, the ribosome existed in the time of LUCA. A lot of the LUCA definitions have maybe 20030 genes. You'll see at least half of them, maybe more than half of them, have something to do with the ribosome. So the history of the ribosome predates LUCA. The tree brings you to LUCA, but you want to go earlier. The ribosome is speaking about what happened in that past period. And we, using various tricks, have been able to come up with a pretty good idea of the early history of the ribosome. I've been campaigning for this more recently. There are really two aspects: the development of the peptidyl transferase center, the chemistry. And the other aspect, once you have the chemistry, is modernizing it: making it more and more useful, more and more effective. So that's subsequent evolution of the ribosome itself. And so we have two things: trying to understand where the original chemistry comes from and what happened after you have the ability to do it.

AF: I read some of your recent papers. One thing that confused me about them is: what good is it to make peptide bonds when there is no protein yet? What's the purpose of evolving a peptidyl transferase center in a pre-protein31 world?

GF: Evolution doesn't have a purpose. You should read some of Doolittle’s ideas. Basically, what you're saying is the things that last longer will stay around (laughs).

AF: But they have to contribute to fitness, right?

GF: There's a competition, but it's not necessarily a strictly Darwinian competition. Once something exists, it tries to continue to exist.32 So one of the advantages of the early peptide machine might be that it makes peptides that stabilize the machine [i.e. the ribosome].

AF: Is there some basic biochemical reason to believe that peptides would stabilize an RNA machine?

GF: Just that peptides stabilize the ribosome.33

AF: But these are different in character, these peptides you're positing, right? They're not templated, so they are probably random sequence.

GF: They neutralize charge.34 That's a big deal. Nucleic acid systems - charge needs to be neutralized.35 The first guy to do that, if you believe Loren Williams, is iron. Ferrous iron [Fe2+], replaced in the modern world with magnesium [Mg2+]. When you look at the core of the ribosome, what we think are the oldest parts, there are no proteins but there are important metal interactions. Magnesium interactions stabilize the ribosome. Loren found these core magnesium binding sites that stabilize the PTC36 portion of the ribosome.

But you can see a variety of things, like for example the tRNA37 probably started out as the CCA.38 CCA, by the way, is not [usually] coded! It's added to tRNA afterwards. There's a theory of the origin of the genetic code called the "operational code." The idea is that the amino acids bind to these very small RNAs and the relationship39 develops from there. So the tRNA gets bigger over time. Eventually there's a messenger RNA and once you have a messenger RNA now you can start having a genetic code. But early on, you're just going to make random peptides. It's hard to see how that works (laughs).

I love to argue that the ribosome terminates the RNA world,40 but the real problem with the RNA world is where does the RNA come from to begin with! There are people who are claiming there might be an alternative genetic material. Because RNA gave rise to DNA, so why doesn't some kind of nucleic acid X give rise to RNA? But then you still have this little question of how this earlier genetic material got synthesized. When you get down that far, the top-down approach begins to die.

You know about that? There's what they call top-down and bottom-up. The bottom-up is the Miller-Urey41 type of thing where hopefully you produce the compounds that are going to give rise to life. Top-down is you compare everything about life, trying to figure out what the earliest things were. In my world, the earliest thing I have is the PTC. But you know, it's still 200 residues. And I can get it down to 100 by saying it's a gene duplication, but how did you make 100? Actually, I don't think it's that hard if you can make a 25mer, because the real beauty of nucleic acids in the origin of life world is that by hybridization42 and ligation43 you can rapidly create complexity. And in the pre-LUCA progenote world, the type of evolution that's going on is a little bit different, right? Time is important, complexity is important.

AF: What do you mean here by complexity?

GF: You're increasing complexity over time. The minimal living system is much more complex than a Miller-Urey experiment. So how do you increase the complexity? It's always interesting to go to an origin of life meeting because you have all these people proclaiming they've solved the problem. And then you look at it closely and you find out that they all have a different definition of the problem. There's a bunch of people who look at it as a chemical phenomenon – that you would have a chemical system which never reaches an equilibrium, but it's continuously changing and therefore evolving and ultimately increasing in complexity.

AF: There are two things that are pretty fundamental here that I'm curious about. The first one is: how did heredity work? You're talking about a world in which genes as we know them don't exist. If you start from no life and you go to life you have to have some state in which you don't have nucleic acids.

GF: Our idea of the progenote was that you would have entities that could do part of the central dogma but not all of it. Then those entities could potentially share their existence – there could be populations where some of the things were making primitive peptides, some of the things were trying to code or do transcriptions, but that the central dogma didn't really exist. One of the things that I think that Carl and I agreed on which maybe differs from a lot of the origin of life community was the definition of life. Our definition was that you had to have the central dogma, that you weren't living until you had the central dogma. So the origin of life problem is "how do you create the central dogma." The ribosome is a key element of that.

There's a whole lot of other people who look at the problem of the origin of life as this chemistry question: if I can show a self-replicating system that continues to increase in complexity, I've solved the problem and I don't care about the genetic code. Our vision was that you would have entities that are complex but they can't do the whole job so maybe they would share things. So there would be lateral transfer of information between the various entities and it would be a non-Darwinian type of evolution.

AF: You say non-Darwinian because it's mostly not inheritance by descent?

GF: Yeah, it's inheritance by acquisition.

AF: So you're imagining loose boundaries between organisms? A global metagenome44?

GF: Yeah, yeah. You're talking about populations of entities that are not identical. When you hit LUCA, you finally have that level of complexity that you can begin to have Darwinian evolution. You lock these other guys out.

AF: Is the “locking out” an essential part of the story?

GF: It has to happen at some point. At some point you become a living organism that has an RNA genome, a transcription system and a translation system. Once you have that, you can accurately replicate your genes and that is a lot better than picking up garbage off the street.45 So you're going to cease that heavy lateral transfer and focus more on maintaining what you have. I think Carl wrote about this kind of thing in his later years, where lateral transfer would decrease when Darwinian evolution appears.

AF: But it doesn't necessarily say that we shouldn't see these progenote organisms today. Which we don't, right? So what you are positing is that they were simply out-competed by LUCA-like organisms.

GF: Well, you know, if they're organic, the new guys would eat them (laughs).

AF: I think your definition of life as "having the central dogma" is very crystallizing for me. If I can ask you to indulge in pure speculation for a little bit, could you diagram what you think the most likely trajectory for the evolution of the central dogma is?

GF: No! (laughs) ...

I mean this has been published somewhere, but you can take a small RNA and you can have a structure like the tRNA, and you could have the same RNA have two structures basically depending on which strand it is.46 So you could be using one strand as your coding region and the other strand as a functional RNA that’s involved in decoding. You can then make RNAs that are providing the information and also serving in the primitive translation machinery. Kind of a crazy idea. RNA is interesting, but we don't really believe in DNA. You know, DNA comes later. There's fairly good agreement on that.

AF: Could you give us a brief snippet about why there's good agreement on DNA coming later?

GF: Multiple reasons. One of which is that most of the DNA replication machinery's not in LUCA, or is there only very primitively. You can do it with RNA. One of the things that DNA allows you to do is have bigger genomes. RNA genomes typically top out around 10,000 residues, and if you have 6-7 chromosomes at 10k residues you're probably at the level of complexity of a living system. DNA solves the problem of getting larger.

One of the other reasons for liking RNA over DNA is simply the pathways: DNA is made by the rather bizarre pathway of ribonucleotide reductase, an enzyme that converts ribonucleotides (the building blocks of RNA) into deoxyribonucleotides (the building blocks of DNA). There's no independent pathway to make DNA. The nucleotide co-enzymes are all based on ribose systems. There are a number of good reasons for not liking DNA as the early genetic material. Plus RNA does all kinds of interesting reactions.

AF: Is that just by virtue of the single strandedness, or is there something else about the deoxyribonucleic acid?

GF: There’s no 2' OH [in DNA]. So a lot of chemistry that is easy with RNA is hard with DNA. On the other hand, DNA is more stable, so it makes a better genetic material. An early organism's going to have a small genome or maybe shared genomes in populations. They're going to profit from having a larger genome.

AF: So LUCA lived about 3 billion years ago, if I understood correctly. That gives us a billion-ish years47 to do all this work of creating life as we know it.

GF: Unless it came from somewhere else! (laughs) But that doesn't really solve the problem. It would take it a billion years to get here!

AF: Is that billion years plausible? Could we make this entire suite of central dogma in a billion years given this pre-LUCA state that you've described? It's such a speculative question.

GF: Life is here. It came from somewhere. And we know that organisms like cyanobacteria lived 3 billion years ago. At least they leave fossils that look like cyanobacteria. So you have to build this complexity very quickly. But look at it more speculatively. Look at what's happened in our lifetimes in terms of computers or music. This stuff evolves like crazy. Very very fast. You mentioned early on in this conversation how what we did in the 1970s seems so simplistic compared to what we are doing now. It's hard to believe; how did you not know? But we didn't know! (laughs) There are still a lot of things we don't know, but the speed at which evolution can work is very fast. If the right compounds are there and the right conditions, interesting things are going to happen.

AF: On the one hand you can say – hey, you needed to make all this fundamental biochemical machinery in this relatively short period of time, and then it seems like stuff afterwards evolved relatively slowly. On the other hand, you can look at the post-LUCA state and say – well, you needed to make complex multicellular organisms with cooperative operation, incredible scaling capacity and organization of a complex genome. It's sort of a choice which of those you think is more complicated to evolve.

GF: One thing to realize is that evolution builds on the past. Once you've established the ribosome, you don't typically go off and try to invent an alternative. You just build on the thing.

Now I have ribosomes. They allow me to make proteins, right? I have a genetic code. Now it might be nice if I had 30 amino acids in the genetic code instead of 20,48 but once it got established you don't mess with it anymore. You mess with something at a different level of complexity. So evolution's going on at different levels, and in our modern world it's not actually the biology that's driving evolution; evolution is occurring in the social and the technological realms, right? You know, it's just like the internal combustion engine. We now know there are better ways of doing it, but we don't easily switch over to that. The resistance is very high. The way the ribosome got started might not have been a very good idea, but we've built on it and kept adding to it. Even though it was [probably] a lousy system to start with, we've refined it over time until we finally got it so good that we got LUCA.

I think you have to include in your thinking maybe what evolution is happening in the world that we live in. It's real evolution. Remember that classic story with the Betamax? The Betamax was a better system. It turned out that VHS won out because the companies made it available, right?49 Whereas with the Betamax, they tried to make it exclusively a Sony product and they lost. If you want to preserve something historically, you'd better write it on paper. That seems to be the only thing that doesn't change.

1 Woese and Fox, “Phylogenetic structure of the prokaryotic domain: the primary kingdoms.” PNAS 1977. http://www.ncbi.nlm.nih.gov/pubmed/270744

2 Pace, Sapp & Goldenfeld PNAS 2012. http://www.pnas.org/content/109/4/1011

3 David Goodsell has an excellent primer on the structure of the ribosome as part of the Molecule of the Month series. http://pdb101.rcsb.org/motm/121

4 The 16S rRNA is about 1500 nucleotides long, BioNumbers ID 102495. http://bionumbers.hms.harvard.edu/bionumber.aspx?&id=102495&ver=4&trm=16S%20length

5 Carl was famously antagonistic to the classic (and basically correct) A-site P-site model of ribosome function, in which transfer RNAs move from the A-site to the P-site after a peptidyl transferase reaction in order to set up the next reaction. Carl found this model implausible because it appears to require that tRNAs evolved at the same time as the ribosome. According to Harry Noller (a famous ribosome researcher), Carl’s antagonism continued well after the A-site and P-site were observed in ribosome crystal structures (Noller, RNA Biology 2014). Carl instead proposed a ratchet model that is, to my eye, much more delicate and complicated than the A-site P-site model and, as such, very hard to summarize (Woese, Nature 1970). The most important feature of the ratchet model is that tRNAs don’t need to translocate during translation – Carl felt this allowed us to imagine much simpler tRNAs and resolve the evolutionary question. A draft of Carl’s 1970 paper is publicly available in the CSHL Sydney Brenner archive: http://libgallery.cshl.edu/items/show/74148

6 Koshland, “Application of a Theory of Enzyme Specificity to Protein Synthesis.” PNAS 1958.

7 Now that I understand Carl’s model, I realize that Dan Koshland was positing no such thing. Nonetheless, the lecture in question by Koshland starts from a premise that is today very surprising: he asks whether the ribosome is a generic machine working for all proteins, or if there might be a single ribosome for translating each gene. This question would never occur to a recently-trained biologist.

8 The central dogma of molecular biology describes the flow of information from a DNA genome to messenger RNA via transcription and then to protein via translation.

9 i.e. the business end of the ribosome, the part that joins amino acids together by forming peptide bonds.

10 An RNA that is part of the small subunit of the bacterial ribosome.

11 The 5S is also an RNA component of the ribosome. It is part of the large ribosomal subunit, but it is actually much smaller than the 16S. It turned out that there was not enough information content in a ~120 nucleotide sequence to reconstruct the evolutionary history of the ribosome, so Carl ultimately transitioned to working with the 16S.

12 The eukaryotic homolog of the 16S rRNA.

13 The algorithm that was used to construct the phylogenetic tree based on the similarity of 16S sequences.

14 George notes that the real fun was figuring out what the results meant.

15 The modifications change the charge of the nucleotides and therefore change their mobility in the electrophoresis experiment.

16 i.e. an archaeon. Halophile means salt-loving; many halophiles, including the best-known extreme halophiles, belong to the archaeal lineage.

17 There are many kinds of photosynthetic bacteria that differ pretty substantially in the biochemistry of their photosynthetic machinery.

18 Bergey’s Manual of Systematic Bacteriology is a classic taxonomy of prokaryotes. Bergey’s is primarily focused on the (eu)bacteria and designed to aid in the identification of bacterial species rather than to comment on their evolutionary relationships to one another.

19 A genus of rod-shaped, Gram-positive eubacteria. Many members of the genus can produce oval-shaped spores that can survive very harsh conditions.

20 i.e. archaea.

21 The engulfment of free-living bacteria (in separate events) that gave rise to the mitochondrion and the chloroplast, which are thought of as obligate symbionts of the host cell. We can tell that both of these organelles were once free-living because they have their own genomes, which contain genes homologous to those of free-living bacteria.

22 Woese’s term for a hypothetical pre-LUCA organism.

23 Small nucleolar RNAs (snoRNAs) are located in the nucleolus of eukaryotic cells. Nucleoli are granular structures made of both RNA and protein, where the proteins in question bind (attach) to the snoRNAs. So the RNA sequence is responsible for organizing these granular structures in the nucleus, which are the site of ribosome synthesis. Primitive versions of these snoRNAs are found in the archaea (Bachellerie et al., Biochimie 2002).

24 Spang et al., Nature 2015.

25 I used to program computers professionally. George and I discussed this prior to the interview.

26 As this was the central claim of his most famous paper.

27 Di Rienzi et al., eLife 2013.

28 i.e. groups that are each other's closest non-parent relatives according to the phylogenetic tree, similar to the relationship of a person with their sister.

29 i.e. the tripartite tree

30 What George means here is lists of genes inferred to be in LUCA on the basis of their existence in most or all present-day organisms. There is disagreement about the specific composition of the list, but the ribosome and many ribosome-related proteins definitely make the cut.

31 Because we don’t think you can make proteins without first having the peptidyl transferase center of the ribosome.

32 Similar to Dawkins’s selfish gene theory.

33 The ribosome is made of both RNA and protein, with proteins being polypeptide chains.

34 Peptides composed of positively charged amino acids like lysine or arginine could serve this function.

35 The sugar-phosphate backbone of DNA and RNA has substantial negative charge. In fact, the net negative charge of DNA and RNA is the basis of the various biochemical methods for extracting and measuring the length of nucleic acids.

36 Peptidyl transferase center – the portion of the ribosome that makes new peptide bonds, enabling the production of polypeptide chains, i.e. proteins.

37 Transfer RNAs (tRNAs) are each loaded with a single amino acid, which they deliver to the ribosome for addition to a growing polypeptide chain. The implication here is that the tRNA was originally not coded in a genome, but rather a short sequence of RNA that could bind amino acids (like he posited for the ribosomal RNA). David Goodsell has a wonderful description of tRNAs and their structures here: http://pdb101.rcsb.org/motm/15.

38 A sequence added to the end of tRNAs in archaea, eukaryotes and some bacteria that is required for the tRNA to function.

39 i.e. the relationship between the tRNA and its cognate amino acid.

40 The hypothetical pre-protein world, where RNA was both the genetic material and the effector molecule of life.

41 Trying to understand how complex organic molecules can arise from the abiotic primordial soup.

42 Base-pairing, as in DNA, is a natural part of nucleic acid systems due to the fact that nucleotides can hydrogen-bond with each other in a variety of ways.

43 A chemical reaction that concatenates two strings of nucleic acids.

44 A genome that is not held solely by a single organism. George is envisioning a world where genes, however they are encoded, are massively shared. In such a world, the pace of evolution might be much faster.

45 Grabbing genes from someone else.

46 Due to base-complementarity of RNA or DNA, there is a “sense” strand and an “anti-sense” strand that is perfectly complementary to the sense strand.

47 The earth is estimated to be 4.6 billion years old and was probably too hot to sustain life for its first 500 million years.

48 The standard genetic code encodes 20 amino acids, though some organisms have one or two extras. Some interesting exceptions that prove the rule are selenocysteine and pyrrolysine.

49 If you watch the above-linked video, you’ll see that this story has a little wrinkle to it. Even though Betamax machines were better engineered and the video quality was higher, VHS machines were cheaper and could hold enough content (2 hrs) to record a movie. So VHS did have some technical features that were superior to Betamax, and these features fit the market. But George’s point here stands, which is that, in a perfect world, we might have considered switching to the Betamax once it got cheaper and could record 2 hrs, but the market never reconsidered that decision.