What role does the appearance of new genes, versus simple changes in old ones, play in evolution? There are two reasons why this question has recently become important.

The first involves a scientific controversy. Some researchers—the most prominent being evo-devotee Sean Carroll—maintain that most important evolutionary change, at least in body form, involves changes in regulatory sequences rather than simple changes in genes themselves, or the appearance of new genes. This question hasn’t yet been answered, since we don’t know a great deal about those mutations that have been important in creating new body plans.

The second controversy is religious. Some advocates of intelligent design (ID), most notably Michael Behe in a recent paper, have implied not only that evolved new genes or new genetic “elements” (e.g., regulatory sequences) aren’t important in evolution, but that they play almost no role at all, especially compared to mutations that simply inactivate genes or make small changes, like single nucleotide substitutions, in existing genes. This is based on the religiously-motivated “theory” of ID, which maintains that new genetic information cannot arise by natural selection, but must be installed in our genome by a magic poof from Jebus.

I’ve criticized Behe’s conclusions, which are based on laboratory studies of bacteria and viruses that virtually eliminated the possibility of seeing new genes arise, but I don’t want to reiterate my arguments here. What I want to do is point out a new paper by some Chicago colleagues that suggests that new genes, at least in the genus Drosophila (fruit fly), not only arise pretty quickly, but also diverge very quickly to become essential parts of the genome.

The paper, by Sidi Chen, Yong Zhang, and my friend Manyuan Long, appears in this week’s Science: “New genes in Drosophila quickly become essential.” It’s a clever piece of work. What the authors did was compare whole-genome sequences between various species of Drosophila (there are now many of these) to see how often new genes appeared in one lineage: the lineage that diverged from the ancestors of D. willistoni to become D. melanogaster. These two lineages diverged about 35 million years ago, but by comparing the genomes of other species that branched off along the way, the authors could estimate how often new genes arose over the entire period from 3 million to 35 million years ago.

What do they mean by “new genes”? These are genes in D. melanogaster that aren’t found in D. willistoni, but have arisen since their divergence by several processes—most often the duplication of an ancestral gene or its RNA followed by extensive genetic divergence, so that the gene acquires a brand new function. (This process accounts for about 90% of the new genes. Some genes, however, are so different between the species that how they arose is a mystery.) These “new genes,” then, would qualify as what Behe calls “gain-of-FCT” adaptive mutations (“FCT” = functional coded element): the kind of mutations that Behe did not see arising in short-term lab experiments on bacteria and viruses.

Chen et al. found that a surprisingly large number of genes had arisen in the D. melanogaster lineage over this 35-myr period. Here’s a summary of their results:

The authors identified 566 new genes that arose over this period. That’s about 4% of the total genes in the D. melanogaster genome. And that’s quite a few given that the divergence is only 35 myr. The genus Drosophila itself (including the scaptomyzids) diverged from its sister group about 63 million years ago, so we can estimate that, in the genus as a whole, at least 7% of the genome comprises brand new genes.
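For readers who want to check the arithmetic, here is a minimal sketch of how the 7% figure can be reproduced. It assumes the extrapolation is a simple linear one (new genes accumulating at a constant rate over time), which is my reading of the estimate rather than something the paper states; the function names are just for illustration.

```python
# Figures quoted in the text above.
NEW_GENES = 566            # new genes found in the D. melanogaster lineage
FRACTION_NEW = 0.04        # "about 4%" of all D. melanogaster genes
LINEAGE_AGE_MYR = 35       # divergence from the D. willistoni lineage
GENUS_AGE_MYR = 63         # divergence of the genus from its sister group


def implied_total_genes(new_genes, fraction_new):
    """Total gene count implied by '566 genes is about 4%'."""
    return new_genes / fraction_new


def extrapolated_fraction(fraction_new, lineage_age_myr, genus_age_myr):
    """Scale the fraction of new genes to a longer time span.

    Assumption: new genes accumulate at a constant rate, so the
    fraction grows in proportion to elapsed time.
    """
    return fraction_new * genus_age_myr / lineage_age_myr


total_genes = implied_total_genes(NEW_GENES, FRACTION_NEW)
genus_fraction = extrapolated_fraction(FRACTION_NEW, LINEAGE_AGE_MYR, GENUS_AGE_MYR)

print(f"Implied total gene count: about {total_genes:,.0f}")
print(f"Extrapolated fraction of new genes over 63 myr: {genus_fraction:.1%}")
```

Under that constant-rate assumption the extrapolation comes out a little above 7%, which matches the figure above.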

The authors were able to take a sample of these genes (195 of them) and knock down their transcripts using novel RNAi technology (this involves inserting transposable genetic elements in those genes and then using those elements to kill the genes). They found that about 30% of these new genes are essential for viability: that is, the fly dies if it has no active copies. This proportion didn’t vary depending on how long ago the “new” gene had arisen. Nor did it differ much from the proportion of “old” genes (those present in both lineages) that are essential for viability, which is about 35%. It seems, then, that even if these genes arise as duplicates from pre-existing genes, they quickly assume new functions that make the fly unable to survive without them.

The “new function” conclusion is supported by two other pieces of data. First, the average difference in DNA sequence between the “new” genes in the D. melanogaster lineage and their parental copies (that is, the genes from which they originated, usually by duplication) is 47.3%. That’s a big difference: a change in nearly every other nucleotide. Second, there are new ways to determine what the new genes do: by estimating which proteins in the genome each new gene’s protein product interacts with. Chen et al. found that many of the products of new genes interact with proteins completely different from the ancestral genes. This implies that the new genes have evolved completely different functions. And, as theory suggests, that’s the way these genes become essential: at first they do the same thing as their ancestral genes (they’re duplicates, after all), but as they diverge they assume new functions (usually impelled by natural selection) that fit them into new developmental pathways. In this way a gene that is at first “gratuitous” can become essential. It’s nice that we can actually see this happening with protein-protein interaction data.
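To make the 47.3% figure concrete: one simple way to express sequence difference is the proportion of aligned sites at which two sequences differ. The sketch below shows that calculation on two made-up ten-base sequences. Both the sequences and the function name are hypothetical, and I am not claiming this is the exact measure Chen et al. used.

```python
def proportion_different(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ.

    Assumes the sequences are already aligned and of equal length;
    gaps and multiple substitutions at one site are ignored here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Hypothetical toy sequences, not real Drosophila data.
parental_copy = "ACGTACGTAC"
new_copy = "ACGAACTTAC"

difference = proportion_different(parental_copy, new_copy)
print(f"Proportion of differing sites: {difference:.1%}")
```

A value near 0.5 on this kind of measure is what “a change in nearly every other nucleotide” means.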

In further support of the above scenario for the evolution of new genetic information, the authors found that in young and new “essential” genes, there was a strong signature of natural selection having acted, as suggested by the high rates of DNA substitution. As the “new” essential genes become older, and assume new functions, these rates slow down. This again supports the theory of how new genes originate: when they’re formed by duplication, they are quickly eliminated from the genome (see below) unless they diverge quickly to do something new. Thus the duplicates that do survive are usually those that have diverged quickly. Once the new function has been assumed, and the gene is essential, selection then acts to preserve its new function by eliminating new mutations (“purifying selection”).

These results, which show that new genetic information (“FCT”s) arises quickly, don’t imply that every new gene duplication becomes a brand-new gene with a new function. That’s far from the case. We don’t know the figure in Drosophila, but in the human lineage it’s estimated that only about 5% of new duplications diverge to become new genes that do something novel. The rest are inactivated, becoming dead “pseudogenes” that don’t do anything. In Drosophila these are quickly removed from the genome, but in our own lineage many of them linger, so we can estimate the proportion of duplicated genes that don’t go on to do something new.

Nevertheless, genes duplicate frequently enough that they can provide sufficient raw material for genetic novelty. Estimates of how often a given gene duplicates in evolution run about one duplication event per 100 million gene copies. That seems low, but remember that there are thousands of genes in the genome, and, in many species (including Drosophila and now ours), there are hundreds of millions of individuals. That means that, in the species, there are many genes that duplicate each generation. Even if only a few percent of these survive inactivation, that’s a lot of raw material for evolutionary change.
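Here is a back-of-the-envelope version of that calculation. Only the duplication rate comes from the text; the gene count, the population size, and the surviving fraction are illustrative assumptions (the 5% is the human-lineage figure mentioned earlier, borrowed here because the Drosophila figure isn’t known), and diploidy is ignored to keep things simple.

```python
# From the text: about one duplication per 100 million gene copies.
GENE_COPIES_PER_DUPLICATION = 100_000_000

# Illustrative assumptions, not figures from the paper.
GENES_PER_GENOME = 14_000        # "thousands of genes in the genome"
POPULATION_SIZE = 200_000_000    # "hundreds of millions of individuals"
SURVIVING_FRACTION = 0.05        # human-lineage estimate, borrowed for flies


def duplications_per_generation(genes, individuals, copies_per_duplication):
    """Expected number of new duplications arising in a species per generation."""
    return genes * individuals / copies_per_duplication


expected_duplications = duplications_per_generation(
    GENES_PER_GENOME, POPULATION_SIZE, GENE_COPIES_PER_DUPLICATION
)
expected_survivors = expected_duplications * SURVIVING_FRACTION

print(f"New duplications per generation: about {expected_duplications:,.0f}")
print(f"Of those, escaping inactivation: about {expected_survivors:,.0f}")
```

With these made-up inputs the species gets tens of thousands of new duplications every generation, and on the order of a thousand of them would be candidates for doing something new. Change the assumptions and the numbers move, but the qualitative point holds.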

The presence of frequent gene duplications is supported by an independent study: Emerson et al. (2008) found that in only fifteen lines of D. melanogaster from nature there were several hundred duplicate genes segregating as polymorphisms (that is, some individuals had one copy of a gene, some had two or more). They estimated that 2% of the genome was tied up in this copy-number variation. Clearly, there are a lot of duplicate gene variants floating around in nature.

The data of Chen et al., then, show that new genetic information can arise quickly, at least on an evolutionary timescale, and that the new genes rapidly assume new functions. (Note: I am using Behe’s characterization of “new genetic information” as involving only new FCTs. I don’t agree with this, since new genetic information can also arise when a single gene copy changes sufficiently to do something new.)

Although this doesn’t answer the question of what proportion of new evolutionary traits involve changes in gene sequence versus changes in gene regulation, it does show that a substantial part of the genome in one group of eukaryotes arises by the evolution of new FCTs that become involved in new developmental networks. In other words, Behe’s conclusion from short-term lab studies of bacteria and viruses doesn’t apply to this well-studied group of organisms—and probably not to other eukaryotes, either. All the evidence tells us that a rapid and important way to create new genetic information is through the duplication of genes and then their divergence by natural selection.

Poems are made by fools like me

But only selection makes an FCT

Now ID advocates like Behe could—and do—suggest that maybe the successfully duplicated-and-diverged genes didn’t arise by natural selection, but appeared by the instantaneous intervention of the designer (aka God/Jebus). But that idea is nixed by at least two observations. The first is the appearance in many groups of dead, nonfunctional pseudogenes that were unsuccessful duplicates. If the Great Designer made gene duplications to create genetic novelty, he surely failed in the majority of cases, and left his failures sitting around in the genome.

The second is the correlation between the age of a new gene and the type of selection acting on it. If a Great Designer created these duplicates de novo to have a new function (presumably because natural selection couldn’t take a gene to a new function by gradual stepwise evolution), they would show instantaneous changes of DNA sequence that looked like selection, and then an instantaneous cessation of that selection right after the gene got its newly created function. But that’s not what we see. What we see is not instantaneous but gradual change: the younger a gene is (as estimated by the position on the evolutionary tree where it arose), the more rapidly natural selection acts. That directional selection continues to act as the gene gets older, but then slows down and finally becomes purifying selection, so that new DNA changes are eliminated. This pattern is precisely what’s predicted if duplicates arise by accident and then quickly change by selection to assume new functions.

I suppose Behe and his minions will find a way to explain these two patterns by intelligent design, but that’s because ID theory isn’t science: there is no conceivable observation that can prove it wrong. Every bit of data, no matter what it is, can always be fitted into the ID scheme, especially since its advocates allow a little bit of Darwinian evolution and posit an unpredictable and unknowable Designer. But let us not tarnish the nice results of Chen et al. by using them to cast aspersions on ID. They are a valuable contribution to the real science of evolutionary biology, showing how fast new genetic information can arise by gene duplication.

h/t: Manyuan Long for his patient explanations.

______

Chen, S., Y. E. Zhang, and M. Long. 2010. New genes in Drosophila quickly become essential. Science 330:1682-1685.

Emerson, J. J., M. Cardoso-Moreira, J. O. Borevitz, and M. Long. 2008. Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science 320:1629-1631.