The default assumption was that it was Ohno’s mechanism taken to an extreme — divergence beyond recognition. Orphan gene sequences could have evolved so quickly, or for such a long time, that they lost their family’s resemblance.

Other explanations were possible but, according to McLysaght, they seemed less likely. Orphan genes could enter a lineage through the horizontal transfer of whole or partial genes from bacteria or viruses, for example, but few of the identified orphans in complex organisms seemed as if they could have come from bacteria. Theoretically, a gene could also be orphaned if all of its homologs in other lineages were coincidentally lost through evolution — but that too seemed improbable to be a routine explanation. And then there was the de novo possibility, but that came with its own hurdles.

Still, researchers kept finding orphan genes that looked convincingly as if they had evolved de novo. In 2006 and 2007, for example, the geneticist David Begun at the University of California, Davis, identified genes in the testes of fruit flies that had evolved from nongenic sequences. Gradually, the question shifted from whether de novo genes existed to how common they were.

During the past decade, researchers have vigorously argued about the relative importance of de novo gene creation and divergence beyond recognition. But there was still no easy way to look at orphan genes and determine how they arose. “The field was hamstrung by that, in a sense, because if you can’t really know how many are real [de novo genes], and what’s the significance of this phenomenon, then you’re a bit stuck,” McLysaght said.

Location, Location, Location

To bring some clarity to that debate, McLysaght and her former postdoctoral fellow Nikolaos Vakirlis (now at the Alexander Fleming Biomedical Sciences Research Center in Greece), along with their collaborator Anne-Ruxandra Carvunis at the University of Pittsburgh, set out to quantify what proportion of the orphan genes in flies, yeast and humans could be explained by sequence divergence.

They took a novel approach to that analysis, as they described in a paper in eLife in February. Scientists usually check whether genes are homologous by comparing their nucleotide sequences (or the amino acid sequences of the proteins they encode). McLysaght’s team looked instead at each gene’s position relative to its neighbors — a property that geneticists call the gene’s synteny.

McLysaght offered this analogy to explain their approach: Suppose you start with an ordered deck of playing cards and lightly shuffle them. The first two cards off the top of the deck are the 9 and 10 of clubs; you keep the third card face down; the fourth and fifth cards are the queen and king of clubs. You could guess with reasonable confidence that the hidden card is the jack of clubs because the odds are better that the complete sequence survived than that the middle card alone was disturbed.

Similarly, the order of neighboring genes on a chromosome is mostly conserved through evolution. Pieces of chromosomes get shuffled around significantly, but within those shuffled blocks, the arrangement of genes tends to stay intact. The researchers made a conservative assumption that if a gene’s neighbors appear in the same order in another species, then the gene is likely to correspond to whatever is sandwiched between them in the other species as well, even if the sequences don’t match.
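To make the synteny idea concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not the pipeline from the eLife paper: the gene names, the `orthologs` lookup table and the `max_gap` cutoff are all hypothetical, and a real analysis would work with genome coordinates and would have to decide whether the sandwiched region is a gene at all.

```python
def syntenic_counterpart(gene, order_a, order_b, orthologs, max_gap=1):
    """Return whatever sits between the orthologs of `gene`'s two
    neighbours in species B, or None if the neighbourhood is not conserved.

    order_a, order_b: gene names in chromosomal order (hypothetical inputs).
    orthologs: maps species-A genes to their known species-B counterparts.
    max_gap: largest number of intervening genes still counted as conserved.
    """
    i = order_a.index(gene)
    if i == 0 or i == len(order_a) - 1:
        return None  # a flanking neighbour is missing on one side

    # Map the two flanking genes to their counterparts in species B.
    left_b = orthologs.get(order_a[i - 1])
    right_b = orthologs.get(order_a[i + 1])
    position_b = {name: idx for idx, name in enumerate(order_b)}
    if left_b not in position_b or right_b not in position_b:
        return None  # a neighbour has no recognisable counterpart

    # min/max lets the block be inverted in species B and still count.
    lo, hi = sorted((position_b[left_b], position_b[right_b]))
    between = order_b[lo + 1:hi]
    if not between or len(between) > max_gap:
        return None  # nothing sandwiched, or the neighbours drifted apart
    return between


# Toy data: "orphanA" has no sequence match, but its neighbours do.
order_a = ["a1", "a2", "orphanA", "a3", "a4"]
order_b = ["b1", "b2", "bX", "b3", "b4"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}

candidate = syntenic_counterpart("orphanA", order_a, order_b, orthologs)
print(candidate)
```

In this toy case the function points at `bX` as the likely counterpart of `orphanA`, in the same way that the hidden card is guessed to be the jack of clubs. Whether that counterpart then counts as a diverged homolog is a separate question that the sketch does not try to answer.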

Using the synteny method, the researchers estimated that at most a third of orphan genes in flies, yeast and humans could be explained by divergence beyond recognition. “The rest must be explained by other ways, and the de novo origin is the best way to explain those,” McLysaght said.

Rates of Divergence

Weisman and her Harvard advisers Andrew Murray and Sean Eddy used a slightly different method to address the same problem in work they described recently in a preprint on the biorxiv.org server and have submitted to a journal for peer review. “The whole question here is, if I can’t detect a homolog outside of some organism or some group, is that because the homolog is there and I can’t detect it, or because the homolog isn’t there?” Weisman said.

To find out, she looked at a group of related yeast species and Drosophila fruit fly species and estimated the rates at which mutations accumulated within their gene families. She could then determine statistically whether the homolog for a gene in one species would even be detectable in distantly related species. That allowed her to identify cases where “your result that the gene looks like an orphan is totally explainable just through the gene evolving normally and your search software not being omniscient,” she explained.
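The logic of that test can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration rather than a description of the preprint’s actual statistics: the sketch supposes that a similarity score decays exponentially with evolutionary distance, fits that decay to species where a homolog is still detected, and asks whether the score predicted for a more distant species would fall below a hypothetical detection threshold. The function names, the distance units and the numbers are all invented.

```python
import math


def fit_exponential_decay(distances, scores):
    """Least-squares fit of score = a * exp(-b * distance),
    done as a straight-line fit to ln(score) against distance."""
    n = len(distances)
    logs = [math.log(s) for s in scores]
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def explainable_by_divergence(distances, scores, far_distance, threshold):
    """True if the fitted decay alone predicts an undetectable score
    (below `threshold`) at `far_distance`."""
    a, b = fit_exponential_decay(distances, scores)
    predicted = a * math.exp(-b * far_distance)
    return predicted < threshold


# Synthetic data (assumed): scores in three close relatives, generated
# from a = 200 and b = 1 so that the fit can be checked by eye.
distances = [0.0, 0.5, 1.0]
scores = [200 * math.exp(-d) for d in distances]

a, b = fit_exponential_decay(distances, scores)
print(round(a, 3), round(b, 3))
print(explainable_by_divergence(distances, scores, far_distance=3.0, threshold=30))
print(explainable_by_divergence(distances, scores, far_distance=1.5, threshold=30))
```

With these invented numbers, a missing homolog at distance 3.0 is what ordinary divergence would predict, so the gene only looks like an orphan. A missing homolog at distance 1.5 is not predicted by the decay, so it would need another explanation. A real analysis would also have to account for noise around the fitted curve, which this sketch ignores.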

Weisman estimated that somewhere between 55% and 73% of the orphan genes in these yeasts, a majority, were explained by divergence; that figure is higher than McLysaght’s synteny approach suggested. Nevertheless, to Weisman, it’s reassuring that her method and McLysaght’s fundamentally different one converged on the conclusion “that there is some decidedly nontrivial number of these genes that probably are just due to divergence.” She added, “Even if it’s 30% or 50% or 80%, either way you slice it, it’s clearly a problem for people who want to study [de novo genes] by studying orphan genes.”

Li Zhao, a geneticist at Rockefeller University who was not involved with either Weisman’s or McLysaght’s work, agrees that both papers reach roughly the same conclusion about the origins of orphan genes, although one emphasizes the abundance of de novo genes and the other the abundance of ones from divergence. “One paper is talking about this glass being half full, and the other is describing it as half empty,” she said.

Given that mixture of origins for orphan genes, Zhao went on, a good way to study the de novo ones might be to focus on the very young ones. If a de novo gene has originated recently, it should still be possible to identify, in other species, the corresponding nongenic sequence from which it evolved, she explained. That would serve as proof that the orphan gene is truly de novo.

How Function Emerges

A good illustration of this is a 2019 study of young de novo genes in wild Asian rice (Oryza) led by Manyuan Long, a geneticist at the University of Chicago who has pioneered research into novel genes since the early 1990s. Long and his colleagues identified about 175 genes that originated de novo within the last 3.4 million years; they could tell that these genes were de novo because corresponding nongenic sequences were still recognizable in closely related species. These de novo genes appeared to be biologically active — that is, they were transcribed into RNA and translated into peptide chains, and most of them showed signs of being shaped by natural selection.