Scientists are supposed to be systematic thinkers. But the names assigned to human genes don't follow any system; they are an odd jumble of cryptic abbreviations, forced acronyms, and weird neologisms. Some gene names are informative, like GPD1, a sensible abbreviation for the functional term “glycerol-3-phosphate dehydrogenase 1.” Other genes, like p53, have cryptic names that tell you nothing at all about their function. And then there are genes like "sonic hedgehog," which, according to the scientific paper where it was first described, was named "after the Sega computer game cartoon character."

How did human gene names become such a confusing mess?

SONIC HEDGEHOG WAS NAMED by Robert Riddle, working with Cliff Tabin at Harvard University in the early 1990s. Riddle was tackling one of the major problems of molecular biology of that era: identifying and isolating individual genes. This was a decade before the complete human genome became available, after which gene-finding became largely automated, a matter for computer algorithms that parse large DNA databases. Until then, isolating genes was an arduous, experimental task, and the first scientist to isolate a particular gene earned the privilege of naming it.

Riddle was trying to find the chicken version of a gene named “hedgehog.” That there was already a gene named hedgehog is thanks to the quirky culture of fruit fly genetics. Hedgehog, a critical gene for embryo development, had been discovered in fruit flies as part of the Nobel Prize-winning work of Christiane Nüsslein-Volhard and Eric Wieschaus. Nüsslein-Volhard and Wieschaus were searching for new genes by studying mutant flies that were defective in embryonic development. A mutant fly that doesn't develop properly must have a mutation in some important development gene, so, by locating the mutation, the researchers could identify the gene.

Genes discovered this way are commonly named after the effect of the mutation. Like a patient taking a Rorschach test, fruit fly geneticists tend to come up with gene names by free-associating words with the deformities of the mutant embryos. When Nüsslein-Volhard and Wieschaus discovered hedgehog, they also discovered genes they named "patched" and "gooseberry." This fondness for odd gene names can make the molecular biology of fruit flies sound very strange: Hedgehog and patched interact to activate "cubitus interruptus," which in turn controls the expression of the gene "decapentaplegic."

Because hedgehog is so important in developing fly embryos, Riddle and Tabin were looking for the corresponding gene in vertebrates, using chick embryos because they were easy to work with. They found their gene, and named it "sonic hedgehog" to distinguish it from the fruit fly version. Other researchers, working with mouse embryos, discovered that mammals, in addition to sonic hedgehog, have two other copies of the gene, which they named "desert hedgehog" and "Indian hedgehog." Since human and mouse genes are so similar, they're given the same names, and that's how we get to a place where humans have a critically important gene named for a video game character.

THE STORY BEHIND SONIC hedgehog isn't uncommon—important genes are discovered in many different ways, and in many different organisms. The sometimes-strange names given to human genes reflect the culture and methods of the different communities of scientists who discovered them. Fruit fly geneticists use descriptive names like hedgehog. Yeast molecular biologists use three-letter abbreviations followed by a number, such as CDC42. Biochemists discover new proteins by their molecular size or their enzymatic activity, and so the corresponding genes end up with names like p53 or alcohol dehydrogenase 1B. Medical geneticists will use a two-letter abbreviation of a disease name to describe a mutation, and so the RB gene is named for retinoblastoma. And some gene names are both fun and accurate: DICER is the molecular equivalent of a butcher knife.

Sometimes researchers deliberately push the boundaries of what's acceptable. In the case of the Pokemon gene, whose name is a forced acronym for "POZ and Krüppel erythroid myeloid ontogenic factor," the researchers were threatened with a trademark infringement suit by Pokémon USA, who didn't want their game brand tarnished "by associating Pokémon with cancer."

Now that we're in the post-genomic era, the frontier for gene names is closing. The human genome has been mapped out, and so discovering individual genes is no longer a major activity of geneticists. Most human genes have been identified, cataloged, and assigned standard, machine-readable identifiers. A nomenclature committee has established guidelines. Thanks to their history, human gene names are still a mess, but the disorder has been tamed.

With multiple complete human genome sequences in hand, researchers now face a new issue: What else in our DNA deserves a name? Only a small part of our DNA consists of genes, so what about the many other functional and medically important parts? And, especially, what about those parts that exist in some people, but not others? So-called "structural variation," in which large segments of DNA are duplicated or deleted in different people, can add, delete, or merge genes to create new entities that don't exist in the standard catalog. For clarity in research papers, these need to be named, and so the official gene nomenclature committee recently published examples of how it's done.

Scientists try to be systematic thinkers, but the history of science—like the history of any other part of human society—is anything but orderly. Gene names reflect the quirks, interests, and ideas of biologists over the last century. They are informative, cryptic, confusing, and sometimes funny. Most importantly, they reflect the process of discovery, which, for genes, began when the science of biology was very different from today. It's easy to impose order on things that are known, but impossible to catalog the unknown.