Scientists computed a zoo of millions of alternate genetic polymer molecular structures, giving context for why biology encodes information the way it does, and providing potential leads for new drugs and a guide to searches for extraterrestrial biology.

Biology encodes information in DNA and RNA, which are complex molecules finely tuned to their functions. But are they the only way to store hereditary molecular information? Some scientists believe life as we know it could not have existed before there were nucleic acids, so understanding how they came to exist on the primitive Earth is a fundamental goal of basic research. The central role of nucleic acids in biological information flow also makes them key targets for pharmaceutical research, and synthetic molecules mimicking nucleic acids form the basis of many treatments for viral diseases, including HIV.

Other nucleic acid-like polymers are known, yet much remains unknown regarding possible alternatives for hereditary information storage. Using sophisticated computational methods, scientists from the Earth-Life Science Institute (ELSI) at the Tokyo Institute of Technology, the German Aerospace Center (DLR) and Emory University explored the “chemical neighborhood” of nucleic acid analogs. Surprisingly, they found well over a million variants, suggesting a vast unexplored universe of chemistry relevant to pharmacology, biochemistry, and efforts to understand the origins of life. The molecules revealed by this study could be further modified to give hundreds of millions of potential pharmaceutical drug leads.

Nucleic acids were first identified in the 19th century, but their composition, biological role, and function were not understood by scientists until the 20th century. The discovery of DNA’s double-helical structure by Watson and Crick in 1953 revealed a simple explanation for how biology and evolution function. All living things on Earth store information in DNA, which consists of two polymer strands wrapped around each other like a caduceus, with each strand being the complement of the other. When the strands are pulled apart, copying the complement on either template results in two copies of the original. The DNA polymer itself is composed of a sequence of “letters”, the bases adenine (A), guanine (G), cytosine (C) and thymine (T), and living organisms have evolved ways to make sure that the appropriate sequence of letters is almost always reproduced during DNA copying. The sequence of bases is copied by proteins into RNA, which is then read into a protein sequence. The proteins themselves then enable a wonderland of finely tuned chemical processes which make life possible.
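The copying logic described above can be illustrated with a minimal Python sketch. It is a toy model, not anything drawn from the study: the function names `complement` and `replicate` are hypothetical, and it deliberately ignores real-world details such as strand direction and the copying enzymes. It only shows that, given the pairing rules A with T and G with C, pulling a duplex apart and rebuilding the partner of each strand yields two copies of the original.

```python
# Base-pairing rules: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand.

    Simplification (an assumption of this sketch): strand direction is
    ignored, so the complement is written in the same left-to-right order.
    """
    return "".join(PAIRS[base] for base in strand)


def replicate(strand):
    """Pull a duplex apart and rebuild a partner for each strand.

    Returns two duplexes, each a (template, newly copied strand) pair.
    """
    partner = complement(strand)
    first_duplex = (strand, complement(strand))
    second_duplex = (partner, complement(partner))
    return first_duplex, second_duplex


if __name__ == "__main__":
    duplex_a, duplex_b = replicate("GATTACA")
    print(duplex_a)
    print(duplex_b)
```

Both resulting duplexes contain the same two strands as the starting molecule, which is the sense in which each strand carries enough information to regenerate the whole.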

Small errors occasionally occur during DNA copying, and others are sometimes introduced by environmental mutagens. These small errors are the fodder for natural selection: some of these errors result in sequences that produce fitter organisms, though most have little effect, and many even prove lethal. The ability of new sequences to allow their hosts to better survive is the “ratchet” which allows biology to almost magically adapt to the constantly changing challenges the environment provides. This is the underlying reason for the kaleidoscope of biological forms we see around us, from humble bacteria to tigers: the information stored in nucleic acids allows for “memory” in biology. But are DNA and RNA the only way to store this information? Or are they perhaps just the best way, discovered only after millions of years of evolutionary tinkering?

“There are two kinds of nucleic acids in biology, and maybe 20 or 30 effective nucleic acid-binding nucleic acid analogs. We wanted to know if there is one more to be found or even a million more. The answer is, there seem to be many, many more than was expected,” says professor Jim Cleaves of ELSI.

Though biologists don’t consider them organisms, viruses also use nucleic acids to store their heritable information, and some viruses use RNA, a slight variant on DNA, as their molecular storage system. RNA differs from DNA by a single atom substitution, but overall RNA plays by molecular rules very similar to DNA’s. The remarkable thing is that, among the incredible variety of organisms on Earth, these two molecules are essentially the only ones biology uses.

Biologists and chemists have long wondered why this should be. Are these the only molecules that could perform this function? If not, are they perhaps simply the best? That is to say, could other molecules play this role, and did biology perhaps try them out during evolution?

The central importance of nucleic acids in biology has also long made them drug targets for chemists. If a drug can inhibit the ability of an organism or virus to pass its knowledge of how to be infectious on to offspring, it effectively kills the organism or virus. Mucking up the heredity of an organism or virus is a great way to knock it dead. Fortunately for chemists, and all of us, the cellular machinery which manages nucleic acid copying in each organism is slightly different, and in viruses often very different.

Organisms with large genomes, like humans, need to be very careful about copying their hereditary information and thus are very selective about not using the wrong precursors when copying their nucleic acids. By contrast, viruses, which generally have much smaller genomes, are much more tolerant of using similar, but slightly different, molecules to copy themselves. This means chemicals that are similar to the building blocks of nucleic acids, known as nucleotides, can sometimes impair the biochemistry of one organism more than that of another. Most of the important anti-viral drugs used today are nucleotide analogs (or nucleoside analogs; nucleosides are molecules that differ from nucleotides by the removal of a phosphate group), including those used to treat HIV, herpes and viral hepatitis. Many important cancer drugs are also nucleotide or nucleoside analogs, as cancer cells sometimes have mutations that make them copy nucleic acids in unusual ways.

“Trying to understand the nature of heredity, and how else it might be embodied, is just about the most basic research one can do, but it also has some really important practical applications,” says co-author Chris Butch, formerly of ELSI and now a professor at Nanjing University.

Since most scientists believe the basis of biology is heritable information, without which natural selection would be impossible, evolutionary scientists studying the origins of life have also focused on ways of making DNA or RNA from simple chemicals that might have occurred spontaneously on primitive Earth. Once nucleic acids existed, many problems in the origins of life and early evolution would make sense. Most scientists think RNA evolved before DNA, and for subtle chemical reasons which make DNA much more stable than RNA, DNA became life’s hard disk. However, research in the 1960s soon split the theoretical origins field in two: those who saw RNA as the simple “Occam’s Razor” answer to the origins-of-biology problem and those who saw the many chinks in the armor of RNA’s abiological synthesis. RNA is still a complicated molecule, and it is possible that structurally simpler molecules could have served in its place before it arose.

Co-author Dr. Jay Goodwin, a chemist with Emory University, says, “It is truly exciting to consider the potential for alternate genetic systems, based on these analogous nucleosides – that these might possibly have emerged and evolved in different environments, perhaps even on other planets or moons within our solar system. These alternate genetic systems might expand our conception of biology’s ‘central dogma’ into new evolutionary directions, in response and robust to increasingly challenging environments here on Earth.”

Examining all of these basic questions at once (which molecule came first, and what is unique about RNA and DNA) by physically making molecules in the laboratory is difficult. On the other hand, computing molecules before making them could potentially save chemists a lot of time. “We were surprised by the outcome of this computation,” says co-author Dr. Markus Meringer. “It would be very difficult to estimate a priori that there are more than a million nucleic acid-like scaffolds. Now we know, and we can start looking into testing some of these in the lab.”

“It is absolutely fascinating to think that by using modern computational techniques we might stumble upon new drugs when searching for alternative molecules to DNA and RNA that can store hereditary information. It is cross-disciplinary studies such as this that make science challenging and fun yet impactful,” says co-author Dr. Pieter Burger, also of Emory University.

Reference: “One Among Millions: The Chemical Space of Nucleic Acid-Like Molecules” by Henderson James Cleaves II, Christopher Butch, Pieter Buys Burger, Jay Goodwin and Markus Meringer, 9 September 2019, Journal of Chemical Information and Modeling.