U.S. researchers have designed a new computer algorithm that can model and catalogue the entire set of carbon-containing molecules, and created a map of the so-called ‘small-molecule universe’ (SMU).

The SMU is the set of all synthetically feasible organic molecules with a molecular weight of 500 daltons or less. It contains more than 10^60 chemical structures.

“Many of the world’s problems have molecular solutions in this chemical space, whether it’s a cure for disease or a new material to capture sunlight,” explained Dr David Beratan of Duke University, senior author of a paper describing the algorithm and map published in the Journal of the American Chemical Society. “But the small-molecule universe is astronomical in size. When we search it for new molecular solutions, we are lost. We don’t know which way to look.”

The map tells scientists where the unexplored regions of the chemical space are and how to build structures to get there.

“The map helps chemists because they do not yet have the tools, time or money to synthesize all 10^60 compounds in the small-molecule universe. Synthetic chemists can only make a few hundred or a few thousand molecules at a time, so they have to carefully choose which compounds to build,” Dr Beratan said.

“The scientists already have a digital library describing about a billion molecules found in the small-molecule universe, and they have synthesized about 100 million compounds over the course of human history. But these molecules are similar in structure and come from the same regions of the small-molecule universe.”

“It’s the unexplored regions that could hold molecular solutions to some of the world’s most vexing challenges.”
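To put those figures side by side, here is a small back-of-the-envelope calculation in Python. It uses only the round numbers quoted above; the variable names are ours, and because 10^60 is a lower bound on the size of the SMU, the shares it prints are upper bounds.

```python
from fractions import Fraction

# Round figures quoted in the article (orders of magnitude only).
SMU_SIZE = 10**60      # "more than 10^60" structures, so a lower bound
SYNTHESIZED = 10**8    # about 100 million compounds made so far
CATALOGUED = 10**9     # about a billion molecules in digital libraries

# Exact fractions avoid any floating-point rounding at these scales.
synthesized_share = Fraction(SYNTHESIZED, SMU_SIZE)
catalogued_share = Fraction(CATALOGUED, SMU_SIZE)

print("synthesized share of the SMU (upper bound):", synthesized_share)
print("catalogued share of the SMU (upper bound):", catalogued_share)
```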

To add diversity and explore new regions of the chemical space, the team developed a computer algorithm that built a virtual library of 9 million molecules, with compounds representing every region of the small-molecule universe.

“The idea was to start with a simple molecule and make random changes, so you add a carbon, change a double bond to a single bond, add a nitrogen. By doing that over and over again, you can get to any molecule you can think of,” said lead author Dr Aaron Virshup, also from Duke University.
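The idea Dr Virshup describes can be illustrated with a toy random walk in Python. This is a minimal sketch and not the team's code: the molecule is a plain list of atoms plus a dictionary of bond orders, hydrogens are implicit, and the only check is a simple valence limit. The names `mutate`, `random_walk` and `MAX_VALENCE` are ours, and the three moves (add an atom, raise a bond order, lower a bond order) are an assumed subset of what a real generator would need.

```python
import random

MAX_VALENCE = {"C": 4, "N": 3, "O": 2}  # hydrogens are implicit

def used_valence(bonds, i):
    """Sum of the bond orders attached to atom i."""
    return sum(order for (a, b), order in bonds.items() if i in (a, b))

def mutate(atoms, bonds, rng):
    """Return a copy of the molecule with one small random change applied.

    If the chosen change would break a valence limit, the copy is
    returned unchanged.
    """
    atoms, bonds = list(atoms), dict(bonds)
    move = rng.choice(["add_atom", "raise_bond", "lower_bond"])
    if move == "add_atom":
        i = rng.randrange(len(atoms))
        if used_valence(bonds, i) < MAX_VALENCE[atoms[i]]:
            atoms.append(rng.choice(["C", "N", "O"]))
            bonds[(i, len(atoms) - 1)] = 1  # single bond to the new atom
    elif bonds:
        a, b = rng.choice(sorted(bonds))
        order = bonds[(a, b)]
        if move == "raise_bond":
            room = min(MAX_VALENCE[atoms[a]] - used_valence(bonds, a),
                       MAX_VALENCE[atoms[b]] - used_valence(bonds, b))
            if order < 3 and room >= 1:
                bonds[(a, b)] = order + 1
        elif order > 1:  # lower_bond: e.g. double bond becomes single
            bonds[(a, b)] = order - 1
    return atoms, bonds

def random_walk(atoms, bonds, steps, seed=0):
    """Apply `steps` random mutations in sequence, starting from a seed molecule."""
    rng = random.Random(seed)
    for _ in range(steps):
        atoms, bonds = mutate(atoms, bonds, rng)
    return atoms, bonds

# Benzene as the starting point: six carbons, alternating double and single bonds.
BENZENE_ATOMS = ["C"] * 6
BENZENE_BONDS = {(0, 1): 2, (1, 2): 1, (2, 3): 2, (3, 4): 1, (4, 5): 2, (0, 5): 1}

new_atoms, new_bonds = random_walk(BENZENE_ATOMS, BENZENE_BONDS, steps=50, seed=1)
print(len(new_atoms), "heavy atoms after 50 random changes")
```

The sketch never deletes atoms or closes rings, so it cannot reach every molecule; a fuller move set would be needed for that.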

Dr Virshup programmed the new algorithm to make small, random chemical changes to the structure of benzene and then to catalogue the new molecules it created based on where they fit into the map of the small-molecule universe. “The challenge came in identifying which new chemical compounds chemists could actually create in a lab,” he said.

Dr Virshup sent his early drafts of the algorithm’s newly constructed molecules to synthetic chemists who scribbled on them in red ink to show whether they were synthetically unstable or unrealistic. He then turned the criticisms into rules the algorithm had to follow so it would not make those types of compounds again.

“The rules kept us from getting lost in the chemical space,” he said.
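Turning chemists' criticisms into rules can be pictured as a list of pass/fail checks that every candidate molecule must satisfy. The Python sketch below is only an illustration under our own assumptions: it uses a toy molecule format (a list of atoms and a dictionary of bond orders, with implicit hydrogens), and the two rules shown are hypothetical stand-ins. The 500-dalton cap comes from the definition of the SMU given above; the ban on oxygen-oxygen bonds is an example we made up, not a rule reported by the team.

```python
ATOMIC_MASS = {"C": 12.011, "N": 14.007, "O": 15.999}
H_MASS = 1.008
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}

def molecular_weight(atoms, bonds):
    """Approximate weight in daltons, filling unused valences with hydrogens."""
    weight = 0.0
    for i, element in enumerate(atoms):
        used = sum(order for (a, b), order in bonds.items() if i in (a, b))
        weight += ATOMIC_MASS[element] + (MAX_VALENCE[element] - used) * H_MASS
    return weight

def within_weight_limit(atoms, bonds):
    """SMU definition: molecular weight of 500 daltons or less."""
    return molecular_weight(atoms, bonds) <= 500.0

def no_oxygen_oxygen_bond(atoms, bonds):
    """Hypothetical reviewer rule: reject any direct O-O bond."""
    return not any(atoms[a] == "O" and atoms[b] == "O" for (a, b) in bonds)

# Each new criticism becomes one more function appended to this list.
RULES = [within_weight_limit, no_oxygen_oxygen_bond]

def passes_rules(atoms, bonds):
    """A candidate is kept only if every rule accepts it."""
    return all(rule(atoms, bonds) for rule in RULES)

benzene = (["C"] * 6,
           {(0, 1): 2, (1, 2): 1, (2, 3): 2, (3, 4): 1, (4, 5): 2, (0, 5): 1})
peroxide = (["C", "O", "O"], {(0, 1): 1, (1, 2): 1})

print("benzene kept:", passes_rules(*benzene))
print("peroxide kept:", passes_rules(*peroxide))
```

Keeping the rules as separate functions in a list mirrors the workflow described above: each round of red ink adds a check without touching the generator itself.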

After ten iterations, the algorithm finally produced 9 million synthesizable molecules representing every region of the SMU, and it produced a map showing the regions of the chemical space where scientists have not yet synthesized any compounds.

“With the map, we can tell chemists, if you can synthesize a new molecule in this region of space, you have made a new type of compound,” Dr Virshup said. “It’s an intellectual property issue. If you’re in the blank spaces on our small molecule map, you’re guaranteed to make something that isn’t patented yet.”
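One simple way to picture the "blank spaces" is to lay a grid over a space of molecular descriptors and look for cells that the virtual library reaches but known compounds do not. The Python sketch below assumes each molecule has already been reduced to a pair of numeric descriptors; the grid binning, the function names and the coordinates are all illustrative inventions of ours and are not taken from the paper, which may build its map quite differently.

```python
def cell_of(descriptors, cell_size=1.0):
    """Map a tuple of descriptor values to the grid cell that contains it."""
    return tuple(int(value // cell_size) for value in descriptors)

def unexplored_cells(virtual_library, known_compounds, cell_size=1.0):
    """Cells reached by the virtual library that hold no known compound."""
    covered = {cell_of(d, cell_size) for d in known_compounds}
    reachable = {cell_of(d, cell_size) for d in virtual_library}
    return reachable - covered

# Made-up descriptor pairs, purely for illustration.
virtual_library = [(0.2, 0.3), (1.5, 0.1), (2.7, 2.2)]
known_compounds = [(0.9, 0.9), (1.1, 0.4)]

blank = unexplored_cells(virtual_library, known_compounds)
print("unexplored cells:", sorted(blank))
```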

______

Bibliographic information: Virshup AM et al. Stochastic voyages into uncharted chemical space produce a representative library of all possible drug-like compounds. J. Am. Chem. Soc., published online April 2, 2013; doi: 10.1021/ja401184g