Early-stage drug discovery requires a constant supply of new molecules to feed into high-throughput screening machines. To increase this supply, data scientists are building AI systems that generate virtual molecules on demand. These in silico methods should produce drug compounds with a higher hit rate in high-throughput screening, and a lower attrition rate later in the drug discovery pipeline.

In this post, I briefly explain how some of these systems work, and also where they fall short. I present two challenges they face:

1. multitasking between different objectives

2. generating chemically diverse molecules

How some generative systems work

These systems have two types of components. The first type is generative: one model randomly generates virtual molecules, i.e. chemical formulas written as SMILES strings (like the examples shown later in this post).

The second type of component is predictive: other models compute the chemical properties that generated virtual molecules would have if they were actually synthesized. Examples of such properties include activity, toxicity, and solubility in water. In practice, we want to combine several properties in one molecule, and therefore we have one generator and many predictors, one for each desired property.

Taken together, the generator and the predictors form a reinforcement learning system. The predictors provide the environment: they reward the generator, which acts as the agent.

At each iteration, the generator updates its probability distribution over chemical formulas, trying to increase the rewards of the molecules it generates. The hope is that after enough iterations, the virtual molecules will combine all the desired properties.
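The loop above can be sketched in a few lines of Python. Everything here is a toy stand-in: the candidate pool, the predictor formulas, and the multiplicative update rule are hypothetical placeholders for real neural models.

```python
import random

# Toy sketch of the generator-predictor reinforcement loop.
# The "generator" is just a probability distribution over a fixed pool of
# SMILES strings; the "predictors" are made-up scoring functions.
CANDIDATES = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]

def predict_activity(smiles):
    # placeholder predictor: rewards longer formulas, capped at 1.0
    return min(len(smiles) / 10.0, 1.0)

def predict_solubility(smiles):
    # placeholder predictor: rewards the presence of oxygen
    return 1.0 if "O" in smiles else 0.2

def reward(smiles):
    # combine all predictor scores (here: a simple average)
    return (predict_activity(smiles) + predict_solubility(smiles)) / 2.0

def train(steps=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    weights = {s: 1.0 for s in CANDIDATES}
    for _ in range(steps):
        total = sum(weights.values())
        probs = [weights[s] / total for s in CANDIDATES]
        sampled = rng.choices(CANDIDATES, weights=probs, k=1)[0]
        # reinforce the sampled molecule in proportion to its reward
        weights[sampled] *= 1.0 + lr * reward(sampled)
    # the distribution concentrates on high-reward candidates
    return max(weights, key=weights.get)

best = train()
```

The positive feedback between sampling probability and reward is what makes the distribution drift toward high-scoring molecules, and it is also what makes the loop prone to collapsing onto a narrow set of candidates.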

Challenge 1: juggling a multitude of objectives

However, in practice, it is hard to design a reward function that gives the generator the right incentives. Reinforcement learning agents are pretty lazy: they quickly identify the easiest property, focus on it, and simply ignore the others. On the other hand, if they are forced to consider all the objectives together, they get confused by the multitasking. Multi-objective reinforcement learning is a complex topic that current systems have barely scratched.
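A toy illustration of why combining objectives into one reward is delicate (all scores here are made up, on a 0-to-1 desirability scale): under a simple average, a "lazy" candidate that ignores one property can outscore a balanced one, while a worst-case aggregation penalizes it.

```python
# Hypothetical desirability scores in [0, 1]; higher is better
# (so a high "toxicity" score means the molecule is LESS toxic).
lazy     = {"activity": 1.0, "toxicity": 1.0, "solubility": 0.0}  # ignores solubility
balanced = {"activity": 0.6, "toxicity": 0.6, "solubility": 0.6}

def average(scores):
    # weighted-sum scalarization (equal weights): easy to game
    return sum(scores.values()) / len(scores)

def worst_case(scores):
    # min aggregation: a molecule is only as good as its weakest property
    return min(scores.values())
```

Under `average`, the lazy candidate wins (0.667 vs 0.6); under `worst_case`, it loses (0.0 vs 0.6). Neither aggregation is a full solution, which is exactly why multi-objective RL remains hard.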

This is a big problem for drug discovery, because a molecule must satisfy all of a long list of properties at once. That's the only way to get through clinical trials and meet FDA standards. For example, the drug must be active against the disease, while minimizing toxicity, side effects, and so on.

To receive FDA approval, a drug must satisfy a multitude of objectives

Challenge 2: reproducing natural chemical diversity

Another challenge is that currently, generated molecules lack the chemical diversity found in nature. They look simplistic and repetitive, more like artificially flavored candies:

COCCNC(=O)CCc1ccccc1

CCOC(=O)CCCCc1ccccc1CCN(C)CCCCc1ccccc1C

COCCCC(=O)CC(C)CCCCc1ccccc1C

CCCC(=O)CSCCC(=O)CCCCC(=O)OC

CCC(=O)COCCCCCCC(=O)CCCSC

Artificial strawberries: simplistic and repetitive

On the other hand, natural molecules look more like organic fruits, flourishing and diverse:

CS(=O)(=O)CCCOc1ccccc1Br

CN(Cc1ccccc1F)C[C@@H]1C(=O)C(C)(C)OC1(C)C

CC[C@@H]1CN(Cc2nc3ccccc3n2CC)C[C@H](C)O1

CSCC[C@H](NC(=O)c1ccccc1Cl)C(=O)Nc1ccc(O)c(Cl)c1

O=C(Nc1ccccc1)N1CCC[C@H]1c1nc2ccccc2s1

Organic strawberries: flourishing and diverse

A team at Harvard is taking a shot at this problem. Quite ironically, they called their model ORGANIC (Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry), maybe to sound as if it mimics nature.

A (modification of) their approach is the following: instead of using predictors to reward the generator in reinforcement learning, the idea is to use discriminators, which are slightly different. A discriminator takes as input two classes of data: generated molecules on one side, and natural molecules having the desired property on the other. It rewards the generator for the perceived similarity between generated and natural molecules.
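A minimal sketch of the discriminator-as-reward idea, under deliberately crude assumptions: the discriminator here is a hand-rolled logistic regression on a single feature (string length), standing in for a real neural discriminator on learned molecular features, and both molecule sets are tiny toy samples.

```python
import math

# Toy data: a couple of "natural" molecules vs short "generated" ones.
natural   = ["CS(=O)(=O)CCCOc1ccccc1Br", "O=C(Nc1ccccc1)N1CCCC1"]
generated = ["CCO", "CCC", "CCN"]

def feature(smiles):
    # crude stand-in feature: normalized string length
    return len(smiles) / 30.0

def train_discriminator(real, fake, steps=2000, lr=0.5):
    # logistic regression: label 1 = natural, 0 = generated
    w, b = 0.0, 0.0
    data = [(feature(s), 1.0) for s in real] + [(feature(s), 0.0) for s in fake]
    for _ in range(steps):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x   # gradient step on the log-likelihood
            b += lr * (y - p)
    return w, b

def reward(smiles, w, b):
    # the generator is rewarded for being judged "natural-looking"
    x = feature(smiles)
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

w, b = train_discriminator(natural, generated)
```

A generated molecule that resembles the natural set (by the discriminator's lights) earns a reward close to 1; one the discriminator confidently flags as fake earns a reward close to 0, which is the signal fed back to the generator.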

The discriminators and the generator are trained alternately. They learn from each other, and a balance should hold between them. This training method is often called adversarial training, although here the relationship between the discriminators and the generator is more cooperative than conflicting. Adversarial training dates back to the 1990s, and it has recently been revived with Generative Adversarial Networks.

However, in practice, it does not work very well: I computed the chemical diversity of generated molecules in my paper, and the results are poor. Generated molecules still lag far behind natural molecules.

Besides the multi-objective RL problem mentioned above (challenge 1), what happens is that the generator cannot follow the discriminator for very long: ORGANIC has a perfect discriminator problem.

There are many possible ways to address this issue: using a larger generator, a larger training set, a low-data discriminator, a more modern loss function (like CramerGAN), or one-sided label smoothing. So that's not the end of the story.
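To illustrate one of these fixes: one-sided label smoothing softens the discriminator's target for real samples (e.g. from 1.0 to 0.9, a common choice in GAN practice) while leaving the target for generated samples at 0.0, which keeps the discriminator from becoming overconfident. A single logistic unit is enough to see the effect (the data here is hypothetical):

```python
import math

def train(real_label, steps=5000, lr=0.5):
    # one logistic unit; one "real" example (feature +1.0, target real_label)
    # and one "fake" example (feature -1.0, target 0.0)
    w = 0.0
    data = [(1.0, real_label), (-1.0, 0.0)]
    for _ in range(steps):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-w * x))
            w += lr * (y - p) * x   # log-likelihood gradient step
    return w

w_hard = train(real_label=1.0)   # hard target: confidence grows without bound
w_soft = train(real_label=0.9)   # smoothed target: confidence is capped
```

With the hard target the weight keeps growing and the discriminator drifts toward certainty; with the smoothed target it settles where its output on real samples is about 0.95, leaving a usable gradient for the generator.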

Conclusion

If you are interested in these challenges as a researcher (amateur or pro), or as a sponsor, you can learn more in my paper and discuss on the Startcrowd chat. If you use my paper in your own academic work, don't forget to cite it. For example:

Benhenda, M. 2017. ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? arXiv preprint arXiv:1708.08227.