OpenReview is a platform for reviewing papers submitted to some AI conferences. Anyone can post their opinion about any paper. Some reviews are quite harsh, making it a kind of Rotten Tomatoes for machine learning research.

I came across a paper on my favourite topic, generative models in drug discovery. The paper is called “Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design”. I couldn’t resist posting my review.

The paper is written by Nathan Brown and his team (Daniel Neil, Marwin Segler, Laura Guasch, Mohamed Ahmed, Dean Plumbley, Matthew Sellwood), from BenevolentAI, a startup in AI for drug discovery. According to their website:

BenevolentAI is the world leader in the development and application of artificial intelligence for scientific innovation.

I don’t know whether that’s true, but anyway. From the paper’s abstract:

We investigate a variety of RL algorithms for molecular generation and define new benchmarks (to be released as an OpenAI Gym), finding PPO and a hill-climbing MLE algorithm work best.

The paper was posted on OpenReview on 27 October 2017, but apparently BenevolentAI promoted it on their marketing channels only recently. I unearthed it through this intriguing tweet by Nathan Brown, on 23 February 2018:

The hashtag #RealTimeChem is probably British humour. The paper is 4 months old!

My review

Your paper offers an interesting curation of data, models, and tasks, which could be useful for a proper benchmark.

1. There is no discussion of previous measures of diversity in the literature, such as Guimaraes et al. (30 May 2017) or Benhenda (28 August 2017). Even more diversity metrics have appeared since your submission, in the recent Benhenda et al. (8 February 2018).

2. Your definition of diversity is extremely rudimentary: it merely measures the proportion of unique molecules among those generated. Why should this be the best metric?

Is your definition of diversity good enough for the chemical space?
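To make the objection concrete, here is a minimal sketch of the uniqueness-ratio metric as I read it from the paper (my own illustration, not their code). Molecules are represented as SMILES strings, and I assume they are already canonicalized; in practice a toolkit like RDKit would handle that step.

```python
# Sketch of the uniqueness-ratio diversity metric (my reading, not the
# authors' code): diversity = unique molecules / generated molecules.
# Assumes the SMILES strings are already canonical, so string equality
# means molecular identity.

def uniqueness_ratio(smiles_list):
    """Fraction of generated molecules that are unique."""
    if not smiles_list:
        return 0.0
    return len(set(smiles_list)) / len(smiles_list)

generated = ["CCO", "CCO", "c1ccccc1", "CCN", "CCO"]
print(uniqueness_ratio(generated))  # 3 unique out of 5 -> 0.6
```

Note what this metric cannot see: a generator producing five near-identical ethanol analogues would score a perfect 1.0, even though it has explored almost none of the chemical space.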

3. Your paper proposes 19 new tasks, but it completely ignores previous tasks in the literature, such as Olivecrona et al. (25 April 2017), Dieb et al. (20 July 2017), and Sanchez-Lengeling et al. (17 August 2017). If you are an advocate of reproducibility, why these omissions?

Conversely, since your paper appeared, the literature has totally ignored your platform: see Popova et al. (29 November 2017), Ertl et al. (20 December 2017), and many others. See a list here.

Four months later, how do you explain the persistent lack of adoption of your platform by the community?

In my case, I simply hadn’t heard about it at the time. Lack of marketing. Thanks to Nathan Brown for tweeting.

4. Your approach is too narrow: it remains within reinforcement learning models. There are important works with autoencoders, such as Gomez-Bombarelli et al. (2016), Kadurin et al. (July 2017), and many others, and they deserve to be benchmarked too. Is the OpenAI Gym platform restricted to reinforcement learning?

Is OpenAI Gym the most suitable environment?

The ongoing DiversityNet molecule benchmark currently uses the Texygen platform (which appeared on 6 February 2018), and which is more inclusive of non-RL algorithms. See the comparison of Texygen vs. OpenAI Gym.

However, Texygen has not yet been generalized to the few non-SMILES generative models in the literature. That remains to be done.

5. Does multi-objective optimization always succeed by taking “any arbitrary balanced weighting” of objectives? Is it always so easy? A general discussion, with different objectives, would have been welcome. Otherwise, the uninformed reader might imagine that multi-objective reinforcement learning is a piece of cake.

Multi-objective optimization is not always easy
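A minimal sketch of why it is not always easy (my own toy example, not taken from the paper): on a non-convex Pareto front, some Pareto-optimal candidates are never selected by a weighted sum of objectives, no matter which weighting you pick.

```python
# Toy example (my own, not from the paper): weighted-sum scalarization
# of two objectives f1, f2 (higher is better). Candidate C is
# Pareto-optimal (neither A nor B dominates it on both objectives),
# yet no weighting w in [0, 1] ever selects it.

def weighted_sum_winner(candidates, w):
    """Pick the candidate maximizing w*f1 + (1-w)*f2."""
    return max(candidates,
               key=lambda c: w * candidates[c][0] + (1 - w) * candidates[c][1])

candidates = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.45, 0.45)}

# Sweep 101 weightings from w=0 to w=1 and collect the winners.
winners = {weighted_sum_winner(candidates, w / 100) for w in range(101)}
print(sorted(winners))  # ['A', 'B'] -- C is missed by every weighting
```

This is the classic failure mode of linear scalarization on concave Pareto fronts, and it is exactly why “any arbitrary balanced weighting” deserves more scrutiny than the paper gives it.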

In conclusion, I don’t think the ML community should tackle this challenge the way you presented it in your paper.

I am certainly biased, but I think the current DiversityNet approach is more inclusive: for the models, for the tasks, and for the people performing them.