Insilico Medicine, a drug discovery startup based at Johns Hopkins University, has introduced MOSES (Molecular Sets), a platform for comparing the performance of molecular generation models. MOSES provides a standardized benchmarking dataset, a set of open-source models with a unified implementation, and evaluation metrics. It aims to make new models easier to share and compare, accelerating AI development in drug discovery.

The MOSES benchmark contains a unified molecule dataset and results from various models, so AI researchers can compare their own model’s results against the benchmark across different performance metrics. MOSES includes a character-level recurrent neural network (CharRNN), a variational autoencoder (VAE), and an adversarial autoencoder (AAE), among other models, and introduces tools and metrics for evaluating them. Model performance is reflected in the diversity and quality of the generated molecules.
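
To make "diversity" concrete, here is a minimal Python sketch of two set-based scores often used for generated molecules: uniqueness (how many outputs are distinct) and novelty (how many distinct outputs are absent from the training set). The function names and toy SMILES strings are illustrative assumptions, not the MOSES API, and the sketch assumes the strings are already canonicalized; a real pipeline would first validate and canonicalize them with a cheminformatics toolkit.

```python
from typing import Iterable


def uniqueness(generated: Iterable[str]) -> float:
    """Fraction of generated molecules that are distinct strings."""
    molecules = list(generated)
    if not molecules:
        return 0.0
    return len(set(molecules)) / len(molecules)


def novelty(generated: Iterable[str], training: Iterable[str]) -> float:
    """Fraction of distinct generated molecules not seen in training."""
    distinct = set(generated)
    if not distinct:
        return 0.0
    return len(distinct - set(training)) / len(distinct)


# Toy inputs (hypothetical): assumed to be canonical SMILES already.
generated_smiles = ["CCO", "CCO", "c1ccccc1", "CCN"]
training_smiles = ["CCO", "CCC"]

print(f"uniqueness: {uniqueness(generated_smiles):.2f}")
print(f"novelty:    {novelty(generated_smiles, training_smiles):.2f}")
```

A model that scores well on both produces many different molecules rather than repeating itself or copying its training data, which is one way to read the diversity criterion described above.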

The MOSES dataset is based on ZINC Clean Leads, a collection containing 4,591,276 molecules filtered by molecular weight, number of rotatable bonds, and XlogP. Molecular prediction results from Insilico Medicine, Neuromation, and Alán Aspuru-Guzik form the cornerstone of MOSES. Insilico hopes to add more models to MOSES in the future and has open-sourced the entire dataset along with the code for evaluating performance.
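
The kind of property-based filtering described above can be sketched in a few lines of Python. This is an illustration only: the `Molecule` record, the `passes_filters` helper, the sample values, and the default thresholds are all assumptions made for the example and are not taken from this article or from the MOSES code. The descriptors are assumed to be precomputed elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record holding precomputed descriptors."""
    name: str
    mol_weight: float       # molecular weight in Daltons
    rotatable_bonds: int    # number of rotatable bonds
    xlogp: float            # estimated lipophilicity


def passes_filters(
    mol: Molecule,
    min_weight: float = 250.0,      # illustrative threshold (assumption)
    max_weight: float = 350.0,      # illustrative threshold (assumption)
    max_rotatable_bonds: int = 7,   # illustrative threshold (assumption)
    max_xlogp: float = 3.5,         # illustrative threshold (assumption)
) -> bool:
    """Return True if the molecule satisfies all three property filters."""
    return (
        min_weight <= mol.mol_weight <= max_weight
        and mol.rotatable_bonds <= max_rotatable_bonds
        and mol.xlogp <= max_xlogp
    )


# Made-up candidates, one failing each filter except the first.
candidates: List[Molecule] = [
    Molecule("mol_a", mol_weight=301.4, rotatable_bonds=4, xlogp=2.1),
    Molecule("mol_b", mol_weight=412.0, rotatable_bonds=5, xlogp=2.8),
    Molecule("mol_c", mol_weight=280.3, rotatable_bonds=9, xlogp=1.9),
    Molecule("mol_d", mol_weight=320.7, rotatable_bonds=3, xlogp=4.6),
]

kept = [mol for mol in candidates if passes_filters(mol)]
print([mol.name for mol in kept])
```

Keeping the thresholds as parameters makes it easy to see how sensitive the size of the resulting dataset is to each cutoff.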

The emergence of an academic standard should help drive the development of molecular generation technology. Researchers from different institutions can use MOSES to measure their models’ results and adjust their algorithms accordingly. ImageNet, the well-known benchmark dataset for image recognition, helped boost recognition accuracy from 71.8 percent to 97.3 percent in eight years. It’s hoped MOSES can do the same for molecular generation.

As a startup focused on extending human longevity, Insilico Medicine is flexing its AI muscles. Founder Alex Zhavoronkov believes that as more players enter the molecular generation arena, unified open-source benchmarks will become increasingly important for the community.

The paper Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models and the MOSES code are available on GitHub.