Selectivity is a linchpin of chemical synthesis — if a synthetic reaction is not selective, it cannot give a good yield of the desired product, and will require tedious purification processes. Chemists have therefore long sought ways of predicting the selectivity of chemical reactions. Computational models can be constructed, but their development is laborious, and they are usually specific to a particular reaction type. Writing in Nature, Reid and Sigman1 now show that a selectivity model can be built in a semi-automated way and generalized over a range of reactions.

Read the paper: Holistic prediction of enantioselectivity in asymmetric catalysis

Chemical selectivity comes in many flavours, but it is especially difficult to achieve enantioselectivity, which depends on a property called chirality. Molecules are said to be chiral if they come as two mirror-image forms — enantiomers — that have many identical properties, but can differ in certain important aspects. A good analogy is with hands: a person’s right and left hands have the same length, colour and mass, but only one fits into a right-handed glove.

Many biological targets for pharmaceuticals look like right-handed gloves to molecules — only one enantiomer of a molecule will fit into them. For this reason, pharmaceuticals should be synthesized as one enantiomer only; the other form might even be toxic. Asymmetrical catalysts are used to influence synthetic chemical reactions to form only one enantiomer of the product. Nature’s asymmetrical catalysts are enzymes, which produce single enantiomers of biomolecules efficiently and with exquisite selectivity. Enzymes can also be used as catalysts for synthetic chemistry, but they generally have a limited range of substrates and can produce only one of the two possible enantiomers of a product.

Modern synthetic catalysts challenge the efficiency of enzymes, and can often be made as mirror-image forms that each produce a different enantiomer of a desired molecule. To support the development of new catalysts, chemists use models to understand and predict the enantioselectivity of catalytic reactions2,3. These range in complexity from simple models of the catalyst drawn on paper, onto which a molecular model of the substrate is superimposed to estimate the best fit, to quantum-mechanical calculations that describe an entire reaction path.

A direct predecessor of Reid and Sigman’s modelling work is a computational approach called quantitative structure–selectivity relationships (QSSR), in which a correlation is sought between the properties of reaction components and the observed selectivity. The relevant properties can be either determined experimentally or calculated, and can include such things as bond lengths, vibrational frequencies and atomic charges. A semi-automated statistical approach (multiple linear regression) then uses these properties to construct a model that outputs one numeric value for each reaction system being studied3. A result of zero means that there is no selectivity: both enantiomers are produced in equal amounts. A value of large magnitude indicates a very selective system, and the sign of the numerical output (positive or negative) indicates which enantiomer is predominantly produced. Opposite enantiomers of a catalyst produce opposite enantiomers of the product, and this should also be reflected in QSSR models of synthetic catalysts; this requirement is not essential for models of enzymes, however, because only one enantiomeric form of any enzyme exists in nature.
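To make the regression step concrete, here is a minimal sketch of a QSSR-style fit. It is not the authors' workflow: the descriptor table, the coefficients and the selectivity values are synthetic, the helper names (fit_mlr, predict, interpret) are hypothetical, and the enantiomer labels "A" and "B" are arbitrary. It shows only how multiple linear regression turns a handful of molecular properties into one signed number per reaction.

```python
import numpy as np

# Hypothetical descriptor table: one row per reaction; the three columns are
# invented stand-ins for properties such as a bond length, a vibrational
# frequency and an atomic charge. All values are synthetic.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(20, 3))

# Synthetic "observed" selectivities built from known coefficients plus a
# little noise, so the quality of the fit can be checked.
true_coefs = np.array([1.5, -0.8, 0.3])
true_intercept = 0.2
noise = rng.normal(scale=0.01, size=20)
selectivity = descriptors @ true_coefs + true_intercept + noise


def fit_mlr(X, y):
    """Ordinary least-squares fit of y = X @ coefs + intercept."""
    design = np.column_stack([X, np.ones(len(X))])  # last column: intercept
    solution, *_ = np.linalg.lstsq(design, y, rcond=None)
    return solution[:-1], solution[-1]


def predict(X, coefs, intercept):
    """One numeric selectivity value per reaction (row of X)."""
    return X @ coefs + intercept


def interpret(value, tol=1e-6):
    """Read a signed model output: zero means no selectivity, the sign
    picks the major enantiomer (labels A and B are arbitrary)."""
    if abs(value) < tol:
        return "no selectivity"
    return "major enantiomer: A" if value > 0 else "major enantiomer: B"


coefs, intercept = fit_mlr(descriptors, selectivity)
predictions = predict(descriptors, coefs, intercept)
```

In practice, the choice of descriptors and the validation on reactions held out of the fit matter far more than the regression itself, which is a single least-squares solve.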

QSSR models are normally limited to a narrow set of substrates and catalysts, because the assumptions built into the machine-learning procedures are invalidated by large deviations from the molecular structures used to train the model. Reid and Sigman have taken on the challenge of making a general QSSR model, starting from an earlier model reported by Reid and colleagues4.

Inspired by enzyme models, Reid and Sigman ignored the sign conventions usually adhered to in models of synthetic catalysis — that is, they produced a model that predicts the magnitude of enantioselectivity for a group of catalytic reactions (Fig. 1), but only for one enantiomer of the catalyst. Switching the catalyst to its mirror image will therefore not switch the sign of the output in their model, and the model cannot predict which enantiomer is produced as the major isomer. However, the major enantiomer can be predicted from the preceding work4. Within this framework, the authors demonstrated that one of the components of the modelled reactions could be varied to an unprecedented degree, without affecting the high accuracy of the predictions.

Figure 1 | Model reactions. Reid and Sigman1 report a computational model that predicts the outcome of reactions when a wide range of nucleophilic molecules react with imines in the presence of a catalyst, accounting for factors such as molecular structure and solvent. More specifically, the model reports the magnitude of the enantioselectivity of the reactions — a measure of the ratio of the two mirror-image isomers (enantiomers) of the product formed in the reaction. Spheres represent a variety of chemical groups; bonds shown in bold or as solid wedges project above the plane of the page; broken wedges project below the plane of the page. Nu represents a range of groups or molecular structures.
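The caption describes enantioselectivity as a measure of the ratio of the two enantiomers. The text does not say which measure the model reports, so the sketch below simply shows three standard, interconvertible ways of expressing that ratio: the enantiomeric ratio (er), the enantiomeric excess (ee) and the corresponding free-energy difference, RT ln(er). The function names and the default temperature are assumptions chosen for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ee_from_amounts(major, minor):
    """Enantiomeric excess as a fraction: (major - minor) / (major + minor)."""
    return (major - minor) / (major + minor)


def er_from_ee(ee):
    """Enantiomeric ratio (major : minor) from a fractional ee below 1."""
    return (1 + ee) / (1 - ee)


def ddg_from_er(er, temperature_k=298.15):
    """Free-energy difference in kcal/mol that corresponds to a given
    enantiomeric ratio at the stated temperature: RT * ln(er)."""
    return R_KCAL * temperature_k * math.log(er)


# Example: a product formed as a 95:5 mixture of enantiomers.
ee = ee_from_amounts(95, 5)
er = er_from_ee(ee)
ddg = ddg_from_er(er)
```

For the 95:5 mixture, the ee is 90% and the energy difference is roughly 1.7 kcal/mol at room temperature. A magnitude-only model of the kind described above would report the size of such a number, without a sign indicating which enantiomer is in excess.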

How can one model achieve such a wide range of accurate predictions? Part of the explanation is probably that all the reactions share a similar mechanism: a planar substrate (an imine molecule; Fig. 1) is ‘gripped’ from one side by the chiral catalyst4, so that any reaction has to occur on the other side. The third reaction component (a nucleophile) can therefore be varied substantially in the model. But the main reason is that the authors made a huge effort to produce a comprehensive training set of 367 individual reactions, each of which required multiple calculations to describe all the components, including the variability in shape (the conformations) of each component. It is highly encouraging to see that holistic reaction models can be produced by using such a wide training set.

Where next? A dream for reactivity modellers is to build an ultimate tool that accurately predicts the products of any reaction from the reaction components, thereby allowing computational screening of new reactions. Modellers have a long way to go to achieve this, but Reid and Sigman have shown that they can accurately predict outcomes for groups of related reactions, rather than having to model one type of reaction at a time. Other machine-learning methods are being tested on even bigger data sets5.

The broadening of reaction scope demonstrated in the current work will encourage the search for more-general models, and might eventually enable models that predict the outcomes of reactions very different from those used for training. For now, making such predictions is still the domain of humans, but synthetic chemists will increasingly rely on theoretical tools to guide their work. I, for one, look forward to a future in which the tedious trial and error of synthetic chemistry is removed, and in which chemists can cut to the chase by carrying out only successful reactions.