Here’s another step along the way to automated synthesis, in a new paper from MIT. The eventual hope is to unite the software and the hardware in this area, both of which are developing these days, and come up with a system that can produce new compounds with a minimum of human intervention. Let’s stipulate up front that we’re not there yet; this paper is both a very interesting look at the state of the art and a reminder that the state of the art isn’t up to that goal yet.

The software end of things involves (in the ideal case) being able to come up with plausible synthetic routes to the desired molecules, with “plausible” being not only in the abstract but fitted to the abilities of the hardware synthesizer itself. And since that synthesizer is very likely going to partake of a lot of flow chemistry, you’ll have a lot of thinking to do about concentrations, flow rates, non-clogging conditions, and so on. I should mention that an even more ideal system would be able to come up with its own ideas about what to synthesize, but doing that from a standing start is even further off. I think what we’ll see before that (and people have already been working on this as well) is a system that can suggest reasonable rounds of analogs given the assay data from a previous round of simple analogs, and then can turn its attention to how to synthesize them.

In this case, the flow of operations looks like this: (1) Select a synthetic target, (2) search the literature for the compound, (3) retrosynthetic analysis, (4) select reaction conditions, (5) estimate feasibility, (6) formulate a recipe for the hardware to follow, (7) configure the platform for that recipe, (8) test run of the process, (9) scale-up of the synthesis in flow, and there’s your product. But as the paper notes, there are still several of these stages that need human input, as you can certainly imagine if you’ve ever done organic synthesis, done any flow chemistry, or worked with automation of any kind at all. One of the good things about this work, actually, is the job it does highlighting just the areas that turned out to need the most human help as they got this system into shape.
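The nine stages above can be sketched as a simple ordered pipeline. To be clear, everything in this snippet is an illustrative toy of my own devising, not the paper's actual software: the stage names paraphrase the list, and the functions are stand-ins.

```python
# Toy sketch of the nine-stage workflow described above.
# Stage names paraphrase the paper's list; nothing here is real ASKCOS code.

STAGES = [
    "select target",
    "literature search",
    "retrosynthetic analysis",
    "condition selection",
    "feasibility estimate",
    "recipe formulation",
    "platform configuration",
    "test run",
    "flow scale-up",
]

def run_pipeline(target: str) -> list:
    """Walk a target through each stage in order, returning a simple log."""
    return [f"[{i}] {stage}: ok ({target})" for i, stage in enumerate(STAGES, 1)]

for entry in run_pipeline("example target"):
    print(entry)
```

The point of laying it out this way is that each stage is a separable unit, which is also where the paper locates the human-intervention problem: some of these boxes are far more automatable than others.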

The MIT group has been working on their own retrosynthesis software (ASKCOS), and they give some details of it here. The system was trained on millions of reactions abstracted from both Reaxys and the USPTO database, so it’s seen plenty of organic chemistry. For example, out of the 12.5 million or so single-step reactions to be found in Reaxys, the system is set to pay attention only to those with ten or more examples, which knocks things down to about 160,000 “rules” for valid single-step transformations. They then trained a neural network to try to predict which of these rules would be most applicable to a given new target structure – this step was put in to decrease the computational load in the next step and also to try to increase that step’s success rate.
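That filtering step (keep only the transformation "rules" with ten or more literature examples) is easy to sketch. This is a minimal illustration of the counting idea only, with made-up template names; it is not how the actual template extraction works internally.

```python
# Minimal sketch of the template-frequency filter described above:
# keep only the "rules" seen at least min_count times in the corpus.
# The template names here are invented for illustration.

from collections import Counter

def filter_templates(reaction_templates, min_count=10):
    """Return the set of templates with at least min_count examples."""
    counts = Counter(reaction_templates)
    return {t for t, n in counts.items() if n >= min_count}

# Toy corpus: one template appears 12 times, another only 3.
corpus = ["amide_coupling"] * 12 + ["weird_rearrangement"] * 3
print(filter_templates(corpus))
```

Applied to Reaxys-scale data, a cutoff like this is what takes 12.5 million single-step reactions down to the roughly 160,000 rules the paper mentions: rare one-off transformations get dropped in favor of ones with enough precedent to trust.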

Each proposed retrosynthesis step first gets a binary filter applied to it: are there any conditions that the program knows about that could generate the desired product from the stated reactants? Getting rid of stuff at that stage saves a lot of pointless calculation. Then if the answer is “yes”, the program turns its attention to the hits, and the proposed reaction sequences are evaluated by a more computationally intensive forward-prediction model, which is trained up on the most synthetically plausible conditions for each transformation. If that result matches up, the route is considered believable.
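The cheap-filter-then-expensive-model pattern is a generic one, and it can be sketched in a few lines. Both scoring functions below are stand-ins of my own, not ASKCOS internals; the structure is what matters.

```python
# Generic sketch of the two-stage screen described above: a cheap binary
# filter discards hopeless steps so that only survivors reach the expensive
# forward-prediction model. Both functions are illustrative stand-ins.

def cheap_filter(step) -> bool:
    """Stand-in for the fast binary plausibility check."""
    return step["plausible"]

def forward_predict(step) -> str:
    """Stand-in for the costly forward model: predict the major product."""
    return step["predicted_product"]

def evaluate_route(steps) -> bool:
    """A route is believable only if every step passes both screens."""
    for step in steps:
        if not cheap_filter(step):
            return False  # discarded early, no expensive compute spent
        if forward_predict(step) != step["desired_product"]:
            return False  # forward model disagrees with the retrosynthesis
    return True

route = [
    {"plausible": True, "predicted_product": "X", "desired_product": "X"},
    {"plausible": True, "predicted_product": "Y", "desired_product": "Y"},
]
print(evaluate_route(route))
```

The design choice is the usual one in computational screening: spend the heavy model only on candidates that survive a cheap test, which is exactly how the paper describes keeping the search tractable.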

The hardware is a later version of the system I wrote about here, a deliberately modular plug-and-play setup. This leaves the individual modules themselves open to upgrading on their own without affecting the rest of the system, and cuts down on the number of valves and connections in any given overall plan – and as anyone who’s set up an HPLC, LC/MS, or flow chemistry apparatus will be able to tell you, mo’ connections = mo’ problems. Every new fitting is an opportunity for something to fail later on. There is, as before, a manipulator arm in the middle of the thing that can reach over and plug the individual modules into a common rack to assemble the sequence needed for the synthetic scheme (heated loop flows into phase separator flows into solid-supported reagent bed flows into…). There is a lot of engineering involved in getting this to work, a forest of little details that have to be addressed. Just to pick one example, the tubing involved is all tensioned via spring-loaded reels to cut down on looping and tangling – you can easily imagine what might happen otherwise as the robot arm merrily assembles something that, when you walk around and look at it from behind, resembles a colander full of angel hair pasta. The system generates a “chemical recipe file” for a given synthetic path; this CRF has the mapping of the physical reaction setup and the operations needed to execute the synthesis. This includes locations of stock solutions, assembly of the modules, solvents, flow rates, temperatures, and all the rest of it.
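Based only on that description, a CRF might look something like the data structure below. I want to be clear that every field name here is my own guess at what such a file would have to contain; the paper's actual file format is surely different.

```python
# A guess at what a "chemical recipe file" (CRF) might contain, based only
# on the description above: module assembly order plus per-step process
# parameters and stock-solution locations. All field names are invented.

from dataclasses import dataclass, field

@dataclass
class Step:
    module: str              # e.g. "heated_loop", "phase_separator"
    flow_rate_ml_min: float  # pump flow rate for this module
    temperature_c: float     # setpoint for this module

@dataclass
class ChemicalRecipeFile:
    target: str
    stock_solutions: dict    # reagent name -> physical bay location
    steps: list = field(default_factory=list)

crf = ChemicalRecipeFile(
    target="example-compound",
    stock_solutions={"amine stock": "bay 1", "acid chloride stock": "bay 2"},
    steps=[
        Step("heated_loop", flow_rate_ml_min=0.5, temperature_c=100.0),
        Step("phase_separator", flow_rate_ml_min=0.5, temperature_c=25.0),
        Step("scavenger_bed", flow_rate_ml_min=0.5, temperature_c=25.0),
    ],
)
print(len(crf.steps))
```

The appeal of something like this is that it separates the chemistry plan (which modules, in what order, at what settings) from the physical execution, which is exactly what lets the manipulator arm do the assembly from a file rather than from a human standing over it.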

The examples given in the paper are part of the “drugs on demand” work that the Jamison lab has been doing for several years now (which I wrote about here). I remain somewhat skeptical of the stated overall DARPA goal of this work, as that post shows, but it’s an excellent proving ground for automated synthesis (which, to be fair, is also one of the goals). Links below added by me: