In recent years, teams led by Novartis, St. Jude’s Children’s Research Hospital, and GlaxoSmithKline (GSK) Tres Cantos have released large data sets of antimalarial compounds derived from phenotypic high throughput screening (HTS) campaigns. (53-56) The GSK Tres Cantos Antimalarial (TCAMS) data set contained 13,533 filtered hits which were subsequently prioritized and grouped to provide numerous starting points for other research groups; (57) all of these 47 scaffolds have been explored either internally by GSK, by independent groups, or in collaboration between GSK and academic groups. GSK themselves have published evaluations of two of these priority series, the cyclopropyl carboxamides (58, 59) and an indoline series. (60) No further optimization work has been performed by GSK on these two series due to the inherent risks identified. The original GSK data set was used to identify starting points for the present campaign, resulting in the selection of TCMDC-123812 (OSM-S-5, Figure 2A) and its 4-aminoantipyrine derivative TCMDC-123794 (OSM-S-6) because of their attractive physicochemical properties, such as low logP and molecular weight, coupled with promising bioactivity and therefore presumed high ligand efficiency. (Compound numbering in this paper is based on the original project numbering, rather than a renumbering for this paper, allowing simpler cross-correlation between this paper and the live Web sites. The convention used is Open Source Malaria (OSM)-first letter of city in which compound was first made (e.g., S = Sydney)-incremental integer. Batch numbers are included in internal project numbering schemes.) A number of compounds in the original TCAMS set (53) featured the arylpyrrole core with alternative head groups lacking the ester linkage (Figure 2B). TCMDC-123563 was discounted (Figure S2) (61) as it represented a singleton and contained an unfavorable ketone linkage.
A cluster of related compounds, the 2-iminothiazolidin-4-ones (the “near neighbors”, NN), was targeted because the members (including TCMDC-124103, -125697, and -125698) possessed promising activities without the ester, and indicated tolerance to variation elsewhere in the structure. While this work was being written up for publication, Gilbert et al. published details of a series of pyrrolones (identified through an unpublished screen performed by the World Health Organization Special Programme for Research and Training in Tropical Diseases (WHO-TDR) but also identified in the Novartis antimalarial data set (55) ) that have some structural similarities to the NN compounds (i.e., an arylpyrrole joined through a double bond to a different heterocycle), and comparisons will be drawn below with this series. (62, 63)

The “open source” moniker is not merely semantic and distinguishes such projects from other “open” ventures in several important regards, (38) described by the six laws that governed the operation of the present project (Figure 1, Figure S1). (49) Crucially, the research (i.e., strategic discussions, issues involving doubt) takes place in the public domain. The Creative Commons license covering project content ensures free reuse of all content, including for commercial purposes (CC-BY), (50) and the project uses a free or open source composite technical platform that has recently been reviewed. (51, 52)

The present work extends this idea to the identification of novel bioactive compounds. An open drug discovery cycle was previously demonstrated by the Usefulchem project, which found four micromolar hits from a small product library after target prediction and docking. (44) The Open Source Drug Discovery project in India has carried out a crowdsourcing project for annotation of the Mycobacterium tuberculosis genome. (45-47) In the field of biotechnology the CAMBIA organization used patents to enforce a code of conduct based on the open sharing of technologies, in that experimental tools could be freely used, provided no further patents were taken to restrict the use of those tools by others. (48)

A model that has been mooted, (32-38) but never properly implemented and evaluated, is drug discovery and development where all data and ideas are freely shared, there are no barriers to participation, and there are no patents—so-called “Open Source” Drug Discovery. The requirement for total sharing of data as well as workflows in drug discovery and development (i.e., the experimental science, as opposed to the software used in the project (39) ) would mirror the same practices in open source software development—a model that has created robust and successful products in widespread use and formed the foundation of major industries as well as spawning for-profit open source software companies. It was shown recently that the opening up of a laboratory-based chemical research project to unrestricted participation by anyone accelerated the research because experts unknown to the core team were able to join the project and solve transient project needs. (36) That project involved the discovery of a new synthetic route to a known compound, specifically the active enantiomer of the drug of choice for the treatment of schistosomiasis, praziquantel. (40) The project benefited from the open sharing of chemical data and procedures on the Internet, i.e., using the Internet as a medium that facilitated collaboration and peer review where participants could influence the direction of the research before it occurred, rather than use the online content merely as an information resource. Several other initiatives have leveraged the advantage of open online discussion of chemical data and results generated by others. (41-43)

The weak status of many drug development pipelines in the pharmaceutical industry is driving the exploration of alternative models. The current model of drug discovery, whether in academia (20) or industry, can generally be characterized by secrecy and an underlying profit motive. (21) In the area of tropical diseases there are significant philanthropic efforts being made by many companies in providing treatments (22) and engaging in drug development (23) but also in conducting not-for-profit research. (24, 25) There have been calls for greater sharing of data in the NTD field, (26, 27) including the development of patent pools (28) and new Product Development Partnerships. (29) Repositioning of existing drugs is seen as a possible general strategy for the development of new antimalarials, even though the challenges of such an approach are clear. (30) There has been much recent discussion of the need for “Open Innovation”, a term with a nebulous definition but typically describing a range of ideas from the sharing of data in a precompetitive environment through to competitions that allow organizations to bring in the best external ideas to complement in-house research but for which there is no requirement for any collaboration. (31)

The ability of the pharmaceutical industry to provide new medicines cost-effectively is diminishing. (16) The industry acknowledges that lack of innovation is a problem. (17) Pharma is responsible for the creation of most marketed drugs, yet many of these are arguably not innovative; in contrast academia and the biotech industry generate more innovative leads, but many are orphan drugs. (18) Such challenges disproportionately affect research into new medicines for tropical diseases, which would inevitably generate a slim profit margin unlikely to recoup the necessary expenses of research and development. (19)

The continual threat of drug resistance has led to the World Health Organization (WHO) recommending that all treatments should only be used in combination; artemisinin combination therapies (ACTs) comprising an artemisinin derivative and a 4-aminoquinoline or amino alcohol currently represent the front line. However, the inevitable reports of resistance or tolerance, in the form of increased parasite clearance times, have already appeared. (4-6) Loss of the artemisinin class of drugs is a terrifying scenario that requires urgent risk mitigation. Apart from the introduction of the ACTs, no viable new drug for malaria has entered the market in the past 15 years, and the recent results of the Mosquirix vaccine phase III trials showed 18–36% efficacy depending on patient age and other factors. (7, 8) New chemical series that can replace and complement the ACTs are urgently needed and are being sought by a combination of academic and industrial groups, sometimes in collaboration with nongovernmental organizations (NGOs). (9-11) Of particular interest are lead candidates with differentiated activity profiles, ideally targeting gametocyte or liver stage parasites in addition to blood stages. (12-15)

Malaria remains one of the world’s most deadly diseases. There were an estimated 214 million cases of malaria in 2015, including around 438,000 deaths of which the majority, tragically, were young children. (1) Besides the threat to human health, there is significant economic and social impact on the affected communities, with malaria costing Africa billions of dollars in direct losses and even more when considering lost economic growth. (2, 3)

Involvement of PfATP4, a putative plasma membrane ion pump, in the mechanism of action was ruled out. The relevant experiments measure the effect of a compound on the cytosolic [Na+] ([Na+]i) in isolated (trophozoite-stage) parasites preloaded with the Na+-sensitive fluorescent indicator SBFI. PfATP4 is important since it is the proposed target of the antimalarial compound KAE609 (116) that has recently successfully completed phase 2 trials, (117) as well as that of a pyrazoleamide, (118) a dihydroisoquinolone ((+)-SJ-733), (119) various aminopyrazoles, (120) 28 of the 400 potent antiplasmodial compounds comprising the MMV Malaria Box, (121) and a triazolopyrazine series being investigated by the Open Source Malaria Consortium. (122) All of these compounds cause an immediate increase in parasite [Na+]i on addition to isolated parasites. Here, several compounds across both subseries were tested at 1 μM for their effect on [Na+]i in saponin-isolated SBFI-loaded trophozoite stage 3D7 parasites. (116) In each case there was no significant effect on [Na+]i, consistent with PfATP4 not being the relevant biological target of this compound class (Data Set S37, Figure S38).

The potency of the near neighbor set led to an in silico target prediction study being performed on OSM-S-39 (Text S11). The method employed a naive Bayes statistical model associating molecular structural features of small molecules with protein targets, using the ChEMBL database. Similar statistical approaches based on known activities of compounds have been successfully applied to identify targets of antitubercular compounds. (112-114) A total of 1,287 proteins were included for which at least 50 active compounds were known with activities <10 μM, with the other compounds in the database comprising the inactives. After the OSM compounds were scored against each target, the scores were standardized against the scores obtained for a random set of >10K compounds. The prediction list was culled for proteins relevant to malaria (3D7 proteome), and the analysis gave as the most likely candidate targets, in order of significance, carboxy-terminal domain RNA polymerase II polypeptide A small phosphatase 1 (Q8I3U9), dihydroorotate dehydrogenase (DHODH, Q08210), SUMO-activating enzyme subunit 2 (Q8I553) and 1 (Q8IHS2), and cyclin-dependent kinase 1 (P61075). To kick-start the process of exploring these predictions, this compound and 24 others representative of both the hit compounds were evaluated in an experimental assay for DHODH inhibition (Data Set S36, Text S14) vs two positive control compounds (TCMDC-125840 and -123822) known to inhibit this enzyme. (115) None of the compounds exhibited any activity, strongly suggesting that this is not the target for either subseries of compounds. A line of inquiry remaining open is the equivalent assays against the other targets identified in the in silico prediction.
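The standardization step described above can be illustrated with a minimal sketch. The source states only that scores were standardized against a random compound set; the use of a z-score here, and the function and variable names, are assumptions made for illustration and are not taken from the project's own workflow.

```python
from statistics import mean, stdev


def standardize_score(compound_score, background_scores):
    """Z-score of one raw model score against a random background set.

    ASSUMPTION: a z-score is one plausible form of standardization; the
    source does not specify the exact formula used.
    """
    mu = mean(background_scores)
    sigma = stdev(background_scores)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (compound_score - mu) / sigma


def rank_targets(raw_scores, background):
    """Rank targets by standardized score, highest first.

    raw_scores: {target_id: raw score of the query compound}
    background: {target_id: [raw scores of random compounds for that target]}
    """
    z = {t: standardize_score(s, background[t]) for t, s in raw_scores.items()}
    return sorted(z.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    background = {"target_A": [1, 2, 3, 4, 5], "target_B": [1, 2, 3, 4, 5]}
    print(rank_targets({"target_A": 5.0, "target_B": 3.0}, background))
```

Standardizing per target in this way makes scores comparable across targets whose raw score distributions differ, which is what allows a single ranked list to be produced.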

As with most phenotypic hit to lead projects, the activity assays were carried out on whole parasites, giving a more realistic measure of activity than enzyme-based assays at the expense of an unknown mode of action. To predict the mode of action of the members of these series, eight representative active compounds (structures in the Supporting Information) across both subseries were evaluated in a yeast-based genetic sensitivity Hip Hop assay (106, 107) that seeks mutations that result in enhanced compound potency, i.e., to identify any sensitive biological processes. The most potent compounds (OSM-S-39 and -51) showed enrichments for processes involved in chromatin remodeling and DNA repair. Indeed, inspection of the rank ordered genes for all compounds showed a preponderance of genes for diverse components of chromatin architecture, remodeling, and post-translational modification of chromatin proteins. These results are consistent with a perturbation of chromatin condensation with predicted follow-on effects on gene expression. Several groups have highlighted apparent plasticity of global gene expression in the malaria parasite which is manifested as a complex program of histone modification. (108, 109) To gain additional insight from the genome wide screens, we compared the data from several OSM compounds to a data set of 3200 similar genome wide screens to identify the 10 most similar profiles for each of these OSM analogues. This provided clear concordance between these results and screens for other compounds, for OSM-S-39 (danthron, an anthraquinone derivative, as well as artemisinin), OSM-S-51 (the antineoplastics mitoxantrone and mitomycin C), and OSM-S-9 (the antiarrhythmic agent DTBHQ). Taken together, these data point to a global dysregulation of chromatin architecture and suggest that further bioinformatics analysis both within the Hip Hop data set and to orthologous data sets may be informative.
The finding that the profile of the compounds is similar to that of known antimalarials (e.g., artemisinin) is provocative and warrants consideration. While it is formally possible that the predicted mechanism of select OSM compounds is similar to that of artemisinin, in our experience the degree of concordance we see between the OSM compounds and artemisinin suggests that the OSM compounds perturb a similar, but distinct, cellular target or pathway. This observation is consistent with our recent observation that, in a compendium of 3200 chemogenomic screens, the majority of compounds fell into ∼45 cellular response classes. (106) Our finding that different compounds from the same series have different profiles is also compelling and is consistent with previously published work demonstrating that single atom changes within a compound class can produce distinct cellular responses, which we attribute to small differences in compound structure resulting in significant fitness differences. (110) In order to derive experimental evidence for possible differences in mode of action, a parasite reduction ratio (PRR, generically “rate of killing”) (111) assay was run for six of these compounds (across both arylpyrrole and NN series) even though this assay is not a direct measure of differences in gene expression. The results (Text S1, Figure S37) suggest a common mechanism of action between the subseries and one that is distinct from artemisinin, which has previously exhibited a substantially faster killing profile. It remains a possibility that the target and/or impact on gene expression of the compounds in the arylpyrrole/near neighbor series are distinct and yet they still share a common mode of action, but this was not investigated further here.
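The comparison of a compound's chemogenomic profile against a compendium of reference screens, described above, amounts to a nearest-profile search. The following is a minimal sketch under stated assumptions: the similarity metric actually used is not given in the text, so Pearson correlation over per-gene scores is assumed here, and all names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no ranking information
    return cov / sqrt(vx * vy)


def most_similar_profiles(query, reference, k=10):
    """Return the k reference screens most correlated with the query.

    query: per-gene scores for the compound of interest
    reference: {screen_name: per-gene scores in the same gene order}
    ASSUMPTION: Pearson correlation stands in for the unspecified metric.
    """
    scored = [(name, pearson(query, prof)) for name, prof in reference.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy three-gene profiles, for illustration only.
    reference = {"screen_a": [2, 4, 6], "screen_b": [3, 2, 1], "screen_c": [1, 1, 2]}
    print(most_similar_profiles([1, 2, 3], reference, k=2))
```

With a compendium of 3200 screens, `k=10` would correspond to the "10 most similar profiles" retrieved for each analogue.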

Liver stage activity is strategically important in antimalarial drug discovery because compounds that block development of exoerythrocytic parasite stages in hepatic cells often prove to have causal prophylactic activity in animal models. (14, 105) Three compounds (original hit OSM-S-5 and the NN compounds OSM-S-38 and -111) were assessed for their activity against sporozoites in liver cells (vs atovaquone as positive control) and displayed varying potencies ( Table 2 Figure S31 ) that track with blood stage potency. Of particular note is the striking potency of the nontoxic compound OSM-S-38, which may at this level provide protection from malaria to people who have been treated with the compound after an infectious bite. Given the similarity observed in the possible mode of action (see next section ), it is unclear why the arylpyrrole compound OSM-S-5 should exhibit such low levels of activity in the present assay vs the NN compounds. Possible explanations include differential solubility or stability; the level of cytotoxicity observed for OSM-S-5 suggests that the liver stage potency is probably just a generic toxic effect.

When the compounds were evaluated in a dual gamete formation assay (DGFA) (104) to evaluate separately the susceptibilities of male and female mature stage V gametocytes to both the hit ester (OSM-S-5) and a selection of NN compounds, it was found that all compounds possessed low activity at 1 μM against both sexes ( Text S10 ). The discrepancy in the data arising from these two gametocyte assays may arise from the slightly different cell biology assayed (stage IV/V gametocytes vs stage V mature gametocytes) or most likely is a function of the relative compound exposure times (96 h vs 24 h).

There are relatively few compounds effective against the gametocyte stage of the parasite, (12, 13) though such compounds are important in the prevention of disease transmission. Four compounds found to be active in the asexual assay were evaluated against late stage (IV–V) gametocytes. The results (Table 1, Data Sets S27 and S28) indicated that the NN compounds OSM-S-38 and OSM-S-39 exhibited highly promising IC50 values of 4 nM and 2.6 nM respectively, comparable to the activities of artemisinin and artesunate. Compound OSM-S-9 also showed good activity while one of the original hit compounds (OSM-S-5) exhibited low levels of activity. Several compounds evaluated were found to lead to an unusual parasite morphology that may be indicative of a slow-acting mechanism of action (Supporting Information, late stage gametocyte assays 1 and 2).

There is increasing awareness of the importance for drug candidates of inhibition of hERG (the human ether-à-go-go-related gene ion channel), which is sensitive to blockade by many drug-like structures. Such blockade has led to a number of prominent postmarketing withdrawals. (103) Regulators are sensitive to any hERG activity, and a hERG counterscreen is often now run early in hit characterization and series prioritization. Compounds OSM-S-5 and OSM-S-35 were shown not to suffer from significant hERG activity (IC50 > 33 μM vs 0.7 μM for control compound quinidine), implying that the original ester and NN class of compounds are not likely to exhibit undesirable cardiac side effects later in development (Data Set S26, Text S11).

A metabolite identification and glutathione trapping experiment was carried out on OSM-S-35 in the presence of metabolic activation (human microsomes, Data Set S25 ). A number of metabolites were detected, mainly oxygenated species (mono-, bis-, and trioxygenated metabolites) with the predominant metabolite (based on peak area and assuming similar response factors for each metabolite) arising from likely hydroxylation of the pyrrole substituted benzene. In the presence of glutathione ethyl ester (GSH-EE), weak signals for adducts were observed for both parent compound and mono- and bisoxygenated metabolites. In the case of the parent compound, the GSH-EE adduct was observed both in the presence and in the absence of metabolic activation. Further characterization of the adducts was precluded by the very weak MS/MS spectra; however, the detection of adducts suggests that the formation of reactive species cannot be ruled out.

Representative compounds from both subseries were evaluated in an oral mouse trial (Data Set S22). The original starting points OSM-S-5 and -6 (Figure 2) were chosen along with NN representative OSM-S-35 (Figure 4). All three compounds were found to be inactive in P. berghei ANKA infected mice at 50 mg/kg after 4 days po. It is plausible that at least some of the arylpyrrole ester compounds would be degraded by general hydrolysis during absorption, and this could explain their inactivity in the mouse model despite their more favorable (although still high) cLogP values. Analysis of the plasma samples from the trial with OSM-S-5 showed that the compound was indeed orally available, but concentrations in the blood were above the EC50 for only approximately 4 h (Data Set S23). The same compound was evaluated for stability in human and mouse plasma and found to be susceptible to hydrolysis in mouse plasma (half-life 114 min), but it was stable in human plasma with no measurable loss after 240 min (Data Set S24). Esterase activity is higher in rodents than in other species, (102) as confirmed by using p-nitrophenol acetate as a control compound in this assay. By way of comparison, the literature pyrrolone series had also exhibited low oral bioavailability that likely resulted from a combination of low solubility and significant metabolic clearance. (62, 63)
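To give a sense of scale for the plasma stability result, the sketch below computes the fraction of compound remaining under simple first-order decay. It assumes that the 114 min figure quoted above is a first-order half-life in mouse plasma and that degradation follows single-exponential kinetics; neither assumption is confirmed by the text, and the function name is illustrative.

```python
def fraction_remaining(time_min, half_life_min):
    """Fraction of parent compound left after time_min.

    ASSUMPTION: single-exponential (first-order) loss with the given half-life.
    """
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (time_min / half_life_min)


if __name__ == "__main__":
    # Under the stated assumptions, compare the two time points in the text.
    for t in (114, 240):
        print(t, "min:", round(fraction_remaining(t, 114), 3))
```

Under these assumptions roughly a quarter of the compound would remain in mouse plasma at 240 min, in contrast to the human plasma result, where no measurable loss was reported over the same period.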

The two original GSK hits (OSM-S-5 and -6) and six NN compounds were evaluated for their kinetic solubility in phosphate buffer and their metabolic stability in human liver microsomes ( Table SB5 Data Sets S20 and S21 ). Both the arylpyrrole esters showed good solubility, but showed degradation in microsomes even in the absence of cofactors for cytochrome P450 and glucuronidation enzymes (NADPH and UDPGA, respectively) suggesting degradation by nonspecific enzymes. The iminothiazolidinone (NN) compounds showed generally low rates of metabolic degradation but at a cost of very low solubility (a general feature also of the literature pyrrolone series (62) that displayed typically higher rates of metabolic clearance).

To assess further the attractiveness of this class of compounds, attention was focused on the most potent of the newly discovered analogues and the original hits. Given that the compounds arose from a phenotypic assay, no mechanism of action (MoA) was known, and preliminary investigations described below were designed to probe this, with a view to minimizing any potential MoA overlap with other compounds already in development.

All these structures are available for investigation by the community, building on the unsuccessful attempts detailed in the online laboratory notebooks. It is hoped that the sharing of negative synthetic data in this way (16 attempts in the case of the ether OSM-S-236) will lead to a faster completion of syntheses of analogues in the future since prior attempts are not “orphaned” in undisclosed or unpublished notebooks. Participants in future work may be physically located anywhere; they are requested (but not obligated) to operate open source (unrestricted sharing of all data and ideas) to avoid wasteful duplication of effort. The results, whether substantial or incremental, may be added to the series wiki. (101) However, it is important to acknowledge the limitations of the series identified to date, meaning further analogue synthesis undertaken by the community in the absence of better knowledge of the biological target is likely to be unproductive.

Pharmacophore modeling has to date proven ineffective at high-confidence analogue prediction, but this remains an open challenge (99) to which others may contribute given the data set available. (68, 69) Some preliminary results suggesting a common feature map for the arylpyrrole and NN subseries were of particular interest and could be explored with the more substantial bioactivity data now available.

Several synthetic targets for this series remain open ( Figure 6 ) such as the ether compound OSM-S-236 (though it seems likely that this compound will be unstable) and the oxazole OSM-S-246. The oxadiazole shown was proposed, and preliminary experiments toward its synthesis were performed (see precursor OSM-S-269 in the project laboratory notebooks for further details). (64) The oxadiazole is a common replacement for carbonyl containing compounds in hit to lead campaigns, (98) and so it was reasoned that the inclusion of this heterocycle might have favorable consequences for drug metabolism, though a commercial oxadiazole analogue (OSM-S-85, Supporting Information ) had been found to be inactive, leading to the synthetic effort toward the oxadiazole being downgraded. The tolerance of the NN set to the introduction of the pyridine in OSM-S-51 ( Figure 4 ) could be explored further as a means to increase solubility in that cluster, and a pendant substituted piperidine was found to lead to several potent compounds in the pyrrolone series that possesses some structural similarities to the NN series. (63) Indeed variation of the aromatic group in the arylpyrrole series was not explored given the intractability of substituting the ester: given the tight SAR, low solubility, and poor metabolic stability observed for the series, the project viewed the probabilities of success as limited and so did not pursue these targets.

The oxazole analogue OSM-S-105 was prepared from the carboxylic acid of the corresponding arylpyrazole, though the analogous sequence for the parent arylpyrrole series could not be completed ( Figure SC12 ). This compound and several of the synthetic precursors were evaluated and all found to be inactive.

Two pyrazole analogues (OSM-S-57 and -92) were synthesized ( Figure SC11 ) and evaluated to assess the impact of modifying the arylpyrrole core. Both compounds were found to be inactive; the fluoro analogue of OSM-S-57 was thus not synthesized. A related heterocycle alteration was also found to be detrimental to the literature pyrrolone series. (62)

In an attempt to decrease the hydrolytic lability of the ester, methyl groups were introduced adjacent to this functional group (OSM-S-116 and -68) ( Figure SC8 ). To assess the influence of introducing methyl groups to the terminal amide, analogues OSM-S-82 and OSM-S-91 were purchased. All compounds were found to exhibit low potency ( Table SB4 ), further suggesting that minor structural changes to the potent TCAMS hit compounds reduce activity.

One of the most promising proposed analogues was the ether OSM-S-236. A number of approaches have failed to generate the desired product, attributed to the instability of either the arylpyrrole alcohol starting material OSM-S-11 (which decomposed on silica and when stored under inert conditions at 2 °C) or (if formed) the desired product itself ( Figure SC6 ). The synthesis of this compound was abandoned, given that similar side reactivity may be seen, although such reactivity could be mitigated through the use of more electron-deficient pyrroles.

Poor solubility with excessive lipophilicity may not just impart poor pharmacokinetic properties but also drive nonspecific protein reactivity through hydrophobic burial that may not be picked up in assays. The emphasis for this series moved toward analogues that promised improved solubility. As with all key strategic decision points in this open source project, discussion of possible structures took place in an online public consultation. (95) The meeting recalibrated the project focus with selection of the next synthetic compounds and agreement on which commercially available analogues to purchase and evaluate (Figure S24). (96) This community consultation confirmed ethers, amines, sulfonamides, oxadiazoles, and substituted ester analogues of the original arylpyrrole hits as the most valuable targets (Figure S25). (97) Synthesis of a shortlist of such analogues was planned and undertaken by whoever wished to do so (half of the ten top-ranked synthetic shortlist were ultimately made; synthetic planning assistance was received from the private sector (Supporting Information Texts S6 and S7)), and the most relevant commercially available compounds were purchased and evaluated. GSK assessed the proposed compounds and confirmed (publicly) that none of the molecules had previously been evaluated for antimalarial activity as part of the TCAMS screen.

The open nature of the project enabled regular consultation with the wider medicinal chemistry community in real time, i.e., where community input could influence the direction of the research. An important contribution to the project from outside the core experimental team (Figure S17) (80) was discussion of whether the most potent members of the NN series were pan assay interference compounds (PAINS), i.e., compounds frequently appearing as hits in high throughput screens yet which do not exhibit a straightforward “drug-like” interaction with a biological target. (81) The OSM compounds were run through the KNIME PAINS filter, (82, 83) and both the 2-imino-4-thiazolidinone and arylpyrrole components of these compounds were flagged as potential PAINS, with the proposed cause of the interference of the former being the thiazolidinone double bond acting as a Michael acceptor, though it has been noted that this core is much more problematic in rhodanines than in their 2-imino counterparts. (84) The topic of PAINS has been the subject of extensive recent discussion in papers (84-88) and in online communities (Figures S18 and S19). (89, 90) Although most concern has centered on rhodanines, any related structure could be problematic if it contains a potentially reactive conjugated double bond. In the area of chemotherapeutic and antiparasitic agents, such motifs may still be present in viable leads. (84) The negative view of rhodanine derivatives in the medicinal chemistry community is generally derived from academic reports and patents where positive assay hits have been reported without adequate evaluation of SAR or elucidation of the mode of action. A complicating factor, often not considered, is that PAINS are defined on the basis of results from target-based screens, where one would not link a cellular readout to the target if nonspecific protein reactivity were possible. (84) Conversely, a covalent modifier from a cellular screen may still be useful as a probe under some circumstances. (91) The present compounds may not be problematically reactive and may still be progression candidates for the following reasons: (1) the parent hit compounds (TCMDC-123812 (OSM-S-5) and -123794 (OSM-S-6), Figure 2) were shown not to be “promiscuous” frequent hitters in the original GSK data; (53) (2) the assay data described above show that the relevant 2-imino-4-thiazolidinone fragment was inactive on its own (OSM-S-55, Table SB3, entry 30); and (3) preliminary experimental controls were performed to assess the reactivity of the double bond (Figure SC5) that showed no reactivity of the exocyclic double bond to hydrogenation or the addition of hydride or a thiol in model cases. Yet it was noted that these compounds not only are closely related to known PAINS but also fail by ALARM NMR filtering, which is designed to detect known protein-reactive cores. (92-94) Ultimately, the authorship team adopted varying positions. Overall, given the poor physical properties of these compounds and the extensive additional work that would be needed to fully mitigate the series risks, and encouraged by one of us (J.B.B.) to move away from PAINS-like structures, the team decided not to further pursue this subseries.

While high potency was obtained with several members of the NN series, this was achieved at the cost of high calculated lipophilicity, with many compounds in this series exhibiting calculated logP values of 5 or more and correspondingly poor lipophilic efficiency (Figures S15 and S16). (79) Solubility was a challenge in several of the assays examining these compounds. Those compounds with more polar groups on the constituent aromatic rings suffered a drop in potency, but the pyridinyl analogue OSM-S-51 (calculated logP = 1.8) provides a possible future line of inquiry for the community. A lipophilicity/potency trend was also generally seen in the pyrrolone series, (62) though potency was seen for several compounds containing a substituted piperidine ring in place of the N-aryl group. (63)
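The trade-off between potency and lipophilicity noted above can be made concrete with a short calculation. The text does not define lipophilic efficiency, so the sketch below assumes the commonly used form LipE = pIC50 − cLogP; the function names are illustrative and the input values are hypothetical, not measured data from this project.

```python
from math import log10


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 (= -log10 of the molar IC50)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - log10(ic50_nm)


def lipophilic_efficiency(ic50_nm, clogp):
    """ASSUMED definition: LipE = pIC50 - cLogP."""
    return pic50_from_nm(ic50_nm) - clogp


if __name__ == "__main__":
    # Hypothetical compounds: same potency, different calculated logP.
    print(lipophilic_efficiency(10.0, 5.0))  # lipophilic analogue
    print(lipophilic_efficiency(10.0, 1.8))  # more polar analogue
```

Under this definition, two compounds of equal potency differ in lipophilic efficiency by exactly their difference in cLogP, which is why a potent but highly lipophilic series scores poorly on this measure.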

Many compounds in the NN series showed high potencies (shown schematically in Figure 4 ; raw data in Table SB3 ), with several found to be more active than the original TCAMS hits. The compounds exhibited low associated cytotoxicity. The aryl component of the pyrrole moiety was moderately tolerant to changes (though small changes could result in large changes in activity, exemplified by the 4-F and 3-F isomers), while the thiazolidinone component was found to be more sensitive. Incorporation of cyclopentyl, phenyl, and acetyl components was tolerated, but the methylenenitrile group was not. Replacement of the arylpyrrole moiety with a phenyl resulted in loss of activity.

In parallel with the initial evaluation of the amide analogues, a number of NN analogues were synthesized ( Figure 4 ), typically from a Vilsmeier–Haack oxidation of the relevant pyrrole to the corresponding aldehyde and then condensation with the appropriate 2-iminothiazolidin-4-one ( Figure SC3 ). (76-78) While double bond geometry was undefined for the original hits, Z-geometry was established here by X-ray crystallography on four compounds and therefore assumed more broadly for the series ( Figure SC4 ). Given the low predicted solubility of these NN compounds, several were generated with lower clogP values and submitted for biological testing (OSM-S-109 through -115, OSM-S-108, and OSM-S-138), alongside analogues synthesized and submitted by an independent undergraduate laboratory cohort: OSM-A-1 through -4 and resyntheses of OSM-S-37 and -111.

Replacement of the ester with a hydrolytically more stable amide was undertaken through synthesis of eight derivatives (Figure 3), most of which were obtained through SOCl2-mediated conversion of acid OSM-S-4 to the acid chloride. Compound OSM-S-16 served as a control lacking the pyrrole moiety, OSM-S-8 served as a truncated “des-glycinyl” analogue, and the importance of amide methylation was explored with compounds OSM-S-59 and -93. In addition, the six most relevant commercially available compounds were purchased (Figure SC2). (For more on strategies for selecting compounds for commercial acquisition, colloquially known as “SAR by catalog”, see the Supporting Information Text S2 and the component files referred to therein.)

To confirm the promise of the two starting points OSM-S-5 and -6, they were evaluated against 3D7 (drug-sensitive) and K1 (chloroquine-resistant) strains of P. falciparum in a whole cell assay and against HEK-293 cells as a cytotoxicity marker (Table SB1, Biological Protocols; tables with the prefix “SB” may be found in the Supporting Information). Biological data may be browsed in a static data set taken as a snapshot for this paper (Data Sets S1 (Excel), S2 (SDF)), online in a database constructed through periodic batch uploads, (68) or in a “living database” to which future data may be added; (69) the latter may be visualized in a web browser (70) using an open source system that was recently deployed in the Wikipedia Chemical Structure Explorer, (71) or the data may be downloaded and visualized offline with proprietary (Figure S4 (72)) or open source (Figure S5 (73)) tools. The evaluation was performed in three different institutions using different assays and widely employed controls; controls are important to assess reproducibility and interassay variability (74) but also to minimize possible bias in evaluating compounds where there are pre-existing data from other researchers already in the public domain (Figure S6). (75) The tests confirmed that the original compounds TCMDC-123812 (OSM-S-5) and TCMDC-123794 (OSM-S-6) are potent (300–500 nM range), although slightly less so than previously reported (IC50 of ca. 330 and 54 nM respectively from a colorimetric LDH assay over 72 h (53)), with low associated cytotoxicities and similar efficacy against 3D7 and K1 strains.
The stability of the ester linker under biologically relevant conditions was expected to be poor, so the aldehyde, ethyl ester, and carboxylic acid 4-fluoropyrroles made en route to these compounds were evaluated as potentially active fragments but were found to exhibit relatively low activity, suggesting that the parent compounds do not act as prodrugs in this way (similar inactivity was observed for the 4-H, 4-Me, and 4-CF3 aldehydes, esters, and acid fragments) (Data Sets S3–S5).

The two original GSK hits (OSM-S-5 and -6) were successfully resynthesized via a novel pyrrole acid (OSM-S-4, Figure SC1) that was prepared via a Paal–Knorr cyclization of the relevant aniline and ethyl 2-acetyl-4-oxopentanoate. (65, 66) This approach was found to be superior to an alternative method involving initial synthesis of the unfunctionalized N-arylpyrrole, followed by conversion to the corresponding aldehyde with a Vilsmeier–Haack reaction (a procedure that was improved through a community suggestion (Figure S3 (67))) and then oxidation to a carboxylic acid, because the pyrrole aldehyde was found to be remarkably resilient to a range of oxidants. (An alternative route using a Friedel–Crafts acylation between the unsubstituted pyrrole and ethyl chloroformate, suggested in an e-mail from the community, gave only starting material in two attempts.)

Conclusions

The public deposition of novel antimalarial hits from phenotypic whole-cell assays has had a significant effect on worldwide antimalarial drug discovery by providing an embarrassment of riches for the early stages of discovery. The plethora of alternative structures available for investigation has ultimately led to the series described in this paper being “parked” in favor of other possible avenues of inquiry. Interestingly the “stop” decision was straightforward to make in part because the decisions taken communally in the project had to be justifiable to all onlookers. The time taken to reach the stop decision in this case was probably slightly longer than would be expected from a traditionally structured project because certain contributions were not paid for or grant-supported, necessitating a lower priority than core interests of the contributing laboratories.

Both subseries investigated have members with major strengths, including potency with low molecular weight and significant late stage gametocyte and liver-stage activity coupled with low toxicity and low levels of activity in the hERG assay. Indeed, OSM-S-5 is bioavailable. Many of the most obvious structural changes to the hits, in several cases changes of a single atom, led to total loss, rather than moderation, of biological activity, known colloquially as “activity cliffs” ( Figure 7 ). In the case of the ester-containing hit OSM-S-5, the ester was likely the key metabolic liability but could not be replaced with other common isosteres without loss of activity, yet (in the mouse model, at least) was found to be too short-lived in plasma to be taken on further. Other minor changes remote from the ester were also found to result in a precipitous decrease in potency. The “near neighbor” analogue set displayed impressive potency coupled with low cytotoxicity but alongside low solubility that could not easily be engineered out through side chain modifications.

Figure 7. Sensitivity of the initial hit OSM-S-5 to minor structural changes.

The emergence of large amounts of open data in the field of drug discovery for malaria makes it straightforward to search for new ways forward in a project where a stop decision has been made. A similarity network map was generated for the most potent compound identified (OSM-S-39, Data Set S38, Figure S39) that identifies the most structurally similar compounds known to the open databases (Figure 8). Some of these are now represented in the MMV Malaria Box, (123) which simplifies their investigation by other groups. A portion of this map is represented, alongside the structures and potencies of the closest neighbors in the map. While it is clear from this analysis that OSM-S-39 remains the most potent compound identified in this class, exploration of the neighboring structures in such open data sets may identify a way forward for this series that could suggest unexplored strategies to increase solubility with a realistic chance of maintaining potency, such as the potential scaffold-hop to the triazine TCMDC-125770. (One of the compounds shown from the Novartis screen, GNF-Pf-5137/1137, is from the same series as the published pyrrolones. (62)) The recommended way forward, rather than further analogue synthesis in the present series, is to pursue clarity in the mechanisms of action of such series using, for example, generation of resistance coupled with genomic sequencing, or pull-down studies. Subsequent screening might then establish better starting points from related structures (informed by what has already been tried via comparison with network maps) against the relevant target.

Figure 8. Closest neighbors of OSM-S-39. (Left) Portion of a network similarity map generated for OSM-S-39 (coded as batch ZYH-72 in the figure) using methods described in the Supporting Information. (Right) The structures (stereochemistry assumed) and potencies (3D7, range if multiple values reported) of the most similar compounds in the ChEMBL database (v13) (key to compound sources: red = GSK TCAMS, blue = Novartis, green = St. Jude’s).
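The idea behind a similarity network map of this kind can be sketched in a few lines. The example below is not the method described in the Supporting Information; it assumes, purely for illustration, that each compound is represented by a set of integer fingerprint feature identifiers, that similarity is the Tanimoto coefficient, and that two compounds are joined by an edge when their similarity meets a chosen threshold. All names, fingerprints, and the threshold are hypothetical.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b) -> float:
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / len(union)


def similarity_edges(fingerprints, threshold=0.5):
    """Return (name_a, name_b, similarity) for every pair at or above threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[name_a], fingerprints[name_b])
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical fingerprints (sets of feature identifiers), not real compounds
example_fps = {
    "query": {1, 2, 3, 4},
    "analogue_1": {1, 2, 3, 5},
    "analogue_2": {7, 8, 9},
}
example_edges = similarity_edges(example_fps, threshold=0.5)
```

In practice the fingerprints would come from a cheminformatics toolkit, and the threshold controls how dense the resulting map is: a lower value links more distant analogues, while a higher value keeps only the closest neighbors.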

One of the unique features of this project, the open source research method, ensures that the unexplored lines of inquiry remain open alongside the attendant data posted online that makes it straightforward for others to resume any portion of the research project as fully fledged participants, with access to both positive and negative data, details of all procedures as they were carried out (to aid reproducibility), and anecdotal insight into project loose ends that are easy to explore. The machine-readability of the present project (for example the use of cheminformatic strings in the online electronic lab notebook) permits an unusually straightforward link between a high throughput screening result in a public database and a “live” research project that has investigated that compound.