The output panels are loaded in the same Web page. There is one panel per molecule, compiling all values computed for it. Each panel is filled immediately after the calculation completes, one molecule after the other. This makes it possible to inspect the results for the first compounds without waiting for the whole list to be processed. This one-panel-per-molecule output (Fig. 2) is headed by the molecule name and divided into different sections.

Figure 2: Computed parameter values are grouped in the different sections of the one-panel-per-molecule output (Physicochemical Properties, Lipophilicity, Pharmacokinetics, Drug-likeness and Medicinal Chemistry). The panel is headed by the molecule name and an up-arrow button to scroll to the top of the page. The molecule is first described by its chemical structure and canonical SMILES together with the Bioavailability Radar (see Fig. 3). Contextual help can be displayed by hovering the mouse over the radar or over the question mark icons next to some parameters.

Chemical Structure and Bioavailability Radar

The first section, including the two-dimensional chemical structure and canonical SMILES, is located below the title (Fig. 2). It shows the chemical form on which the predictions were calculated (refer to Computational Methods). Moreover, our Bioavailability Radar is displayed for a rapid appraisal of drug-likeness (refer to Fig. 3). Six physicochemical properties are taken into account: lipophilicity, size, polarity, solubility, flexibility and saturation. A physicochemical range on each axis was defined by descriptors adapted from refs 23 and 24 and is depicted as a pink area, within which the radar plot of the molecule has to fall entirely for the molecule to be considered drug-like. Hovering the mouse over the radar gives further information about the descriptors (see also Physicochemical Properties and Computational Methods).

Figure 3: The Bioavailability Radar enables a first glance at the drug-likeness of a molecule. The pink area represents the optimal range for each property (lipophilicity: XLOGP3 between −0.7 and +5.0; size: MW between 150 and 500 g/mol; polarity: TPSA between 20 and 130 Å²; solubility: log S not higher than 6; saturation: fraction of carbons in the sp3 hybridization not less than 0.25; and flexibility: no more than 9 rotatable bonds). In this example, the compound is predicted not to be orally bioavailable because it is too flexible and too polar.
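The radar test described in the caption can be sketched as a simple range check. This is a minimal illustration and not the SwissADME implementation: the dictionary keys and function names are hypothetical, the example molecule is invented, and the solubility bound is an assumption (the caption's "log S not higher than 6" is read here as log S ≥ −6).

```python
# Ranges taken from the Fig. 3 caption.
# ASSUMPTION: the solubility criterion is interpreted as log S >= -6.
RADAR_RANGES = {
    "xlogp3": (-0.7, 5.0),            # lipophilicity
    "mw": (150.0, 500.0),             # size, g/mol
    "tpsa": (20.0, 130.0),            # polarity, square angstroms
    "log_s": (-6.0, float("inf")),    # solubility (assumed lower bound only)
    "fraction_csp3": (0.25, 1.0),     # saturation
    "rotatable_bonds": (0, 9),        # flexibility
}


def radar_violations(descriptors):
    """Return the names of the descriptors that fall outside the pink area."""
    violations = []
    for name, (low, high) in RADAR_RANGES.items():
        value = descriptors[name]
        if not (low <= value <= high):
            violations.append(name)
    return violations


def is_radar_compliant(descriptors):
    """A molecule is compliant only if all six descriptors are in range."""
    return not radar_violations(descriptors)


# Hypothetical molecule that is too polar and too flexible.
example = {
    "xlogp3": 2.1,
    "mw": 480.0,
    "tpsa": 155.0,
    "log_s": -3.2,
    "fraction_csp3": 0.40,
    "rotatable_bonds": 12,
}
print(radar_violations(example))  # names of the out-of-range descriptors
```

Returning the list of offending descriptors, rather than a single boolean, mirrors the way the radar shows which axes leave the pink area.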

Physicochemical Properties

Simple molecular and physicochemical descriptors like molecular weight (MW), molecular refractivity (MR), count of specific atom types and polar surface area (PSA) are compiled in this section. The values are computed with OpenBabel9, version 2.3.0. The PSA is calculated using the fragmental technique called topological polar surface area (TPSA), considering sulfur and phosphorus as polar atoms25. TPSA has proven a useful descriptor in many models and rules to quickly estimate some ADME properties, especially with regard to the crossing of biological barriers, such as absorption and brain access17.

Lipophilicity

The partition coefficient between n-octanol and water (log P o/w ) is the classical descriptor for lipophilicity. It has a dedicated section in SwissADME due to the critical importance of this physicochemical property for pharmacokinetics in drug discovery26,27. Many computational methods for log P o/w estimation have been developed, with diverse performance on various chemical sets. Common practice is to use multiple predictors, either to select the most accurate methods for a given chemical series or to generate a consensus estimation. The models behind the predictors should be as diverse as possible to increase the prediction accuracy through consensus log P o/w 28. In that regard, SwissADME gives access to five freely available predictive models: XLOGP3, an atomistic method including corrective factors and a knowledge-based library29; WLOGP, our own implementation of a purely atomistic method based on the fragmental system of Wildman and Crippen30; MLOGP, an archetype of topological method relying on a linear relationship with 13 molecular descriptors implemented from refs 31 and 32; SILICOS-IT, a hybrid method relying on 27 fragments and 7 topological descriptors (http://silicos-it.be.s3-website-eu-west-1.amazonaws.com/software/filter-it/1.0.2/filter-it.html, accessed June 2016); and finally iLOGP, our in-house physics-based method relying on free energies of solvation in n-octanol and water calculated by the Generalized-Born and solvent accessible surface area (GB/SA) model. iLOGP was benchmarked on two drug or drug-like external sets and performed as well as or better than six well-established predictors16. The consensus log P o/w is the arithmetic mean of the values predicted by the five proposed methods.
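The consensus value described above is a plain arithmetic mean, which can be sketched as follows. The function name and the numerical predictions are hypothetical; only the five method names and the averaging rule come from the text.

```python
from statistics import mean

# The five predictors named in the text.
LOGP_METHODS = ("iLOGP", "XLOGP3", "WLOGP", "MLOGP", "SILICOS-IT")


def consensus_log_p(predictions):
    """Arithmetic mean of the five log P o/w predictions.

    `predictions` maps each method name to its predicted value.
    """
    missing = [m for m in LOGP_METHODS if m not in predictions]
    if missing:
        raise ValueError(f"missing predictions for: {', '.join(missing)}")
    return mean(predictions[m] for m in LOGP_METHODS)


# Hypothetical predictions for one molecule.
example = {
    "iLOGP": 2.0,
    "XLOGP3": 2.5,
    "WLOGP": 2.2,
    "MLOGP": 1.8,
    "SILICOS-IT": 3.0,
}
print(round(consensus_log_p(example), 2))
```

Requiring all five values, instead of averaging whatever is available, keeps the consensus comparable from one molecule to the next.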

Water Solubility

Having a soluble molecule greatly facilitates many drug development activities, primarily the ease of handling and formulation33. Moreover, for discovery projects targeting oral administration, solubility is one major property influencing absorption34. Likewise, a drug meant for parenteral usage has to be highly soluble in water to deliver a sufficient quantity of active ingredient in the small volume of such pharmaceutical dosage35. Two topological methods to predict water solubility are included in SwissADME. The first one is an implementation of the ESOL model36 and the second one is adapted from Ali et al.37. Both differ from the seminal general solubility equation38 in that they avoid the melting point parameter, which is challenging to predict. Moreover, they demonstrate strong linear correlation between predicted and experimental values (R² = 0.69 and 0.81, respectively). The third SwissADME predictor for solubility was developed by SILICOS-IT. The linear correlation coefficient of this fragmental method corrected by molecular weight is R² = 0.75 (http://silicos-it.be.s3-website-eu-west-1.amazonaws.com/software/filter-it/1.0.2/filter-it.html, accessed June 2016).

All predicted values are the decimal logarithm of the molar solubility in water (log S). SwissADME also provides solubility in mol/l and mg/ml along with qualitative solubility classes (please refer to Computational Methods).
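The unit conversion implied here follows directly from the definition of log S. A minimal sketch, with a hypothetical function name and example values:

```python
def solubility_from_log_s(log_s, mw):
    """Convert log S into molar and mass solubility.

    log_s: decimal logarithm of the molar solubility (mol/l)
    mw:    molecular weight in g/mol
    Returns (mol/l, mg/ml).
    """
    mol_per_l = 10.0 ** log_s
    # mol/l times g/mol gives g/l, which is numerically equal to mg/ml.
    mg_per_ml = mol_per_l * mw
    return mol_per_l, mg_per_ml


# Hypothetical molecule: log S = -3.0, MW = 200 g/mol.
mol_per_l, mg_per_ml = solubility_from_log_s(-3.0, 200.0)
print(f"{mol_per_l:.2e} mol/l, {mg_per_ml:.2e} mg/ml")
```

The qualitative solubility classes are not reproduced here, since their boundaries are given in Computational Methods rather than in this section.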

Pharmacokinetics

Specialized models, whose predictions are compiled in the Pharmacokinetics section, evaluate individual ADME behaviours of the molecule under investigation.

One model is a multiple linear regression, which aims at predicting the skin permeability coefficient (K p ). It is adapted from Potts and Guy39, who found K p linearly correlated with molecular size and lipophilicity (R² = 0.67). The more negative the log K p (with K p in cm/s), the less skin-permeant the molecule.
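The form of such a regression can be sketched as below. The coefficients are an assumption: they are the ones commonly quoted for the original Potts and Guy equation (log K p = 0.71 log P − 0.0061 MW − 6.3, with K p in cm/s), and the adapted SwissADME model may use different values. The function name and example inputs are hypothetical.

```python
def log_kp_potts_guy(log_p, mw):
    """Skin permeability (log Kp, Kp in cm/s) from lipophilicity and size.

    ASSUMPTION: coefficients as commonly quoted for the original
    Potts and Guy regression, not necessarily those used by SwissADME.
    """
    return 0.71 * log_p - 0.0061 * mw - 6.3


# Hypothetical molecule: log P = 2.0, MW = 300 g/mol.
print(round(log_kp_potts_guy(2.0, 300.0), 2))
```

With these signs, increasing lipophilicity raises log K p, while increasing molecular weight makes it more negative, that is, less skin-permeant.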

The predictions for passive human gastrointestinal absorption (HIA) and blood-brain barrier (BBB) permeation both consist of the readout of the BOILED-Egg model17, an intuitive graphical classification model, which can be displayed in the SwissADME result page by clicking the red button appearing below the sketcher when all input molecules have been processed (refer to Graphical Output). Other binary classification models are included, which focus on the propensity for a given small molecule to be a substrate or inhibitor of proteins governing important pharmacokinetic behaviours.

Knowing whether a compound is a substrate or non-substrate of the permeability glycoprotein (P-gp, suggested to be the most important member among ATP-binding cassette transporters or ABC-transporters) is key to appraising active efflux through biological membranes, for instance from the gastrointestinal wall to the lumen or from the brain40. One major role of P-gp is to protect the central nervous system (CNS) from xenobiotics41. Importantly as well, P-gp is overexpressed in some tumour cells and leads to multidrug-resistant cancers42.

Also essential is the knowledge about the interaction of molecules with cytochromes P450 (CYP). This superfamily of isoenzymes is a key player in drug elimination through metabolic biotransformation43. It has been suggested that CYP and P-gp can process small molecules synergistically to improve protection of tissues and organisms44. One can estimate that 50 to 90% (depending on the authors) of therapeutic molecules are substrates of five major isoforms (CYP1A2, CYP2C19, CYP2C9, CYP2D6, CYP3A4)45,46. Inhibition of these isoenzymes is certainly one major cause of pharmacokinetics-related drug-drug interactions47,48, leading to toxic or other unwanted adverse effects due to the lower clearance and accumulation of the drug or its metabolites49. Numerous inhibitors of the CYP isoforms have been identified. Some affect different CYP isoforms, while other compounds show selectivity for specific isoenzymes50. It is therefore of great importance for drug discovery to predict the propensity with which the molecule will cause significant drug interactions through inhibition of CYPs, and to determine which isoforms are affected.

SwissADME estimates whether a chemical is a substrate of P-gp or an inhibitor of the most important CYP isoenzymes. We applied the support vector machine algorithm (SVM)51 on meticulously cleansed large datasets of known substrates/non-substrates or inhibitors/non-inhibitors (for details, see Computational Methods). In similar contexts, SVM was found to perform better than other machine-learning algorithms for binary classification40,52. The models return “Yes” if the molecule under investigation has a higher probability of being a substrate of P-gp (respectively, an inhibitor of a given CYP) and “No” otherwise. The statistical performance of the classification models is given in Table 1, in comparison with previous SVM models on the same targets. We restricted the benchmark to state-of-the-art methods published after 2010.

Table 1 Statistical performance of SVM classification models for substrate or inhibitor of pharmacokinetics-relevant protein, P-gp and CYP.

The performance of the models is not straightforwardly comparable, because the training sets are different, most of the published models used less than 10-fold cross-validation and some statistical parameters are missing. Nevertheless, the SwissADME classifiers are competitive with previous models in terms of robustness, with cross-validation accuracy (ACC CV ) at roughly the same level. Furthermore, cross-validated areas under receiver operating characteristic (ROC) curves (AUC CV ) are equal to the corresponding values found in the literature. Likewise, external prediction power is difficult to compare, as each test set includes different molecules. However, the predictive capacity of SwissADME classifiers is roughly equivalent to that of the related SVM methods, both in terms of external accuracy (ACC ext ) and external area under ROC curve (AUC ext ). The models for which external validation was not found (P-gp of refs 15 and 53) have to be taken with extreme caution since they possibly suffer from overfitting biases. Notably, some of the published models (e.g. CYP2C9 and CYP3A4 of ref. 54 or CYP2C9 of ref. 55) were built on severely unbalanced training sets and tested on clearly unbalanced external sets. As demonstrated by Carbon-Mangels et al.56, the relevance of machine-learning classification methods, and especially SVM, is negatively impacted by datasets with one significantly more populated class. In that case, accuracy measurements are overestimated and prone to mislead the construction and the evaluation of the model. Moreover, some SVM models were published with small training and test sets (P-gp of refs 15 and 57), which implies questionable capacity of generalization and broadness of applicability domains. We emphasize that for SwissADME classifiers, both training and test sets were carefully cleansed and checked for size, diversity and balance between classes.
Furthermore, our SVM models rely solely on molecular and physicochemical descriptors generated by SwissADME. We believe that this improves robustness and sustainability of the underlying methodologies. In particular, not using molecular fingerprints, molecular graphs or other structural descriptions can be a handicap for reaching high statistical values, but should also limit overfitting biases and yield more general predictive models, not necessarily influenced by specific chemical scaffolds or moieties. In our practice, these well-performing models, able to estimate important ADME behaviours, are of great support for pharmacokinetics optimization and evaluation of small molecules.
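The overestimation of accuracy on unbalanced sets mentioned above can be illustrated with a small worked example. The confusion-matrix counts are invented, and balanced accuracy and the Matthews correlation coefficient are shown only as illustrative class-aware measures; they are not claimed to be part of Table 1.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Basic metrics from the counts of a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (sensitivity + specificity) / 2
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical external set: 100 inhibitors and 900 non-inhibitors.
# A trivial model that always answers "No" finds no inhibitor at all.
trivial = classification_metrics(tp=0, tn=900, fp=0, fn=100)
print(trivial)
```

The trivial model reaches 90% accuracy while its balanced accuracy is 0.5 and its MCC is 0, which is why accuracies obtained on unbalanced sets cannot be compared at face value.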

Drug-likeness

As defined earlier, “drug-likeness” assesses qualitatively the chance for a molecule to become an oral drug with respect to bioavailability. Drug-likeness was established from structural or physicochemical inspections of development compounds advanced enough to be considered oral drug-candidates. This notion is routinely employed to perform filtering of chemical libraries to exclude molecules with properties most probably incompatible with an acceptable pharmacokinetics profile. This SwissADME section gives access to five different rule-based filters, with diverse ranges of properties inside of which the molecule is defined as drug-like. These filters often originate from analyses by major pharmaceutical companies aiming to improve the quality of their proprietary chemical collections. The Lipinski (Pfizer) filter is the pioneer rule-of-five implemented from ref. 4. The Ghose (Amgen), Veber (GSK), Egan (Pharmacia) and Muegge (Bayer) methods were adapted from refs 58, 59, 60, 61, respectively. Multiple estimations allow consensus views or selection of methods best fitting the end-user’s specific needs in terms of chemical space or project-related demands. Any violation of any rule described here appears explicitly in the output panel.
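A rule-based filter of this kind, with explicit reporting of violations, can be sketched as follows for the rule-of-five. The thresholds are the classical ones (MW ≤ 500, log P ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors); the function name is hypothetical, and the SwissADME implementation may differ in the lipophilicity predictor and cutoff it uses.

```python
def lipinski_violations(mw, log_p, h_donors, h_acceptors):
    """List the classical rule-of-five criteria violated by a molecule.

    ASSUMPTION: generic log P threshold of 5; the lipophilicity model
    and cutoff used by a given tool may differ.
    """
    violations = []
    if mw > 500:
        violations.append("MW > 500")
    if log_p > 5:
        violations.append("log P > 5")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


# Hypothetical molecule that is too large and has too many acceptors.
print(lipinski_violations(mw=620.0, log_p=3.4, h_donors=2, h_acceptors=12))
```

Returning the violated criteria, rather than a pass/fail flag, matches the explicit display of violations in the output panel.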

The Abbott Bioavailability Score62 is similar but seeks to predict the probability of a compound to have at least 10% oral bioavailability in rat or measurable Caco-2 permeability. This semi-quantitative rule-based score, relying on total charge, TPSA and violation of the Lipinski filter, defines four classes of compounds with probabilities of 11%, 17%, 56% or 85%. Like the other methods in this section, it primarily focuses on the fast screening of chemical libraries, to select the best molecules to be purchased, synthetized or promoted at a further stage of a medicinal chemistry project.

Medicinal Chemistry

The purpose of this section is to support medicinal chemists in their daily drug discovery endeavours. Two complementary pattern recognition methods allow for identification of potentially problematic fragments. PAINS (for pan assay interference compounds, a.k.a. frequent hitters or promiscuous compounds) are molecules containing substructures showing potent response in assays irrespective of the protein target. Such fragments, yielding false positive biological output, have been identified by Baell et al.6 by analysing six orthogonal assays and breaking down the molecules active in two or more assays into 481 recurrent fragments, considered as potentially leading to promiscuous compounds. SwissADME returns warnings if such moieties are found in the molecule under evaluation.

Besides, we implemented Structural Alert, which consists of a list of 105 fragments identified by Brenk et al.5 to be putatively toxic, chemically reactive, metabolically unstable or to bear properties responsible for poor pharmacokinetics. In SwissADME, a chemical description of the problematic fragments found in a given molecule can be obtained by hovering over the “question mark” icon appearing after the fragment list. This is implemented for both PAINS and Brenk filters. By applying these and other physicochemical filters to design screening libraries, Brenk et al.5 observed that most of the remaining compounds satisfy criteria for “leadlikeness”. This concept is similar to drug-likeness, yet focuses on physicochemical boundaries defining a good lead, i.e. a molecular entity suitable for optimization. By definition, leads are subjected to chemical modifications that will most likely increase size and lipophilicity63. As a consequence, leads are required to be smaller and less hydrophobic than drug-like molecules. Since it is crucial for a chemist to judge whether a given molecule is suitable to initiate lead optimization, in addition to structural filters, we implemented a rule-based method for leadlikeness, which was adapted from ref. 64.

One of the key aspects of CADD activities is to help the selection of the most promising virtual molecules that will be synthetized and submitted to biological assays or other experiments. Synthetic accessibility (SA) is a major factor to consider in this selection process. Obviously, for a reasonable number of molecules, medicinal chemists are best placed to determine SA. However, when too many molecular structures prevent an expert evaluation, in silico estimation can be used for pre-filtering. Ertl & Schuffenhauer11 proposed a fingerprint-based approach for SA estimation, but it includes closed-source information about the fingerprint definition, which prevents a straightforward implementation in our tool open to the scientific community. As a consequence, we have built our own fragmental method by analysing more than 13 million compounds immediately deliverable by vendors. We assumed that the most frequent molecular fragments (technically, FP2 bits, refer to Computational Methods) in this large collection indicate a probably high SA, while rare fragments imply a difficult synthesis. For a given molecule, the fragmental contributions to SA are summed and corrected by the terms describing size and complexity, such as macrocycles, chiral centres, or spiro functions as defined by Ertl & Schuffenhauer11. After normalization, the SA Score ranges from 1 (very easy) to 10 (very difficult). To assess the performance of the method developed for SwissADME, we retrieved two previously published SA test sets. Both sets involved external molecules, whose difficulty of synthesis was marked from 1 to 10 by nine11 and four65 medicinal chemists, respectively. The averaged expert score can then be compared to an in silico SA Score. As seen in Table 2, the predictive capacity of all three methods appears very dependent on the test set. Indeed, the SAs of set 1, smaller and evaluated by more chemists, turned out to be much more robustly predictable than those of set 2.
Human evaluation of synthetic complexity is undeniably subjective and relies on the individual chemist’s training and experience. However, significant linear correlation and small errors demonstrate how this simple and fast methodology can help prioritize molecules for synthesis. This is especially true of the SwissADME SA Score, which outperformed the reference methods on both sets, with smaller errors and equal or higher linear correlation coefficients.
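The comparison between averaged expert scores and in silico SA Scores described above relies on linear correlation and error measures, which can be sketched as follows. The function name and the scores are hypothetical, and the exact statistics reported in Table 2 may be defined differently.

```python
from math import sqrt
from statistics import mean


def compare_scores(expert, predicted):
    """Pearson correlation, mean absolute error and RMSE between two lists."""
    if len(expert) != len(predicted) or len(expert) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x, mean_y = mean(expert), mean(predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expert, predicted))
    sxx = sum((x - mean_x) ** 2 for x in expert)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    pearson_r = sxy / sqrt(sxx * syy)
    mae = mean(abs(x - y) for x, y in zip(expert, predicted))
    rmse = sqrt(mean((x - y) ** 2 for x, y in zip(expert, predicted)))
    return pearson_r, mae, rmse


# Hypothetical averaged expert scores and in silico SA Scores (scale 1 to 10).
expert_scores = [2.0, 4.0, 6.0, 8.0]
sa_scores = [3.0, 5.0, 7.0, 9.0]
r, mae, rmse = compare_scores(expert_scores, sa_scores)
print(f"r = {r:.2f}, MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

In this invented case the correlation is perfect although every prediction is off by one unit, which is why correlation and error need to be reported together.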