An overview of DESTINI is illustrated in Fig. 1. Given a protein sequence, it predicts an atomic 3D structural model for the target sequence. DESTINI has two main components: contact prediction and structural modeling. The contact prediction is an implementation of a fully convolutional residual neural network composed of 102 layers in total, including 40 convolutional layers (see Methods). The input features consist of three 2D features: co-evolutionary coupling scores19, a statistical potential36, and mutual information for pairs of residues37, and three 1D features: BLAST sequence profiles38, secondary structure and solvent accessibility predictions25, which are converted into 2D features by concatenating 1D features of two separate residues of a residue pair. The contact predictions are then supplied to structural modeling, the second component of DESTINI, which is a further development based on the TASSERVMT approach (abbreviated as TASSER below)5. When there is no suitable template model available, the structural modeling essentially makes de novo predictions4; if there is a significant structural template hit, modeling based on template(s) is conducted. In both scenarios, confident contact predictions serve as the main driver towards the native structural fold.
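The conversion of 1D features into 2D features described above can be illustrated with a minimal sketch. It is not the DESTINI implementation: the function name `pair_features`, the nested-list layout, and the channel order (residue i, then residue j, then the native 2D features) are assumptions made only for illustration.

```python
def pair_features(one_d, two_d):
    """Build an L x L grid of per-pair feature vectors.

    one_d: L rows, each holding k per-residue (1D) features.
    two_d: L x L grid, each cell holding m per-pair (2D) features.
    Returns an L x L grid whose cells hold 2*k + m values.
    The channel order is an assumption for illustration.
    """
    length = len(one_d)
    grid = []
    for i in range(length):
        row = []
        for j in range(length):
            # Concatenate the 1D features of residues i and j,
            # then append the 2D features of the pair (i, j).
            row.append(list(one_d[i]) + list(one_d[j]) + list(two_d[i][j]))
        grid.append(row)
    return grid


# Toy input: two residues, one 1D feature and one 2D feature each.
one_d = [[1.0], [2.0]]
two_d = [[[0.0], [0.5]], [[0.5], [0.0]]]
features = pair_features(one_d, two_d)
```

In this toy case the cell for the pair (0, 1) holds the 1D feature of residue 0, the 1D feature of residue 1, and the pair's 2D feature, in that order.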

Figure 1 Overview of DESTINI. Given an input target protein sequence, 1D (yellow rectangles) and 2D (green rectangles) features are extracted by mining existing genomic and structural data. These features are inputs to a fully convolutional residual network, a deep-learning artificial neural network composed of multiple identical residual blocks, whose architecture is shown in the blue bubble. The final layer of the network is a softmax activation layer, which outputs the probability score (P) for every pair of residues of the target sequence. The output is displayed in a contact map plot, where the upper triangle displays the probability scores colored according to the scale on the right, and the lower triangle plots a comparison of the predicted contacts (P > 0.5) versus true (native) contacts. Green, red, grey dots are true positives, false positives, and false negatives, and short/medium range contacts are represented by yellow/pink stripes. The predicted contacts are subsequently employed to derive the 3D model of the target. The final 3D structure of the target is exhibited in a cartoon representation. The purple α-helix/α-helix contacts and cyan β-strand/β-strand contacts correspond to the contacts shown in the contact map shaded using the same color code.

Below, we describe the benchmark results on three testing sets. A contact between a pair of protein residues i and j is defined if the Euclidean distance between the Cβ atoms of these residues (the Cα atom is used for glycine) is less than 8 Å. Following CASP procedures35, the precision of the top L/k contact predictions, where L is the length of the target and k = 1, 2, 5, 10, is employed as the main metric for evaluation. We only consider non-local contacts, i.e., those for which the sequential distance of residues i and j, \(|i-j|\), falls into one of three regimes: short [6, 11], medium [12, 23], and long [24, ∞). For practical reasons, the latter two regimes are the most valuable for structural modeling, and therefore they are the focus of our subsequent evaluation. The quality of structural models is measured by their TM-score39 with respect to the native (experimentally determined) structure. A TM-score higher than 0.4 indicates a model that significantly resembles the native topology, or is native-like, and a score higher than 0.5 suggests a structure highly similar to the native fold40.
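The contact definition and the top L/k precision metric can be sketched as follows. This is an illustrative sketch rather than the CASP or DESTINI evaluation code: the function names, the dictionary of scores keyed by residue pair, and the choice to divide by the number of predictions actually selected are assumptions.

```python
import math


def native_contacts(coords, cutoff=8.0):
    """Return residue pairs (i, j), i < j, closer than `cutoff` angstroms.

    coords: one (x, y, z) tuple per residue, taken at the C-beta atom
    (C-alpha for glycine).
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])))
            if dist < cutoff:
                contacts.add((i, j))
    return contacts


def top_precision(scores, contacts, length, k=1, lo=12, hi=None):
    """Precision of the top L/k predictions with |i - j| in [lo, hi].

    scores: dict mapping (i, j) to a predicted probability.
    The default lo=12, hi=None selects the medium and long regimes.
    """
    eligible = [
        (p, pair)
        for pair, p in scores.items()
        if abs(pair[0] - pair[1]) >= lo and (hi is None or abs(pair[0] - pair[1]) <= hi)
    ]
    eligible.sort(reverse=True)  # highest probability first
    top = eligible[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in contacts)
    # Assumption: divide by the number of predictions actually selected.
    return hits / len(top)


# Toy example: three residues, and three scored pairs for a target of length 2.
toy_contacts = native_contacts([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)])
toy_scores = {(0, 12): 0.9, (0, 13): 0.8, (0, 5): 0.99}
precision_top_l = top_precision(toy_scores, {(0, 12)}, length=2, k=1)
precision_top_l2 = top_precision(toy_scores, {(0, 12)}, length=2, k=2)
```

The pair (0, 5) has the highest score but is excluded because its sequence separation falls below the medium regime; passing `lo=6, hi=11` would instead evaluate the short regime.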

Deep-learning-predicted contacts are more accurate than those of the co-evolutionary or template-based approach

We first evaluate the contact predictions for the “glass-ceiling” set, which is a set of 606 hard targets for which classical threading algorithms have difficulty identifying correct structural templates14. In this and the next test on the “easy” set, we excluded any protein in the training set or in the template library if it shares a sequence identity of 30% or higher with any target in the testing set. This procedure is necessary in order to obtain objective evaluations. Figure 2 shows the precision of contact prediction for medium or long-range contacts for three methods: TASSER, CCMPred19, and DESTINI. Since this set was created using hard targets for TASSER, as expected, contact prediction by TASSER using a template consensus scheme delivers poor results: the mean precision is merely 4.6% for the top L predictions, and 8.0%, 13.3%, and 16.7% for the top L/2, L/5, and L/10 predictions, respectively. The co-evolutionary analysis method CCMPred19 improves on these accuracies by a factor of roughly 1.6 to 2.7 for this set, with the corresponding top L/k (k = 1, 2, 5, 10) predictions at 12.6%, 16.8%, 22.7%, and 26.1%, respectively. Compared to CCMPred, whose scores are the most important input feature for our deep-learning neural network, DESTINI significantly improves the accuracy of contact prediction further. For the top L predictions, the mean precision per target is 38.1%, triple the mean precision of CCMPred at 12.6%. Similarly, the mean precision for the top L/2, L/5, and L/10 sets improves dramatically, from 16.8%, 22.7%, and 26.1% by CCMPred to 48.3%, 57.3%, and 62.3% by DESTINI, respectively. On over half of the targets, DESTINI achieves a precision better than 50% for the top L/2 predictions, with median precision values of 67% and 75% for the top L/5 and L/10 predictions.

Figure 2 Precision of medium/long range contact predictions on 606 hard targets. Top L/k (k = 1, 2, 5, 10) predictions are shown in the four panels. In each panel, three boxplots display the results of three methods, TASSER (blue), CCMPred (grey), and DESTINI (pink), respectively. In each boxplot, the black box indicates the interquartile range from 25% to 75%; the median is represented by a black bar within the box; and the whisker extends up to 1.5 times the interquartile range. The red circle is positioned at the mean value. Individual data points from each method are shown as small transparent circles in the same color code as the boxplot.

Note that there are 46 (7.6%) targets that are short helical proteins without any medium/long range contacts observed in the native structure, e.g., a single helix. For 24 (52%) of them, DESTINI correctly makes no positive contact prediction, i.e., a probability score P < 0.5. However, they are still assigned zero precision values in order to be consistent with the CASP evaluation scheme, which selects top predictions even when a score is not significant. This consideration is due to the fact that some methods, such as CCMPred, do not provide a score cutoff for positive, i.e. confident, contact predictions.

Deep-learning detects and expands contact patterns over the co-evolutionary approach

Where does the improvement come from? A representative successful example provides some hints, as shown in Fig. 3A. This is the little-known TT1751 of T. thermophilus HB8, a 127 AA protein with unknown biological function but a solved crystal structure41. The 192 observed native medium/long range contacts of this target can be largely grouped into 11 geometric clusters, each with at least 3 contacts, and only 12 scattered contacts do not belong to any cluster. These clusters correspond to typical contact patterns between β-strands, α-helices, and β-strands/α-helices as seen in the native structure (Fig. 3B). DESTINI predicts a total of 202 contacts (P > 0.5), and 152 of them are correct, hitting all 11 clusters and recalling 82.7% of individual contacts in these clusters. By comparison, if one considers the same number of top-ranked contact predictions by CCMPred, it also hits all native clusters but with a much smaller coverage of individual contacts within these clusters, only 33.3%. Instead, CCMPred leaves many false positives scattered on the contact map (Fig. 3A). This suggests that deep learning recognizes clusters corresponding to contact patterns and promotes true positives within the clusters, while it also reduces isolated false positives away from the clusters. Even among the 50 false positives by DESTINI, 38 (76%) surround the native clusters with no more than a two-residue shift. Given these improvements, we are able to fold the structure into a model similar to the native fold, with a TM-score of 0.577 (Fig. 3B). The overall backbone Cα RMSD is 5.8 Å, and 87 (69%) residues have an RMSD of 2.7 Å when superimposed on the crystal structure by TM-align39. If we use the same number of top-ranked contact predictions by CCMPred, we obtain a much worse structural model with a TM-score of 0.437 and an RMSD of 8.9 Å, though it already has a roughly native-like topology.
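The bookkeeping behind the true-positive and near-cluster false-positive counts can be sketched as below. The exact criterion is not spelled out in this section, so the sketch assumes that a false positive (i, j) "surrounds" a cluster when some clustered native contact (a, b) satisfies |i - a| ≤ 2 and |j - b| ≤ 2; the function names are hypothetical.

```python
def near_native(pair, native, shift=2):
    """True if `pair` lies within `shift` residues of some native contact
    in both indices (an assumed reading of the two-residue shift)."""
    i, j = pair
    return any(abs(i - a) <= shift and abs(j - b) <= shift for a, b in native)


def classify_predictions(predicted, native, shift=2):
    """Return (true positives, false positives, false positives near native).

    predicted: iterable of (i, j) pairs with i < j.
    native: set of clustered native contacts, also stored with i < j.
    """
    predicted = list(predicted)
    true_pos = [p for p in predicted if p in native]
    false_pos = [p for p in predicted if p not in native]
    near = [p for p in false_pos if near_native(p, native, shift)]
    return len(true_pos), len(false_pos), len(near)


# Toy example with two native contacts and three predictions.
native = {(10, 30), (11, 29)}
counts = classify_predictions([(10, 30), (12, 31), (50, 80)], native)
```

Here (12, 31) is a false positive that sits within two residues of the native contact (10, 30), whereas (50, 80) is an isolated false positive.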

Figure 3 Example and statistical analysis of contact pattern prediction. (A) A representative example of the native contact map versus the predicted contact map by DESTINI (upper triangle) and by CCMPred (lower triangle). The target is TT1751 of T. thermophilus HB8 (PDB code 1J3M)41. Medium and long range native contacts are represented by circles filled in different colors for different contact clusters, except that isolated, unclustered contacts are represented by grey circles. Correctly predicted contacts by either method are indicated by black borders surrounding the circles. False positives are represented by red dots. Local and short range native contacts are displayed in light grey squares along the diagonal. (B) Superposition of the top structural model by DESTINI (red) onto the native structure (green). Models are shown in cartoon representations using the visualization program VMD53. (C) Violin plots of ΔCoverage for contact clusters and individual clustered contacts, respectively, for targets of the “glass-ceiling” set. In each violin, the black contour exhibits the probability density estimated using the data set. The blue boxplot inside the violin follows the same boxplot scheme adopted in Fig. 2.

Systematic contact pattern analysis through clustering on the “glass-ceiling” set reinforces the insights gained from the example above. We considered 545 targets that have at least one contact cluster within the medium or long regime. On average, there are 8.9 clusters per target. Among them, DESTINI detects 4.8 clusters on average, whereas CCMPred hits 4.7 clusters per target. In about half of the cases, DESTINI identifies the same set of clusters discovered by co-evolutionary coupling analysis. However, for these clusters, DESTINI finds on average 8.2 contacts, significantly more than the 3.5 contacts provided by CCMPred. We further calculate \(\Delta\mathrm{Coverage} \equiv \mathrm{Coverage}_{\mathrm{DESTINI}} - \mathrm{Coverage}_{\mathrm{CCMPred}}\) (the subscript denotes the method employed to predict contacts) for clusters and individual contacts within cluster hits, respectively. These distributions are shown in Fig. 3C. In 253 (46%) cases, the same number of clusters is hit by both approaches; in 173 (32%) cases DESTINI predicts more clusters, whereas in 119 (22%) cases CCMPred finds more clusters. On average, DESTINI improves cluster coverage by 0.04, which is very small but statistically highly significant (Wilcoxon paired test P-value = \(1.8 \times 10^{-6}\)). By contrast, the ΔCoverage of individual contacts has a mean value of 0.162, and 461 (85%) targets show improvement when DESTINI rather than CCMPred is applied. Moreover, DESTINI’s false positives are mostly located around clusters within a two-residue shift: on average, 71% of them surround the native clusters, versus 51% of CCMPred’s false positives under the same criterion. Overall, the improved contact predictions by DESTINI stem from better recognition of contact patterns within clusters already hit by CCMPred, while pruning isolated false positives.
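A per-target ΔCoverage calculation can be sketched as follows. The definitions used here are assumptions for illustration only: a cluster counts as hit when at least one of its contacts is predicted, and contact coverage is taken over all clustered native contacts of the target. The function names and toy clusters are hypothetical.

```python
def coverage(clusters, predicted):
    """Return (cluster coverage, contact coverage) for one target.

    clusters: list of clusters, each a list of native contacts (i, j).
    predicted: iterable of predicted contacts (i, j).
    """
    predicted = set(predicted)
    # Assumption: a cluster is hit if any of its contacts is predicted.
    n_hit = sum(1 for c in clusters if predicted & set(c))
    n_contacts = sum(len(c) for c in clusters)
    n_recalled = sum(len(predicted & set(c)) for c in clusters)
    return n_hit / len(clusters), n_recalled / n_contacts


def delta_coverage(clusters, pred_a, pred_b):
    """Coverage of method A minus coverage of method B, per target."""
    cov_a = coverage(clusters, pred_a)
    cov_b = coverage(clusters, pred_b)
    return cov_a[0] - cov_b[0], cov_a[1] - cov_b[1]


# Toy target with two clusters of three contacts each.
clusters = [[(1, 20), (2, 19), (3, 18)], [(5, 40), (6, 41), (7, 42)]]
method_a = [(1, 20), (2, 19), (5, 40)]  # hits both clusters, 3 of 6 contacts
method_b = [(1, 20)]                    # hits one cluster, 1 of 6 contacts
delta = delta_coverage(clusters, method_a, method_b)
```

For this toy target the difference in cluster coverage is 0.5 and the difference in contact coverage is 1/3; collecting such differences over all targets gives the kind of distribution summarized in a violin plot.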

Accurate contact predictions yield native-like folds for hard targets

Using the improved contact predictions, DESTINI’s predicted structural models are significantly better than the models generated by TASSER. Figure 4 compares the top models (ranked without using the native structure) produced by the two approaches. A histogram of the models' TM-scores shows that many more native-like models are found by DESTINI (Fig. 4A). A total of 222 (36.6%) targets have a top model TM-score > 0.4, compared to only 52 (8.6%) targets under the same criterion by TASSER. Thus, the deep-learning based approach yields more than four times as many native-like models. Moreover, among these good models, the quality of the models is generally better by DESTINI, which yields a mean TM-score of 0.539, 18% higher than the mean TM-score of 0.456 by TASSER. If one uses a TM-score of 0.5 as the criterion for a good quality prediction, DESTINI folds 127 targets (21.0%) versus merely 8 targets (1.3%) by TASSER. The fractions of mainly α, β, and α/β structures among those with native-like structures are 45%, 10%, and 45%, respectively, compared to 52%, 12%, and 36%, respectively, in the full set. Thus, the success rate appears largely independent of secondary structure class. Overall, the results clearly demonstrate a significant advantage of DESTINI in predicting protein structural folds.

Figure 4 Structural models of DESTINI compared to TASSER models for the “glass-ceiling” set. (A) Histograms of TM-scores for each protein target. The shaded area represents good, native-like models. (B) Correlation between model quality improvement and the precision of medium/long range contact predictions (top L). Each circle represents a target protein.

Further analysis shows that the improvement of model quality by DESTINI over TASSER is strongly correlated with the quality of the contact predictions. A high Pearson correlation coefficient of 0.70 is obtained between the difference in TM-score (\(\Delta\text{TM-score} \equiv \text{TM-score}_{\mathrm{DESTINI}} - \text{TM-score}_{\mathrm{TASSER}}\), where the subscript denotes the method employed to predict the structural model) and the precision of the top L medium or long-range contact predictions. The 215 targets that show a large positive model improvement (ΔTM-score > 0.1) have a mean contact precision of 61.2%. In contrast, for the 167 targets that show no improvement (ΔTM-score ≤ 0), the mean precision for contact prediction is merely 18.0%. The stark contrast suggests, not surprisingly, that better contact prediction is essential to the model improvement by DESTINI.
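The correlation analysis can be reproduced in outline with a plain Pearson coefficient computed over per-target values. The sketch below uses made-up numbers purely to show the calculation; the variable names and data are hypothetical and are not taken from the benchmark.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-target values (not benchmark data).
tm_destini = [0.55, 0.42, 0.30, 0.61]
tm_tasser = [0.35, 0.33, 0.31, 0.38]
precision_top_l = [0.70, 0.45, 0.10, 0.80]

# Per-target improvement: TM-score of DESTINI minus TM-score of TASSER.
delta_tm = [d - t for d, t in zip(tm_destini, tm_tasser)]
r = pearson(delta_tm, precision_top_l)
```

With real data, `delta_tm` and `precision_top_l` would each hold one value per target of the test set.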

One practical question is: how many correct contact predictions are required in order to fold a target protein correctly? Strictly speaking, the answer to this question is dictated by the specific protein fold, and thus, there is no general answer. Nevertheless, for single-domain proteins, as in this glass-ceiling set, one can get an approximate answer to this question. The probability score P for contact prediction indicates the confidence level of a prediction. Figure 5A shows that, for all non-local contact predictions, 90.5%/76.0% of positive predictions corresponding to P > 0.9/0.8 are correct. The numbers are roughly the same if one separately calculates the precision for the three regimes: 90.1%/77.3% in short, 90.7%/75.1% in medium, and 90.6%/75.7% in long range predictions, corresponding to P > 0.9/0.8. If we consider only high quality predictions at P > 0.8 for medium or long range contacts, a total of 346 (57%) targets have at least one such prediction. We further define the contact prediction depth D as the number of considered predictions divided by L, the length of a target. As shown in Fig. 5B, the higher the contact prediction depth, the more accurate the resulting structural models. At D > 0.2, 0.25, and 0.5, the fractions of models with a TM-score > 0.4 are 69%, 75%, and 84%, respectively. Therefore, it appears that high confidence contact prediction at D > 0.2 provides a good chance of obtaining a native-like fold in this single-domain set; this is consistent with earlier results, which suggested that L/4 such contacts are sufficient8.
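The contact prediction depth D can be computed with a short sketch such as the one below. The function name, the dictionary layout of the scores, and the default thresholds (P > 0.8 and a sequence separation of at least 12, i.e., medium or long range) follow the description above but are otherwise assumptions.

```python
def prediction_depth(scores, length, p_min=0.8, min_sep=12):
    """Contact prediction depth D: confident medium/long range
    predictions divided by the target length L.

    scores: dict mapping (i, j) to a predicted probability.
    """
    confident = [
        pair
        for pair, p in scores.items()
        if p > p_min and abs(pair[0] - pair[1]) >= min_sep
    ]
    return len(confident) / length


# Toy example for a hypothetical target of length 8.
toy_scores = {(0, 20): 0.95, (1, 30): 0.85, (2, 5): 0.99, (3, 40): 0.5}
depth = prediction_depth(toy_scores, length=8)
```

Only two of the four toy predictions are both confident and medium or long range, so the depth is 2/8 = 0.25.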

Figure 5 Precision of contact prediction and its impact on structural modeling for the glass-ceiling set. (A) Distributions of positive contact predictions at different probabilities for all non-local, short, medium, and long range residue pairs. Green indicates correct predictions and grey hash represents incorrect predictions. (B) Precision of predicted medium or long range contacts with a probability score better than 0.8 versus contact prediction depth. Each target is represented by a circle, filled with the color corresponding to the TM-score of the model. The inset shows the color scale. The same color code is adopted in the subsequent figures. The vertical/horizontal dashed lines are at 0.25 and 0.5, respectively.

Accurate contact predictions improve model refinement for easy targets

Next, we ask whether accurate contact prediction by DESTINI can be utilized to improve the structural models of “easy” protein targets, which already have one or more significant structural templates identified by threading. To address this, we applied DESTINI to 631 targets marked as “easy” by threading approaches14. Figure 6 compares the results of contact prediction by DESTINI, TASSER, and CCMPred. On average, these targets have a far better accuracy in contact prediction than the hard targets analyzed above. CCMPred yields a mean precision of 43.3% and 67.8% for the top L and L/5 ranked scores, respectively, compared to 12.6% and 22.7% for the glass-ceiling set. Template-based predictions by TASSER provide better results: 47.8% and 73.8% for the top L and L/5 predictions on average. Despite the relatively high starting points, our deep-learning approach further improves the contact prediction, raising the mean precision to 71.2% and 91.3% for the top L and L/5 ranked scores, respectively. On over 75% of targets, DESTINI yields a precision above 75% among the top L scores.

Figure 6 Precision of medium/long range contact predictions on 631 easy targets. The same plot scheme as Fig. 2 is adopted.

Using these accurate contact predictions, DESTINI generates 561 (89%) native-like structural models among the top ranked models for each target versus 504 (80%) native-like models yielded by TASSER (Fig. 7A). This corresponds to an 11% increase in the number of targets with native-like models. If one uses a TM-score > 0.5 as the criterion for successful structure prediction, then DESTINI folds 478 (76%) targets versus 416 (66%) by TASSER. Similarly, these numbers are 361 (57%) and 321 (51%) at a TM-score > 0.6 for DESTINI and TASSER, respectively. Only at a very high TM-score cutoff of 0.9 does DESTINI have slightly fewer proteins, 43, compared to the 54 from TASSER. However, at this level, the difference in the TM-score is less relevant because all these models have very high quality; they typically have an RMSD of less than 2 Å from the native structure. The mean TM-score for the complete set is 0.601 by DESTINI, versus 0.569 by TASSER. A total of 171 (27%) targets have improved their TM-score by more than 0.1. These targets have a mean precision for the top L medium/long range contact predictions of 74.5% (Fig. 7B). Conversely, only 36 (5.6%) targets show no improvement and a low TM-score < 0.4. They have a relatively low mean precision of 44.0% in contact prediction. These low-quality models result from a combination of inaccurate template identification and mediocre contact prediction. Overall, it is clear that DESTINI is capable of improving model refinement even when a good structural template is already available.

Figure 7 Structural models of DESTINI compared to TASSER models for easy targets. (A) Histograms of TM-scores for each protein target. (B) Correlation between model quality improvement and the precision of medium/long range contact predictions (top L).

Comparison to other contact prediction methods

Finally, we compared the performance of DESTINI to several representative contact prediction methods. The benchmark set is composed of 66 domains from 50 targets evaluated during CASP1235. We only consider cases whose structural data were publicly available in the Protein Data Bank (PDB) at the time of this benchmark test. It is important to note that we removed from our training set all entries released after May 1st, 2016, the starting date of CASP12, and re-trained the network models with the reduced training set and a sequence library dated Feb 2016 for deriving the input features. These models are employed in the final benchmark tests. This procedure emulates the environment of CASP12. For each target, we made the prediction for the full sequence with no domain partitioning performed. Domain partitioning was only performed for evaluation using the boundary provided by the assessors.

The benchmark results are shown in Table 1 and Fig. 8. Overall, DESTINI significantly outperforms the other methods. For the top L/2 medium or long-range contacts, the mean precision of DESTINI is 70.1%, versus 62.3% for RaptorX, the top ranked method in CASP12; 61.3% for DeepContact, which also employs a deep-learning algorithm; 60.7% for MetaPSICOV, the contact prediction leader in CASP11; and 42.8% for Gremlin, a standalone co-evolutionary analysis method. For the top L/5 medium or long-range predictions, the mean precision is 78.8% for DESTINI, compared to 69.6%, 68.1%, 69.3%, and 47.1% for RaptorX, DeepContact, MetaPSICOV, and Gremlin, respectively. Fig. 8 demonstrates that, with very few exceptions, DESTINI performs better than the other methods on the vast majority of targets. Overall, the benchmark test suggests that DESTINI is among the best, if not the best, of the contact prediction methods.

Table 1 Mean precision of different contact prediction methods on CASP12 targets.