Structural and functional analysis of EGFR tyrosine kinase domain upon mutation

Molecular dynamics simulations for WT and mutant (T854A) EGFR TK protein was performed to gain insight into the structural and functional behaviour of the drug resistance associated mutation. We studied RMSD, RMSF, radius of gyration (Rg), solvent accessible surface area (SASA) and ED analysis between the WT and mutant (T854A) EGFR protein. RMSD for the backbone of the protein structures were calculated from the initial structure (Figure 2a). In this figure, till 14ns WT showed backbone RMSD of ∼0.13 to ∼0.27 nm during this part of simulations. After 14 ns WT structure exhibited minimum deviation till the end of simulation that is 40 ns with its backbone RMSD ranging from ∼0.14 to ∼0.22 nm, where as mutant structure showed maximum deviation till the end of simulation resulting backbone RMSD of ∼0.13 to ∼0.47 nm respectively. This stable RMSD provided a suitable basis for further analysis. For determining the effect of mutation on the behaviour of residues, the RMSF values of WT and mutant (T854A) structures were calculated (Figure 2b). The results indicate higher degree of flexibility in mutant (T854A) in comparison to WT.

Figure 2 Graphs showing (a) RMSD (b) RMSF and (c) Radius of gyration of wild-type (blue) and mutant (T854A) (red) protein. Full size image

Another parameter radius of gyration (Rg) defined as the mass-weight root mean square distance of collection of atoms from their common centre of mass was helpful in giving further insight into the structural changes due to mutation and the overall dimension of the protein. Plot of radius of gyration of protein vs. time is shown in Figure 2c. It can be seen that mutant (T854A) structure exhibited higher Rg value in comparison to WT structure. Variation of SASA for both WT and mutant (T854A) proteins with respect to time can be seen in Figure S1(a). WT structure was observed to have higher value of SASA with time, while mutant (T854A) showed lower value of SASA. Greater fluctuation in Rg in mutant (T854A) structure suggested structural alteration in the mutant structure. Since, hydrogen bonds play an important role in maintaining the stable conformation of protein, analysis of WT and mutant (T854A) proteins were performed with respect to time (Figure S1(b)). The total energy (Figure S1(c)) was observed to be more or less the same throughout the simulations for both WT and mutant (T854A).

All these results indicate that mutation (T854A) rendered the protein structure more flexible affecting the structural and functional behaviour of EGFR TK protein. This result was further validated by principal component analysis (PCA) analysis.

Essential dynamics (ED) analysis gives an improved analysis of dynamical mechanical properties of the protein system. To further support our MD simulations results, the large-scale collective motions of the WT and mutant (T854A) protein using ED analysis were determined. Principal components are the eigenvectors of a covariance matrix. This projection gives the change of particular trajectory along each eigenvector. The range of the corresponding eigenvalues (Figure 3) indicated that the fluctuation of the protein system was basically restricted to the first two eigenvectors. The motion of the two proteins in phase space can be shown by the projection of trajectories obtained at 300 K onto the first two principal components (PC1, PC2) in which we observe clusters of stable states. Analysis of these plots reveals that the clusters are well defined in WT than mutant (T854A). Also, mutant (T854A) covers a greater region of phase space mainly along PC1 plane than WT. It can thus be said that mutant (T854A) is more flexible than WT at 300 K. The values for trace of the diagonalized covariance matrix of the Cα atomic positional fluctuations obtained for WT protein and mutant (T854A) protein were 7.603 nm2 and 18.3734 nm2 respectively. Trace is the total variance of the dataset thus again confirming the overall increased flexibility of mutant than WT at 300 K.

Figure 3 (a) Projection of the motion of the protein in phase space along the first two principal eigenvectors of wild-type (blue) and mutant (T854A) (red) protein structures. Full size image

Also it can be seen in Figure 4, T854A mutation causes change in active site. Since, 854 lies in the contact region of erlotinib but not close to ATP, this mutation results in reduced affinity for erlotinib while maintaining its kinase activity. Substitution of alanine in place of threonine causes the binding surface to move away from erlotinib indicating a possible reason for its decreased binding affinity. This mutation causes flexibility in the binding region of erlotinib while not affecting binding of ATP thus explaining its acquired resistance and maintained functionality.

Figure 4 Change in erlotinib binding site due to T854A mutation. Full size image

3D QSAR model data selection

A 3D-QSAR model development gives a statistical relationship between the structures and activity of chemical compounds by calculation of 3D molecular descriptors involving steric, electrostatic and hydrophobic points marked on the 3D spatial grid. The invariable columns were removed after computing the force field grid descriptors which resulted in 3163 descriptors from 3268 descriptors, thus removing 105 invariable descriptors. For development of the QSAR model, pIC50 was chosen as the dependent variable while the calculated 3D descriptors as independent variable. Division of dataset resulted in 11 compounds in test set while the rest 27 compounds in training set. The test set consisted of compounds 6, 9, 12, 28, 29, 32, 36, 37, 40, 44 and 45.

3D-QSAR model development and validation

Stepwise forward (SW) multiple regression (MR) method was applied for development of 3D-QSAR model. The descriptors chosen were E_337, S_335, E_832, E_424, S_151 and E_721 belonging to steric and electrostatic field energy of interactions with the numbers representing their respective spatial grid points. In this model, no hydrophobic descriptors were selected in the final model. The 3D QSAR model obtained is:

p I C 50 = [ 0 . 2989 ( ± 0 . 0020 ) × E _ 337 ] + [ 3 . 2763 ( ± 0 . 5560 ) × S _ 335 ] + [ 0 . 1785 ( ± 0 . 0003 ) × E _ 832 ] + [ 0 . 4938 ( ± 0 . 0033 ) × E _ 424 ] - [ 11 . 7460 ( ± 0 . 3402 ) × S _ 151 ] - [ 0 . 6486 ( ± 0 . 0019 ) × E _ 721 ] + 5 . 0198

Each descriptor is associated with a numerical coefficient and its error while the last single numerical value is the regression coefficient. Internal and external validation of the developed model was carried out using the LOO method by calculating statistical parameters and meeting critical requirements for a model to be robust. The statistical parameters obtained for this model included correlation coefficient r2 (0.9751), cross-validated correlation coefficient q2 (0.9491), predicted correlation coefficient pred_r2 (0.9525), low standard error value, r2_se (0.0966), q2_se (0.1380) andpred_r2_se (0.1282) which confirm the model to be robust. Along with this, high value of F-test (130.3822) implied that the developed QSAR model is 99% statistically valid with 1 in 10000 chance of failure. There are other important statistical parameters such as Z-scores for r2, q2 and pred_r2 which are also important for QSAR model validation. Zscore_r2 of 6.7926 implies a 100%area under the normal curve, Zscore_q2 of 4.3671 implies a 99.99% area under the normal curve and Zscore_pred_r2 of 1.6521 implies a 95.0743% area under the normal curve. These percentages indicate that the respective scores are near the mean 'μ' thus validating the model's statistical robustness. A parameter p-value for each of r2, q2 and pred_r2 was also obtained to be statistically significant with values 0.0001, 0.0001 and 0.09 respectively.

The robustness of the model can also be validated by radar and fitness plots. The fitness plot (Figure 5a) shows the extent of variation between the actual and predicted inhibitory activities of the thiazolyl-pyrazoline derived compounds. The radar plots (Figure S2 (a,b); Additional file 1) express the quality of the 3D-QSAR model by the extent of overlap between the actual value (blue) and predicted activity (red) lines. The contribution plot for each descriptor (Figure S2(c); additional file 1) specifies contribution of the properties that should be present in the lead compound for improving its inhibitory activity. Descriptors with positive contribution enhance the inhibitory activity of the lead compound whereas those with negative contribution reduce the same. Positive contribution for electrostatic descriptor shows a requirement of electropositive group at the substitution site and an electronegative group in case of negativeshi contribution.

Figure 5 Graph of observed versus predicted activity for training and test set. Full size image

The grid points E_337, E_832, S_335 and E_424 had a positive contribution (8.087%, 17.767%, 5.291% and 13.366% respectively) in the developed 3D-QSAR model against EGFR, while the descriptors S_151 and E_463 show negative contribution of 24.048% and 31.442% respectively. The grid points can be seen in Figure 1b. Steric descriptors represent the class of bulk descriptors which describe both size and shape of the molecules and fragments. Thus, positive contribution of a steric descriptor at specific grid point indicates the importance of a bulky group at that position. The value for each descriptor and predicted inhibitory activity for the dataset is mentioned in Table S2 (additional file 1).

The second class of descriptors, electrostatic descriptors give the importance of electronegative and electropositive groups at a particular site. Electrostatic descriptors with positive contribution imply the significance of presence of electropositive groups while those with negative contribution signify the importance of presence of electronegative groups.

Activity prediction of ZINC libraries using developed 3D-QSAR model

A total of 0.2 million natural compounds from ZINC library were screened and the highest predicted activity was observed to be 13.436 with 195 compounds having predicted activity above 8 and extrapolation between -1 and 1. We report the top two compounds with highest predicted activity. The first compound 7-hydroxy-3-(4-methoxyphenyl)-8-[(4-methylpiperazin-1-yl)methyl]-4H-chromen-4-one [ZINC ID: 20391511] (HCO) had a predicted activity (pIC50) of 13.44 while the second compound N-(2-(1H-indol-3-yl)ethyl)-2-((8-oxo-8H-benzo[c]indolo[3,2,1-ij][1, 5]naphthyridin-12-yl)oxy)propanamide [ZINC ID: 08792354] (NOP) possessed a predicted activity value of 11.92 (Figure 6). The QSAR model generated was also used to predict the inhibitory activity of a second generation drug, BIBW2992, as reported by Bean et al as a positive control [8]. It was observed that HCO and NOP possessed better predicted inhibitory activity than BIBW2992 (4.3). Values of top 10 ZINC compounds with their predicted activity can be seen in Table 1.

Figure 6 Structure of top (a) HCO and (b) NOP. Full size image

Table 1 Predicted activity value (pIC50) of top ten ZINC compounds. Full size table

Docking analysis of HCO and NOP to both WT and T854A structures

Both compounds (HCO and NOP) with highest predicted inhibitory activity against WT were docked with WT and T854A structures. The first compound HCO showed a binding affinity of -13.025 kJ/mol with WT while showing a better binding affinity of -16.485 kJ/mol with T854A structure. The second compound NOP also showed a better binding affinity to T854A (-8.598 kJ/mol) than WT (-8.037 kJ/mol). The results are summarised in table 2. Thus these compounds can be considered as lead compounds against both WT and T854A structures.