Suitability of the protein to the pH-based variation approach used

Our approach is based on small variations of pH within an interval that does not lead to major protein conformational changes. DENV C (Fig. 1a,b) was used as a model to study the relationship between protein structure/dynamics and backbone N-H solvent accessibility (i.e., the ability of the N-H hydrogen to exchange with water hydrogens). The interval used is between pH 6.0 and 7.5, a physiological range that is suitable for most proteins, including DENV C, as described below. Taking advantage of the pH-dependent amide N-H hydrogen exchange process10,11,12,20,21,22,23,24, 1H-15N HSQC spectra of DENV C were acquired at pH 6.0 and 7.5 (Fig. 1c,d, gray and black, respectively). Only specific peaks show decreased intensity and/or a chemical shift variation (Fig. 1c,d). The variations in intensity affect more peaks and are more pronounced than the chemical shift variations, suggesting that pH does not trigger a major conformational change. To confirm this, we acquired 1H-15N HSQC spectra of DENV C at several pH values (6.0, 6.5, 6.75, 7.0, 7.25 and 7.5), assessing the spectral evolution as a function of pH (Fig. 1c,d). The spectral region represented in Fig. 1d shows the four amino acid peaks that present the most pronounced variation of chemical shift, namely L44, K73, K74 and R100. Even for these residues, the changes are minimal, implying that the overall architecture of DENV C is conserved.

We then consulted the pKa values of titratable residues (Fig. S1)39, since acid-base equilibria could cause conformational changes that would complicate the interpretation of the results. The theoretical isoelectric point of DENV C is at pH 12.6 and, importantly, its sequence does not contain amino acid residues titratable within the pH range studied (Fig. S1, orange bar). We also measured the NMR transverse 15N amide relaxation rates (R2) at both pH conditions (Fig. S2), which indicated that there are no conformational transitions triggered by pH. This parameter is sensitive to alterations in the size/shape of a protein (since, in globular proteins, it generally increases with the protein hydrodynamic diameter), as well as to local fluctuations in the flexibility of particular amino acid residues7,9,40,41,42. It is clear from Fig. S2 that the R2 values obtained for DENV C are overall invariant in this pH range. Therefore, taking all of the above into account (namely Figs. 1c,d, S1 and S2), the overall DENV C structural arrangement is maintained. Thus, the pH-induced 1H-15N HSQC spectral differences can be attributed to amide hydrogen exchange with water10,11,12,20,21,22,23,24, which reports on solvent accessibility.

Probing solvent accessibility to the protein backbone

The spectral changes observed are consistent with an amide hydrogen exchange process (i.e., where N-H groups exchange their hydrogens with water hydrogens)10,11,12,20,21,22,23,24. Such a process only occurs if the N-H groups are exposed to the solvent and are not involved in an intramolecular hydrogen bond. Therefore, these changes directly report on the solvent accessibility of the N-H groups. At constant temperature, this exchange process occurs at a rate that increases 10-fold per pH unit10,11,12,20,21,22,23,24. Thus, here, by increasing the pH from 6.0 to 7.5, the hydrogen exchange rate constants increase 31.6-fold (i.e., 10^(7.5−6.0)). This decreases the NMR peak intensity, because the amide proton jumps more frequently back and forth between the water and amide sites, which enhances the decay of the transverse magnetization during acquisition. Spectral changes therefore depend on the extent of the increase of the N-H exchange rate constant. For the most solvent accessible N-H groups, peaks may even disappear from the spectrum at pH 7.5 (e.g., residues K7, A8 and L44 in Fig. 1d). These spectral changes can be highly informative if properly explored, reporting on structural and dynamic properties of proteins. As such, we studied them here, to develop a method that provides insights into protein structure and function, both at the individual amino acid and domain level.
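The 31.6-fold figure follows directly from the 10-fold-per-pH-unit dependence. A minimal sketch of that arithmetic is given here; the function name is illustrative and not part of the original work.

```python
def exchange_rate_fold_change(ph_low: float, ph_high: float) -> float:
    """Fold increase of the amide exchange rate constant between two pH values.

    Assumes the 10-fold increase per pH unit (constant temperature)
    stated in the text; nothing else is modelled here.
    """
    return 10.0 ** (ph_high - ph_low)


# pH interval used for DENV C: 6.0 to 7.5
fold = exchange_rate_fold_change(6.0, 7.5)
print(f"{fold:.1f}-fold")  # 31.6-fold
```

Any 1.5-unit pH interval gives the same factor, so the choice of starting pH changes the absolute exchange rates but not the relative increase.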

To establish this new methodology, we first compared the two extremes of the pH range tested, by plotting the intensities at pH 6.0 and 7.5 as a function of protein sequence (Fig. 2a, gray and black, respectively). At pH 6.0, the N-H groups of N-terminal region residues display higher peak intensities, consistent with their disordered nature26,31,32. To simplify the analysis and compensate for differences in initial intensity (Int pH 6.0 ), results were normalized as the ratio between the intensities at pH 7.5 and 6.0 (Int pH 7.5 /Int pH 6.0 ; Fig. 2b). The whole N-terminal region and specific residues located in α1 and nearby loop regions decrease their intensity as the pH increases to 7.5. These findings are worth considering in the context of the three main structural regions of DENV C (Fig. 1b and Table S1). Briefly, at pH 7.5, the peak intensity of some residues is less than half of their initial values (Fig. 2b), namely: R5 to R22 (except P12) in the N-terminal region; V23 and T25 in the D1 domain; Q27 and T30 in α1; S34, R41, G42 and L44 in L1-2; A49 in α2; I59 and G64 near L2-3; K74, S75 and K76 near L3-4; and R99 in the C-terminal domain. As such, the three main structure/dynamics regions of DENV C are clearly distinguishable, as further detailed below.
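The normalization and the "less than half of the initial value" criterion can be sketched as follows. The intensity values are invented for illustration only (they are not the measured data), and the variable and function names are hypothetical.

```python
# residue -> (intensity at pH 6.0, intensity at pH 7.5)
# NOTE: illustrative numbers only, NOT the measured DENV C intensities.
intensities = {
    "M15": (9.0e6, 1.8e6),
    "T30": (4.0e6, 1.6e6),
    "R97": (3.0e6, 2.9e6),
}


def intensity_ratio(int_ph_low: float, int_ph_high: float) -> float:
    """Int pH 7.5 / Int pH 6.0: compensates for differences in initial intensity."""
    if int_ph_low <= 0:
        raise ValueError("reference intensity must be positive")
    return int_ph_high / int_ph_low


ratios = {res: intensity_ratio(lo, hi) for res, (lo, hi) in intensities.items()}

# Residues whose peak drops below half of its initial intensity at pH 7.5
halved = [res for res, ratio in ratios.items() if ratio < 0.5]
print(halved)  # ['M15', 'T30']
```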

Figure 2 DENV C 1H-15N HSQC peak intensities at pH 6.0 and pH 7.5, and their ratio. (a) DENV C 1H-15N HSQC peak intensities at pH 6.0 (gray bars) and pH 7.5 (black bars), and (b) ratio of HSQC peak intensities at pH 7.5 and pH 6.0 (Int pH 7.5 /Int pH 6.0 ). Error bars represent standard error (SE). The symbols in a encode the reason why the respective residues could not be analyzed by NMR: ‘#’ for residues that are not assigned, ‘o’ for overlaps, ‘*’ for absent resonances due to line broadening, and ‘P’ for prolines. The horizontal line on b marks the ratio equal to 1. The main structural features are indicated on the top of the figure: the three structure/dynamics regions26,31,32, the secondary structure domains29,31,32, and the protein primary sequence. Colored columns are a guide for the data corresponding to each secondary structure domain (pink columns represent the experimentally determined α-helices31,32, while the cyan column corresponds to the transient α-helix suggested by our previous work29).

Average solvent accessibility of protein regions

Given the above, we then analyzed the intensity changes in the context of the structure and dynamics of the main regions of the protein. For this purpose, and although each N-H group of an individual amino acid behaves differently in response to pH10,11,12,20,21,22,23,24, we considered structural factors to be the dominant contribution and averaged the backbone N-H group response to pH across regions. N-H groups that are protected, either by being buried within the structure or by taking part in an intramolecular hydrogen bond, will not be affected by pH. Those that are not protected will respond to pH within the range tested here. The ratios determined in Fig. 2b, when averaged across a protein region or domain, provide a single parameter to distinguish between sections of the protein with different structure and dynamics.
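The regional averaging can be illustrated with a short sketch that computes the mean ratio and its standard error (SE) per region. The per-residue ratios below are invented placeholders, and the SE is assumed here to be the sample standard deviation divided by the square root of the number of residues.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(values):
    """Return (mean, standard error) of a list of per-residue ratios.

    SE is taken as sample standard deviation / sqrt(n); this definition
    is an assumption, as the text only states that error bars are SE.
    """
    n = len(values)
    if n == 0:
        raise ValueError("region has no analyzable residues")
    se = stdev(values) / sqrt(n) if n > 1 else 0.0
    return mean(values), se


# region -> per-residue Int pH 7.5 / Int pH 6.0 ratios (illustrative values only)
region_ratios = {
    "disordered N-terminal": [0.15, 0.20, 0.25, 0.20],
    "flexible fold": [0.40, 0.90, 0.55, 0.75],
    "conserved fold": [0.95, 1.00, 0.90, 1.05],
}

for region, values in region_ratios.items():
    avg, se = mean_and_se(values)
    print(f"{region}: {avg:.2f} ± {se:.2f}")
```

A larger SE within a region points to more heterogeneous solvent accessibility among its residues.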

Fig. 3 depicts the average Int pH 7.5 /Int pH 6.0 of the three main structure/dynamics regions (left panel) and of the secondary structure domains (right panel). The main structural regions are distinguished by their average backbone solvent accessibility (Fig. 3, left panel): the disordered N-terminal backbone is highly exposed, the flexible fold is partially accessible, and the conserved fold is mostly inaccessible to the solvent. Looking at the secondary structure domains (Fig. 3, right panel), the average values show that the backbone of the α0 domain, which is disordered and may transiently adopt an α-helical secondary structure29, is highly exposed to the solvent. Among the α-helices, the backbone of α1 presents values in between those obtained for α0 and those of the α2, α3 and α4 backbones, suggesting an intermediate exposure of the α1 backbone to the solvent and implying a certain degree of flexibility. Therefore, DENV C α1 has more freedom to interact with the solvent, in line with our previous studies26,29. Moreover, the average backbone values for loop regions L2-3 and L3-4 are, in general, similar to those of nearby α-helices (Fig. 3, right panel). Thus, this approach probes the differences in the solvent accessibility of backbone N-H groups for the main structure/dynamics regions (Fig. 3, left panel), as well as for the secondary structure domains (Fig. 3, right panel).

Figure 3 Average of the NMR peak intensity ratios for the three structure/dynamics regions and secondary structure domains of DENV C. The per-residue NMR peak intensity ratios between pH 7.5 and pH 6.0 (data from Fig. 2b) were averaged across the residues that comprise each of the three structure/dynamics regions (left panel) and secondary structure domains (right panel) of DENV C. Error bars are SE.

1H-15N HSQC peak intensities as a function of pH

The changes in the 1H-15N HSQC spectrum for each pH tested (6.0, 6.5, 6.75, 7.0, 7.25 and 7.5) were assigned to the respective individual N-H groups of the protein (Fig. 1c,d), to give a complete picture of the peak evolution with pH. Fig. 4a shows the evolution of 1H-15N HSQC peak intensities as a function of pH for three residues (M15, T30 and R97) representative of the three main structural regions. Importantly, the solvent accessibility probed via the approach presented reports on the interaction of each specific amide group with water. It can be used to distinguish the solvent accessibility of backbone and side-chain N-H groups within the same residue, as shown for the W69 N-H groups (Fig. S3), where the backbone amide is not affected by pH, while the indole N-H intensity varies significantly. Therefore, each N-H group reports on its own microenvironment. This highly localized sensitivity illustrates the high resolution of the methodology, a property that can be exploited to gain structural and dynamics information.

Figure 4 Normalized peak intensities of individual N-H groups and the average among DENV C structure/dynamics regions, as a function of pH. (a) Int/Int pH 6.0 variation with pH for the backbone N-H groups of M15, T30 and R97, representing residues in the disordered N-terminal, flexible fold and conserved fold regions, respectively. (b) Average intensities ratio for the three major structure/dynamics regions of DENV C. In all graphs, lines are fits of equation 1 to the data, from which slopes were extracted.

In addition, we can analyze the normalized intensity of the backbone N-H groups as a function of pH for each amino acid, which reveals striking differences between amino acids from different regions of the protein (Fig. 4a). An average of the normalized intensity of all the amino acids in each main region can then be obtained (Fig. 4b). The three key structural regions of DENV C are clearly distinguished (Fig. 4b): the conserved fold undergoes no major changes (green), the N-terminal region undergoes the greatest change (blue), while the flexible fold shows an intermediate regime (red). The flexible fold also has larger error bars (Fig. 4b, red), indicative of higher heterogeneity in the solvent accessibility of its constituent residues. For each secondary structure domain, the average intensities as a function of pH are available in Fig. S4. Since the α0 domain29 is mostly disordered in solution26,31,32, its average backbone solvent accessibility is higher than for the other α-helical domains, as expected. Importantly, α1 displays an intermediate accessibility, and the backbones of the other α-helical domains generally do not exchange their amide hydrogens with the solvent, in agreement with the Fig. 3 data. Regarding loop domains, D0 and L1-2 have their backbone N-H groups mostly interacting with the solvent, while the other loops show little or no amide hydrogen exchange, in accordance with the analysis of Fig. 3. Therefore, we can obtain a single parameter that describes individual and regional exposure to the solvent, as described hereafter.

Linearity of intensities versus pH

The backbone N-H peak intensities of individual amino acid residues follow a roughly linear decrease with pH (Fig. 4a), which is also observed for the average of the main regions (Fig. 4b) and domains (Fig. S4). As such, an approximation was used by fitting the following empirical linear equation to the data:

$$\frac{\mathrm{Int}}{\mathrm{Int}_{\mathrm{pH}\,6.0}}\approx \mathrm{Slope}\times (\mathrm{pH}-6.0)+1$$ (1)

where Int is the 1H-15N HSQC peak intensity at a given pH and Int pH 6.0 is the average intensity from 3 independent measurements at pH 6.0. Fitting this equation retrieves the slope, a parameter that describes the average value of the derivative d(Int/Int pH 6.0 )/dpH throughout the pH interval probed. A formal approach was also devised based on the literature10,11,12,20,21,22,23,24, which can be found in the Supplementary Note (of the Supplementary Information file), leading to the pH dependencies of both Int/Int pH 6.0 and d(Int/Int pH 6.0 )/dpH. Importantly, the simpler slope-based (linear) approach retrieves a single fitting parameter that summarizes the trend and does not depend on estimating external parameters that are sometimes difficult to determine (i.e., k_rc or t values; for details, please consult the Supplementary Note). In practice, the more negative the slope, the more susceptible to exchange is the corresponding N-H group. Slopes and Int pH 7.5 /Int pH 6.0 values are comparable, as explained hereafter. Since slopes are derived from measurements at several pH values, they are a better parameter to represent the solvent accessibility of each N-H group, are of use for more advanced applications, and were employed henceforth.
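As an illustration of how a slope could be extracted from equation 1, the sketch below performs an ordinary least-squares fit with the intercept fixed at 1 (i.e., the line is forced through Int/Int pH 6.0 = 1 at pH 6.0). The fitting procedure actually used is not detailed in this section, so this is an assumption, and the normalized intensities are invented for the example.

```python
def fit_slope(ph_values, normalized_intensities, ph_ref=6.0):
    """Least-squares slope of Int/Int_ref versus pH, intercept fixed at 1.

    Model (equation 1): y = slope * (pH - ph_ref) + 1.
    With x = pH - ph_ref and z = y - 1, the closed-form solution is
    slope = sum(x * z) / sum(x * x).
    """
    xs = [ph - ph_ref for ph in ph_values]
    zs = [y - 1.0 for y in normalized_intensities]
    denom = sum(x * x for x in xs)
    if denom == 0:
        raise ValueError("need at least one pH value different from ph_ref")
    return sum(x * z for x, z in zip(xs, zs)) / denom


# pH values used for DENV C in the text
ph_series = [6.0, 6.5, 6.75, 7.0, 7.25, 7.5]
# Illustrative Int/Int pH 6.0 values for one N-H group (NOT measured data)
example = [1.00, 0.78, 0.64, 0.52, 0.41, 0.27]

slope = fit_slope(ph_series, example)
print(f"slope = {slope:.2f}")  # a more negative slope means faster exchange
```

Fixing the intercept leaves the slope as the only free parameter, which matches the single-parameter description given above.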

DENV C structure/dynamics and the slope information

Slope values were calculated via equation 1 for each analyzable DENV C backbone N-H group (Fig. 5a). The average slope values of the three major regions and of the secondary structure domains (Fig. S4) were then computed (Fig. 5b). The information obtained is similar to that derived from the Int pH 7.5 /Int pH 6.0 values (compare Figs. 5a and 2b, and also Figs. 5b and 3). In Fig. 5a, it is easy to distinguish the individual N-H groups that are fully exposed, intermediately exposed or buried away from the solvent. This is also clear in Fig. 5b for the three main regions and the several secondary structure domains. Notably, within a given region, interconnecting loops seem to be more dynamic and exposed than adjacent α-helical domains, in agreement with the protein structure. Slope values of 0 (Fig. 5a) are from N-H groups of residues that cannot exchange their hydrogen with the solvent (corresponding to Int pH 7.5 /Int pH 6.0 values of 1, in Fig. 2b). Slopes with absolute value higher than 0.7 (Fig. 5a) arise from N-H groups that form H-bonds with water (corresponding to Int pH 7.5 /Int pH 6.0 values of 0, in Fig. 2b). A slope threshold of −0.4 distinguishes the more solvent exposed N-H groups (Fig. 5a, yellow) from those less exposed (Fig. 5a, gray). A detailed analysis of Fig. 5a using this threshold reveals that the most solvent accessible N-H groups are from residues R5 to R22, T25, Q27, T30, S34, G42, L44, I59, G64, S75, K76 and R99. These residues are accordingly depicted in the protein structure (Fig. 5c, matching yellow and gray residues), providing direct information on both the intrinsically disordered and the ordered regions of DENV C, which are immediately distinguishable. Moreover, many of these residues are located at the beginning of the protein α-helices, which gives information on protein structure.
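The correspondence between slopes and intensity ratios follows from equation 1 evaluated at pH 7.5, where the ratio equals 1 + 1.5 × slope. The sketch below makes this explicit and applies the −0.4 threshold. The function names are illustrative, and treating the threshold as a strict inequality is an assumption.

```python
PH_SPAN = 7.5 - 6.0  # width of the pH interval probed for DENV C


def predicted_ratio(slope: float) -> float:
    """Int pH 7.5 / Int pH 6.0 implied by equation 1 for a given slope."""
    return 1.0 + slope * PH_SPAN


def is_highly_exposed(slope: float, threshold: float = -0.4) -> bool:
    """True when the slope is more negative than the threshold.

    The -0.4 cut-off comes from the text; using '<' rather than '<=' is
    an assumption made for this sketch.
    """
    return slope < threshold


for s in (0.0, -0.4, -2.0 / 3.0):
    print(f"slope {s:+.2f} -> ratio {predicted_ratio(s):.2f}, "
          f"highly exposed: {is_highly_exposed(s)}")
```

Under this linear approximation, a ratio of 0 corresponds to a slope of about −0.67, consistent with the value of roughly 0.7 quoted above for the absolute slope of N-H groups whose peaks vanish at pH 7.5.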

Figure 5 Slopes of Int/Int pH 6.0 versus pH in the context of the DENV C structure. (a) Slope of the intensities ratio versus pH along the DENV C sequence (an inverse scale is shown since the more efficient is the hydrogen exchange process, the more negative is the slope). The threshold of −0.4 (dashed line) was defined to identify the DENV C backbone N-H groups that are highly exposed to the solvent (yellow bars). For details on the protein structural information and symbols, on top and within the graph (respectively), please refer to the legend of Fig. 2. (b) Average slopes of the three structure/dynamics regions (b, left panel) and of the secondary structure domains (b, right panel) of DENV C. Error bars in a and b represent SE. (c) DENV C residues in which the backbone N-H is highly exposed to the water were highlighted within the protein structure (yellow regions). Clearly, from a and c, all the residues of the disordered N-terminal region, some specific residues on the flexible fold region and residues in the beginning of the α-helices are able to exchange their backbone amide hydrogen with the water.

The fact that many of the first α-helical residues have N-H groups exposed to the solvent is a direct insight into the nature of α-helices. In an α-helix, the first residue establishes an H-bond via its C=O group with the N-H group of the fourth residue, leaving its own N-H group free of the H-bonds involved in α-helix stabilization. This means that the N-H groups of the first three residues of α-helices are free to establish H-bonds with other nucleophilic groups (that serve as hydrogen bond acceptors), either from the protein, thereby becoming unavailable to the solvent, or from the solvent. If they are exposed to the water, their amide hydrogen can exchange with those from the solvent. This is exactly what we observe in DENV C α-helices, by analyzing the backbone N-H groups that form intramolecular H-bonds within the DENV C structure (Fig. S5a). We then compared the slope information with the normalized frequency of intramolecular H-bonds per N-H group (Fig. S5b), finding a clear correlation of the slopes with the DENV C structure. Interestingly, some residues have a low frequency of N-H intramolecular H-bonds (<0.5) and yet slopes of small magnitude (between −0.4 and 0): they are free to perform hydrogen exchange, but do not do so. This suggests that they are not facing the solvent because they are buried within the protein. In summary, an N-H group from a specific residue needs to be both free of intramolecular H-bonds and exposed to the solvent in order to exchange its hydrogen with the water. Overall, our findings suggest that probing the solvent accessibility of the N-H groups of a protein, via minor pH changes, may be used as an additional structure and dynamics restraint to help in the calculation of protein structures.
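The two-condition logic summarized above (free of intramolecular H-bonds and exposed to the solvent) can be written as a small decision rule. This is only a sketch: the cut-offs (−0.4 for the slope, 0.5 for the normalized H-bond frequency) are the ones quoted in the text, while the function name, the category labels and the example values are hypothetical.

```python
def classify_nh(slope: float, hbond_freq: float,
                slope_threshold: float = -0.4,
                hbond_cutoff: float = 0.5) -> str:
    """Interpret a backbone N-H group from its slope and H-bond frequency.

    slope       slope of Int/Int pH 6.0 versus pH (equation 1)
    hbond_freq  normalized frequency of intramolecular H-bonds (0 to 1)
    """
    if slope < slope_threshold:
        # Exchange is observed: the group must be both free and exposed
        return "exchanging: free of H-bonds and solvent exposed"
    if hbond_freq < hbond_cutoff:
        # Free to exchange but does not: likely buried within the protein
        return "not exchanging: free of H-bonds but buried"
    return "not exchanging: intramolecular H-bond"


# Illustrative inputs (NOT values taken from Fig. S5)
print(classify_nh(slope=-0.60, hbond_freq=0.1))
print(classify_nh(slope=-0.10, hbond_freq=0.2))
print(classify_nh(slope=-0.05, hbond_freq=0.9))
```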

Applying the method to GB1 protein

Having established the applicability of the method with DENV C, we proceeded to test it with the B1 immunoglobulin binding domain of streptococcal protein G (GB1), which contains 56 amino acid residues and folds into a four-stranded β-sheet with one long α-helix on top (Fig. 6a), as shown by X-ray diffraction crystallography as well as by NMR (PDB ID: 2GB1 and 5JXV)35,36,43. GB1 has been extensively studied by different biophysical methods and is one of the smallest stable folded globular domains known. A pH interval of 1.5 units was again assayed, this time from pH 6.5 to 8.0. No major conformational changes were seen (Fig. 6b), only minor local changes (Fig. 6c,d). As the overall protein structure remains highly stable within that pH range37, we went further and tested whether the intensities of the 1H-15N HSQC peaks revealed any changes (Fig. 7). As for DENV C (Fig. 2), by directly comparing peak intensities at pH 6.5 and 8.0 on GB1 (Fig. 7a) or the ratio between these intensities (Fig. 7b), the major regions of the protein with exposed backbone amide groups can be readily identified, namely the loops, the outer strands of the four-stranded β-sheet (i.e., β2 and β3) and the beginning of the α-helix, which are free of backbone intramolecular H-bonds and accessible to the solvent.

Figure 6 GB1 structure and 1H-15N HSQC spectra from pH 6.5 to 8.0. (a) GB1 experimental structure. GB1 is a domain of immunoglobulin G binding protein that is negatively charged, with 6 cationic and 10 anionic residues out of 56. It consists of four β-strands, named β1 to β4 (colored in orange), plus one α-helix, named α1 (colored in pink), connected by short loops (PDB ID 5JXV43). (b) Superimposed GB1 1H-15N HSQC spectra at pH 6.5 (blue), 7.3 (red), 7.6 (green) and 8.0 (black). Major chemical shift changes corresponding to large conformational rearrangements are not observed. (c and d) Zoom on spectral regions where all types of responses to the pH variation are observed: peaks that vary in intensity and/or display a small chemical shift perturbation. Most peaks neither vary in intensity nor in chemical shift. UCSF Chimera v1.9 software54 was used for protein structure visualization43.

Figure 7 GB1 1H-15N HSQC peak intensities at pH 6.5 and pH 8.0, and ratio between peaks. (a) GB1 1H-15N HSQC peak intensities at pH 6.5 (gray bars) and pH 8.0 (black bars), and (b) ratio of HSQC peak intensities between pH 8.0 and pH 6.5 (Int pH 8.0 /Int pH 6.5 ). Error bars represent standard error (SE). The M1 residue is not assigned, while Q2 was not analyzed due to absent resonance, as a result of line broadening. All other amino acid residue peaks were assigned and analyzed. The horizontal line on b marks the ratio equal to 1. The main structural features are indicated on the top of the figure: the secondary structure domains36, and the protein primary sequence. Colored columns are a guide for the data corresponding to each secondary structure domain (orange and pink columns correspond to the experimentally determined β-strands and α-helix, respectively).

Then, with the above in mind, we tested the use of the slope to map the protein regions most accessible to the solvent (Figs. 8 and S6), using the same cut-off as for DENV C. The information fits well with the known pattern of GB1 solvent-accessible surface area (SASA) along the sequence44, supporting the methodology employed. Moreover, our results are in accordance with H/D exchange studies, as the regions identified there as being more accessible to exchange with the solvent are also the most sensitive to pH38. More protected residues that exchange through a global unfolding mechanism (e.g., residues K4, L5, A26, F30 and T44) or a local high-energy unfolding mechanism (e.g., residues K28, Y33, N35 and T55) display minimal changes with pH, while the regions that correspond to fast-exchanging, non-H-bonded N-H groups are clearly visible (e.g., residues T17, E19 and V21). This is also supported by the analysis of intramolecular H-bond frequency (Fig. S6), similarly to DENV C (Fig. S5). All this information further validates the methodology employed and suggests its applicability in other studies, as discussed below.