Significance A major goal of protein design is to create proteins that have high stability and biological activity. Drawing on evolutionary information encoded within extant protein sequences, consensus sequence design has produced several successes in achieving this goal. Here, we explore the generality with which consensus design can be used to enhance protein stability and maintain biological activity. By designing and characterizing consensus sequences for six unrelated protein families, we find that consensus design shows high success rates in creating well-folded, hyperstable proteins that retain biological activities. Remarkably, many of these consensus proteins show higher stabilities than naturally occurring sequences of their respective protein families. Our study highlights the utility of consensus sequence design and informs the mechanisms by which it works.

Abstract Consensus sequence design offers a promising strategy for designing proteins of high stability while retaining biological activity since it draws upon an evolutionary history in which residues important for both stability and function are likely to be conserved. Although there have been several reports of successful consensus design of individual targets, it is unclear from these anecdotal studies how often this approach succeeds and how often it fails. Here, we attempt to assess generality by designing consensus sequences for a set of six protein families with a range of chain lengths, structures, and activities. We characterize the resulting consensus proteins for stability, structure, and biological activities in an unbiased way. We find that all six consensus proteins adopt cooperatively folded structures in solution. Strikingly, four of the six consensus proteins show increased thermodynamic stability over naturally occurring homologs. Each consensus protein tested for function maintained at least partial biological activity. Although peptide binding affinity by a consensus-designed SH3 is rather low, Km values for consensus enzymes are similar to values from extant homologs. Although consensus enzymes are slower than extant homologs at low temperature, they are faster than some thermophilic enzymes at high temperature. An analysis of sequence properties shows consensus proteins to be enriched in charged residues and depleted in uncharged polar residues. Sequence differences between consensus and extant homologs are predominantly located at weakly conserved surface residues, highlighting the importance of these residues in the success of the consensus strategy.

Exploiting the fundamental roles of proteins in biological signaling, catalysis, and mechanics, protein design offers a promising route to create and optimize biomolecules for medical, industrial, and biotechnological purposes (1–3). Many different strategies have been applied to designing proteins, including physics-based (4), structure-based (5, 6), and directed evolution-based approaches (7). While these strategies have generated proteins with high stability, implementation of the design strategies is often complex and success rates can be low (8–11). Although directed evolution can be functionally directed, de novo design strategies typically focus primarily on structure. Introducing specific activity into de novo-designed proteins is a significant challenge (8, 10).

Another strategy that has shown success in increasing thermodynamic stability of natural protein folds is consensus sequence design (12). For this design strategy, “consensus” residues are identified as the residues with the highest frequency at individual positions in a multiple sequence alignment (MSA) of extant sequences from a given protein family. The consensus strategy draws upon the hundreds of millions of years of sequence evolution by random mutation and natural selection that is encoded within the extant sequence distribution, with the idea that the relative frequencies of residues at a given position reflect the “relative importance” of each residue at that position for some biological attribute. As long as the importance at each position is largely independent of residues at other positions, a consensus residue at a given position should optimize stability, activity, and/or other properties that allow the protein to function in its biological context and ultimately contribute to organismal fitness.* By averaging over many sequences that share similar structure and function, the consensus design approach has the potential to produce proteins with high levels of thermodynamic stability and biological activity, since both attributes are likely to lead to residue conservation.
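The residue-selection rule described above can be illustrated with a minimal Python sketch. It assumes the alignment is already available as equal-length strings with "-" marking gaps, and it applies the occupancy rule stated in Materials and Methods (a position is kept when residues, rather than gaps, occupy at least half of the sequences). The function name `consensus_sequence`, the `min_occupancy` parameter, and the tie-breaking behavior are illustrative assumptions; this is not the in-house script used in this study.

```python
from collections import Counter

GAP = "-"  # assumed gap character in the aligned sequences


def consensus_sequence(alignment, min_occupancy=0.5):
    """Return the most frequent residue at each sufficiently occupied column.

    alignment: list of equal-length aligned sequences (strings).
    min_occupancy: minimum fraction of sequences that must carry a residue
        (not a gap) for the column to become a consensus position.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*alignment):  # iterate over alignment columns
        residues = [aa for aa in column if aa != GAP]
        # Skip columns that are mostly gaps.
        if not residues or len(residues) < min_occupancy * len(column):
            continue
        # Ties are resolved by first occurrence in the column; this is an
        # arbitrary choice made for the sketch, not a rule from the paper.
        most_common, _count = Counter(residues).most_common(1)[0]
        consensus.append(most_common)
    return "".join(consensus)


toy_alignment = ["MKV-E", "MRV-E", "MKI-D", "MK-AE"]
print(consensus_sequence(toy_alignment))
```

In the toy alignment, the fourth column is occupied by a residue in only one of four sequences and is therefore dropped, whereas the third column (three of four sequences) is retained.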

There are two experimental approaches that have been used to examine the effectiveness of consensus information in protein design: point-substitution and “wholesale” substitution. In the first approach, single residues in a well-behaved protein that differ from the consensus are substituted with the consensus residue (13–17). In these studies, about one-half of the consensus point substitutions examined are stabilizing, but the other half are destabilizing. Although this frequency of stabilizing mutations is significantly higher than an estimated frequency around 1 in 10³ for random mutations (13, 18, 19), it suggests that combining individual consensus mutations may give minimal net increase in stability since stabilizing substitutions would be offset by destabilizing substitutions.

The “wholesale” approach does just this, combining all substitutions toward consensus into a single consensus polypeptide composed of the most frequent amino acid at each position in sequence. By making a large number of substitutions at once, wholesale consensus substitution may collectively combine the incremental effects from the individual substitutions as well as nonadditive effects arising from the substitution of each residue into the novel background of the consensus protein (20, 21). The stabilities of several globular proteins and several repeat proteins have been increased using this approach (22–31). An increase in thermodynamic stability is seen in most (but not all) cases, but effects on biological activity are variable. In a recent study, we characterized a consensus-designed homeodomain sequence that showed a large increase in both thermodynamic stability and DNA-binding affinity (32). Unlike the point-substitution approach, where both stabilizing and destabilizing substitutions are reported, the success rate of the wholesale approach is not easy to determine from the literature, where publications present single cases of success, whereas failures are not likely to be published. One study of TIM barrels reported a few poorly behaved consensus designs that were then optimized to generate a folded, active protein (24). Although this study highlighted some limitations, it would not likely have been published if it did not end with success.

Here, we address these issues by applying the consensus sequence design strategy to a set of six taxonomically diverse protein families with different folds and functions (Fig. 1). We chose the three single domain families including the N-terminal domain of ribosomal protein L9 (NTL9), the SH3 domain, and the SH2 domain, and the three multidomain protein families including dihydrofolate reductase (DHFR), adenylate kinase (AK), and phosphoglycerate kinase (PGK). We characterized these six consensus proteins in terms of structure, stability, and function. We find that consensus sequences for all six protein families are quite soluble and adopt the native folds of their respective families. Strikingly, four of the six consensus proteins show increased thermodynamic stability compared with naturally occurring homologs; the other two consensus proteins show stabilities comparable to natural homologs. All consensus proteins assayed for biological activity retain their expected activities, including molecular recognition and enzymatic catalysis. An advantage of this multitarget comparison is that it allows us to examine sequence features of consensus-designed proteins and relate them to one another and to naturally occurring homologs. This sequence analysis shows that consensus proteins are enriched in charged residues and are depleted in polar uncharged residues, and highlights the importance of weakly conserved surface residues in enhancing stability through the consensus design strategy.

Fig. 1. Targets for consensus design. A representative structure of an extant sequence is shown for each target family (NTL9, 2HBB; SH3, 1LKK; SH2, 4U1P; DHFR, 5DFR; AK, 1ANK; PGK, 1PHP). Length of consensus sequence, biological function, and number of sequences used in final MSA are noted.

Discussion There are a number of reports in the literature in which consensus design has successfully been used to produce proteins that adopt their archetypical fold. This approach has found recent success for linear repeat-protein targets (49–52) but has also shown some success for globular protein targets (13, 14, 23, 26, 27). In most reports, globular consensus proteins have enhanced equilibrium stability and sometimes retain biological activity. However, point-substitution studies suggest stability gains from wholesale consensus substitution may be marginal, and protein engineering studies with a WW domain suggest that consensus proteins may lack the coupling energies needed to fold (53). Because publications of successful design of globular proteins have so far been “one target at a time,” and since unsuccessful designs are unlikely to be published, it is difficult to estimate how likely the consensus approach is to yield folded proteins, and to determine how to improve on the approach when it performs poorly. The studies presented here, where we generate and characterize consensus sequences for six protein families, provide an unbiased look at rates of success and failure. Comparisons for different targets are aided by the use of a common design protocol for all targets, a common purification pipeline, and a uniform set of protocols for measuring stability, structure, and function. This multitarget approach also allows us to evaluate sequence features that are likely to contribute to consensus stabilization in a general way, and to correlate these features with measured biochemical properties. Our results show that consensus sequence design is both a general and successful strategy for small and large globular proteins with diverse folds. For all six protein families targeted, the resulting consensus proteins expressed to high levels, remained soluble in solution, and adopted a well-folded tertiary structure.
Structural information obtained from CD and NMR spectroscopies suggests that all of the consensus proteins adopt their archetypal fold. Support for folding is provided by the observation that all consensus proteins tested show their expected biological activities (Figs. 2, 4, and 5). Our findings indicate that thermodynamic stabilization of consensus proteins is the norm, rather than the exception. Four of the six [and five of the seven including the previously reported homeodomain (32)] consensus proteins showed stabilities greater than the most stable naturally occurring sequences identified in our literature search, and the other two showed stabilities near and above average (Fig. 3 and SI Appendix, Table S3). In short, consensus design can be expected to provide stability enhancements about three-quarters of the time. This result is particularly surprising, given that proteins are not under direct evolutionary selection for maximal stability (54, 55), but must simply retain sufficient stability to remain folded. It appears that by taking the most probable residue at each position, a large number of small stability enhancements sum to a large net increase in stability. Although naturally occurring extant proteins are of lower stability (perhaps because high stabilities are not required for function), this information gets encoded in alignments of large numbers of sequences. Taken at face value, these results highlight the importance of information encoded at the level of single residues. Though single-residue information does not explicitly include pairwise sequence correlations, favorable pairs may be retained in our consensus designs as long as each residue in the pair is highly conserved. Thus, the extent to which consensus sequences capture such “accidental” correlations and their contributions to the observed effects on stability require further studies. The source of the relative instability of cSH3 is unclear.
The consensus SH3 and the extant sequences from which it was generated have some exceptional features, although none of these is unique to SH3. The SH3 sequences are among the shortest sequence families examined (54 residues), although not the shortest (NTL9 is 46 residues). The consensus SH3 sequence has a high fraction of polar residues and the lowest fraction of uncharged residues (Fig. 6A), although the cHD is a close second. The SH3 sequence is also the only protein in our set that has an all-β structure. Taxonomically, sequences in the SH3 MSA show the lowest pairwise identity (26%; SI Appendix, Table S1), although the SH2 family is a close second (28%). It may be noteworthy that the increase in stability of cSH2 compared with the available extant values is the smallest stability increment observed for those proteins likely to conform to a two-state mechanism. It may also be noteworthy that, along with SH2, the SH3 multiple sequence alignment is uniquely dominated by eukaryotic sequences. In a study of consensus superoxide dismutase sequences, Goyal and Magliery (56) found successful consensus design to be highly dependent on the phylogeny represented in the MSA. A particular advantage of consensus sequence design is that it draws upon the natural evolutionary history of a protein family. As a result, residues important for function are likely to be conserved (57). However, it cannot be assumed a priori that the resulting consensus proteins will show biological function, since consensus sequences are novel sequences that have experienced no evolutionary selection for function. Importantly, all consensus proteins we assayed for function maintained some level of expected biological activities of both molecular recognition and enzymatic catalysis (Figs. 4 and 5).
This result, combined with previously reported studies showing consensus protein function (22–31), indicates that information necessary for protein function is retained in averaging over many sequences that each individually contain functionally important information. In both this study and our previous investigations of a cHD (32), consensus substitution showed varying effects on molecular recognition. Consensus HD and cSH3 each showed two to three orders of magnitude differences in their binding affinities to cognate substrates relative to naturally occurring sequences, with cHD showing higher affinity and cSH3 showing lower affinity. The origins of these differences in substrate binding affinities remain unclear. It is possible that the sequences used to obtain the cHD sequence bind similar sequences (indeed, many of these sequences are from the engrailed superfamily), resulting in an “optimized” homeodomain, whereas sequences used to obtain the consensus SH3 possess different specificities, resulting in a sequence whose binding affinity has been “averaged out.” Testing this explanation will require an investigation of the binding specificities of the consensus proteins as well as those of the sequences used to generate them. For the three enzymes examined here, consensus substitution shows variable effects on catalysis. At low temperatures (20 °C), steady-state turnover numbers for all consensus enzymes were smaller than those for naturally occurring mesophilic sequences, but on par with those of thermophilic homologs (Table 2). This is consistent with the observation that thermophilic proteins show lower catalytic activities at low temperatures than their less stable mesophilic counterparts (58, 59). At higher temperatures, cPGK has a kcat value comparable to thermophilic homologs, whereas cDHFR and cAK have higher and lower kcat values, respectively.
On the whole, this observation demonstrates that consensus enzymes can (but sometimes do not) achieve the same level of activities as their naturally occurring counterparts. The observed inverse correlations between enzyme stability and catalytic rates have widely been interpreted as resulting from a trade-off between dynamics and catalysis (60). As this issue is still debated (61, 62), consensus sequence design may offer a promising avenue to gain insights into the relationships among protein stability, dynamics, activity, and evolution. The consensus design strategy used here appears to impart a strong bias on sequence: Taking the most probable residue at each position does not result in average composition. This bias may have significant effects on stability and function. The consensus sequences all have a high content of charged residues and low content of polar uncharged residues (Fig. 6). Consensus substitutions from uncharged to charged residues show a stronger bias toward positions of higher sequence entropy than substitutions among uncharged residues (SI Appendix, Fig. S15). Thus, the overall enrichment of charged residues in consensus sequences results from charged residues (E, D, and K) “winning” over uncharged residues at positions with low conservation. Similar (but not identical) compositional biases have also been observed in thermophilic sequences, consistent with the high stabilities observed for the consensus proteins. Like our consensus sequences, thermophilic proteins have been shown to be enriched in E and K, and depleted in A, C, H, Q, S, and T (63, 64). However, thermophilic proteins have also been shown to be enriched in Y, R, and I, which are at or below average composition in our consensus sequences. Although it might be expected that the inclusion of sequences from thermophilic organisms in our MSAs contributes both to the composition bias and to high stabilities, most of the sequences in our MSAs are from mesophilic organisms. 
The sequences in the SH3 and SH2 MSAs are predominantly eukaryotic (as were our previous cHD sequences); aside from a small number of moderately thermophilic fungi, these sequences all derive from mesophiles. For the other four protein families (NTL9, DHFR, AK, and PGK), MSAs are composed of at most 5% of sequences from thermophilic or hyperthermophilic bacteria or archaea (SI Appendix, Table S5). If the identified thermophilic sequences are removed from the MSA before consensus sequence generation, the resulting consensus sequences have identities of 98.6% or greater to the consensus sequence derived from the full MSAs (SI Appendix, Table S5).‡ Makhatadze et al. (47, 48) have been able to increase stability by introducing charged residues at surface positions of several proteins and optimizing electrostatic interactions. It is unclear to what extent the locations of consensus charged residues optimize electrostatic interactions, or whether additional stability increases can be obtained by charge shuffling. Increases in stability and solubility have also been reported for “supercharged” proteins, which have similar numbers of charged residues (65); however, the consensus proteins studied here are generally close to electroneutrality, whereas supercharged proteins have highly imbalanced positive or negative charge. It should be noted, however, that some of the consensus sequences deviate from the general trends observed in sequence biases. For instance, consensus SH3 and HD have a greater percentage of polar residues and a lower percentage of nonpolar residues, and cDHFR has a much lower net charge, compared with the average for each MSA. Thus, consensus sequence statistics appear to abide by general trends but not absolute rules. Similarly, analysis of the positions at which the consensus sequences differ from extant sequences highlights important aspects of the consensus design strategy.
These consensus mismatches occur mainly at positions with relatively low conservation and positions on the protein surface (Fig. 6), consistent with the well-known correlation between residue conservation and solvent-accessible surface area (66). This may highlight an implicit advantage of consensus sequence design, since substitutions at core positions are often destabilizing (67). However, the large observed effects of consensus substitution on both stability and activity indicate that these weakly conserved and surface positions play a sizable role in both stability and function, and considerable gains can be made by optimizing these positions. This observation is consistent with the observation of the functional impacts of nonconserved “rheostat” substitutions on the surface of lac repressor (68). Furthermore, the importance of these weakly conserved positions suggests that using a large number of sequences may be a key component of successful consensus design, since weakly conserved positions are most sensitive to phylogenetic noise and misalignment (69). Our work here demonstrates that the consensus sequence design method is both a general and successful strategy to design proteins of high stability that retain biological activity. Compared with other rational, structure-based, or directed evolution methods, consensus sequence design provides a simple route to accomplish longstanding goals of protein design. Furthermore, its foundation in phylogenetics provides a promising avenue to address questions regarding the relationships of protein sequence, biophysics, and evolution.

Materials and Methods Design of Consensus Sequences. Sequences for each family were gathered from Pfam (33), SMART (34), or InterPro (35) databases. Resulting sequence sets for each domain were filtered by sequence length, removing sequences 30% longer or shorter than the median sequence length of the set. To avoid bias from sequence groups with high identity, we used CD-HIT (70) to cluster sequences at 90% identity and selected a single representative sequence from each cluster. This curated sequence set was used to generate an MSA using MAFFT (71). At each position of the MSA, frequencies were determined for the 20 amino acids along with a gap frequency using an in-house script. Positions occupied by residues (as opposed to a gap) in at least half of the sequences were included as positions in the consensus sequence. The most frequent residue at each of these “consensus positions” was gathered to create the consensus sequence for that protein family.

NMR Spectroscopy. 15N- and 13C,15N-isotopically labeled proteins were expressed and purified as described in SI Appendix. NMR samples, data acquisition, and data analysis are also described in SI Appendix. Peptide binding to cSH3 was monitored by heteronuclear NMR spectroscopy. A putative SH3-binding peptide (Ac-PLPPLPRRALSVW-NH2) was synthesized by GenScript. Samples containing 200 μM 15N-labeled cSH3 were prepared at 0-, 0.05-, 0.125-, 0.5-, 1.25-, 2.5-, 5-, 10-, and 20-fold molar equivalents of unlabeled peptide. 1H–15N HSQC spectra were collected on a Bruker Avance 600-MHz spectrometer in 150 mM NaCl, 25 mM NaPO4 (pH 7.0), and 5% D2O at 25 °C. 1H–15N chemical shifts varied monotonically with peptide concentrations such that assignments from the apo-protein could be transferred to the bound state. Chemical shift perturbations (CSPs) were calculated and globally fit using a single-site binding equation as described in SI Appendix.

CD and Fluorescence Spectroscopies. CD measurements were collected on an Aviv Model 435 CD spectropolarimeter.
Far-UV CD spectra were collected using a 1-mm cuvette with protein concentrations ranging from 2 to 31 μM at 20 °C, averaging for 5 s with a 1-nm step size. Consensus NTL9, cSH3, cSH2, and cDHFR were collected in 150 mM NaCl and 25 mM NaPO4 (pH 7.0). Consensus AK and cPGK were collected in 50 mM NaCl, 0.5 mM TCEP, and 25 mM Tris⋅HCl (pH 8.0). GdnHCl- and temperature-induced folding/unfolding transitions were monitored using either CD or fluorescence spectroscopy. All unfolding transitions were collected with protein concentrations ranging from 1 to 6 μM. GdnHCl melts were collected at 20 °C. UltraPure GdnHCl was purchased from Invitrogen. Concentrations of GdnHCl were verified using refractometry (72). Temperature-induced unfolding transitions were generated by measuring CD or fluorescence in 2 °C increments. Samples were allowed to equilibrate for 2 min at each temperature before signal measurement. The signal at each temperature was then averaged for 30 s. Reversibility was assessed by cooling samples to 25 °C after thermal denaturation and comparing CD or fluorescence emission spectra to those collected immediately before thermal unfolding. GdnHCl-induced unfolding of cDHFR was monitored by CD at 222 nm, signal averaging for 30 s at each GdnHCl concentration. Unfolding of cNTL9, cSH3, and cSH2 was monitored by tryptophan fluorescence on an Aviv Model 107 ATF. Fluorescence was measured using a 280-nm excitation and either a 332-nm (cSH2 and cSH3) or 348-nm (cNTL9) emission, signal averaging for 30 s at each GdnHCl concentration. For these four proteins, unfolding was found to equilibrate rapidly, so that titrations could be generated using a Hamilton automated titrator, with a 5-min equilibration period. GdnHCl-induced unfolding of cAK and cPGK was found to equilibrate on a slower timescale than the other consensus proteins, prohibiting the use of an automated titrator.
Therefore, samples at each denaturant concentration were made individually and equilibrated at room temperature for 24 h (cAK) or 5 h (cPGK). For each sample, the CD signal at 225 nm (cAK) or 222 nm (cPGK) was averaged for 30 s. Melts for both proteins were collected in buffer containing 50 mM NaCl, 0.5 mM TCEP, and 25 mM Tris⋅HCl (pH 8.0). Titrations were carried out in triplicate for each protein. Thermodynamic folding/unfolding parameters were determined by fitting a two-state linear extrapolation model to the folding/unfolding curves (73).

Steady-State Enzyme Kinetics. Steady-state enzyme kinetic parameters at 20 °C were determined for cDHFR (in the direction of tetrahydrofolate formation), cAK (in the direction of ADP formation), and cPGK (in the direction of 1,3-bisphosphoglycerate formation) using absorbance spectroscopy to monitor the oxidation of NAD(P)H catalyzed either directly by the consensus enzyme (cDHFR) (74), or by the activity of an enzyme that is directly coupled to the products produced by the rate-limiting activity of the consensus enzyme (cAK and cPGK) (59, 75). The absorbance at 340 nm was monitored over time after rapid addition of consensus enzyme. Steady-state velocities were determined as the initial linear slope of the time course. Because these enzymes catalyze bisubstrate reactions, we were able to obtain Michaelis–Menten kinetic parameters for each substrate by varying the concentration of one substrate at a constant, saturating concentration of the other substrate. For cDHFR assays, a concentration of 175 nM cDHFR was used in reaction buffer containing 25 mM Hepes (pH 7.5) and 150 mM NaCl. For cAK assays, a concentration of 53 nM cAK was used in reaction buffer containing 50 mM Hepes (pH 7.5), 100 mM NaCl, 20 mM MgCl2, 1 mM phosphoenolpyruvate, 0.1 mM NADH, 10 units of pyruvate kinase, and 10 units of lactate dehydrogenase.
For cPGK assays, a concentration of 53 nM consensus PGK was used in reaction mixture containing 100 mM Tris⋅HCl (pH 8.0), 3 mM MgCl2, 0.1 mM NADH, and 5 units of glyceraldehyde phosphate dehydrogenase. For cDHFR and cPGK, kcat values were measured at high temperatures using the absorbance spectroscopic assays described above at saturating concentrations of both substrates. Samples were allowed to equilibrate at the desired temperature for 5 min before initiation of the reaction. Consensus DHFR activity was measured up to 50 °C. Consensus PGK activity was measured up to 40 °C, the temperature of onset of denaturation of the coupling enzyme (glyceraldehyde phosphate dehydrogenase). For cAK, enzyme activity at various temperatures was measured using a direct 31P NMR assay previously used for an AK from Aquifex aeolicus (SI Appendix, Supplementary Methods) (43). Consensus AK activity was measured up to 70 °C.
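As an illustration of how Michaelis–Menten parameters can be extracted from initial velocities measured at varying concentrations of one substrate, the following Python sketch may be useful. The fitting procedure is not specified in this section, so the sketch substitutes a Hanes–Woolf linearization ([S]/v = [S]/Vmax + Km/Vmax) with ordinary least squares purely because it needs only the standard library; direct nonlinear regression is generally preferable for noisy data. The function names and the synthetic data are hypothetical and are not measurements from this study.

```python
from statistics import mean


def fit_michaelis_menten(substrate, velocity):
    """Estimate (Vmax, Km) from initial velocities via Hanes-Woolf regression.

    The relation S/v = S/Vmax + Km/Vmax is linear in S, so the slope of
    S/v against S is 1/Vmax and the intercept is Km/Vmax.
    """
    if len(substrate) != len(velocity) or len(substrate) < 2:
        raise ValueError("need at least two matched (S, v) points")
    y = [s / v for s, v in zip(substrate, velocity)]
    x_bar, y_bar = mean(substrate), mean(y)
    sxx = sum((s - x_bar) ** 2 for s in substrate)
    sxy = sum((s - x_bar) * (yi - y_bar) for s, yi in zip(substrate, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def turnover_number(vmax, enzyme_conc):
    """kcat = Vmax / [E]; both arguments must use the same concentration unit."""
    return vmax / enzyme_conc


# Synthetic, noise-free example (hypothetical values): Vmax = 2.0, Km = 5.0.
s_values = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
v_values = [2.0 * s / (5.0 + s) for s in s_values]
vmax_fit, km_fit = fit_michaelis_menten(s_values, v_values)
print(round(vmax_fit, 6), round(km_fit, 6))
```

With Vmax in concentration per unit time and the enzyme concentration in the same concentration unit, `turnover_number` returns kcat in reciprocal time units.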

Acknowledgments We thank Ananya Majumdar for assistance in collecting and discussions of the NMR experiments. We thank the Johns Hopkins University Biomolecular NMR Center, Center for Molecular Biophysics, and Chemistry NMR Core Facility for providing facilities and resources. We thank Michael Harms for suggesting the BacDive database as a resource to classify bacterial growth temperatures. This work was supported by NIH Grant GM068462 to D.B. M.S. was supported by NIH Grants T32GM008403 and F31GM128295.

Footnotes Author contributions: M.S., K.W.T., and D.B. designed research; M.S. and K.W.T. performed research; M.S. and K.W.T. analyzed data; and M.S. and D.B. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

*These properties include rates of folding, unfolding, degradation, compartmentalization, oligomerization, and solubility.

†Although for most families this search resulted in five or more free energy values, providing a good representation of the average stability, all of these stabilities may be biased by experimental constraints (expression, solubility, and baseline-resolved folding transitions).

‡Though the 5% estimate of thermophilic sequences is likely to undercount the number of thermophilic sequences in our MSA, since not all sequences could be unambiguously assigned to a source organism, the total number of thermophiles in our MSAs is not likely to exceed 16% (the ratio of identified thermophiles to thermophiles plus mesophiles).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1816707116/-/DCSupplemental.