The split energy as an aid to finding split sites

Experimental trial-and-error approaches to generating split proteins often fail to produce optimal splits with low background activity in living cells, and reassembly can be ineffective. The lack of general principles based on the mechanisms of splitting has limited the applicability of empirical approaches to a wide range of targets. To derive rules for identifying potential split sites in target proteins, we analyzed existing proteins that have been split with varying degrees of effectiveness. To rationally select a potential split site on a protein structure, we developed a physical scoring function, the “split energy”, whose minima and steep slopes identify sites where splitting should be avoided. To compute the split energy for scission at any given residue, we computed the total energies of the split parts relative to the native energy of the intact protein (Fig. 1a). The split energy profile revealed sites that are critical for protein folding and therefore should not be used as split sites. To test the efficacy of split energy as a predictor of useful split sites, we analyzed 16 proteins with previously reported split sites (Supplementary Table 1). These analyses indicated that the split energy profile of a given protein may show either a single energy well or multiple wells. Successful split sites avoided the major split energy minima, and attempts to split at the energy minima were unsuccessful (Fig. 1b–d). The split energy wells corresponded to individual domains in the multidomain proteins firefly luciferase, IFP and Cre recombinase. In single-domain proteins, which are more challenging to split9, we hypothesized that energetic wells indicated “hidden subdomains” resulting from individual folding cores. 
Notably, these subdomains (seen in green fluorescent protein (GFP), dihydrofolate reductase (DHFR), lactamase, ubiquitin, hygromycin B phosphotransferase, the Lyn tyrosine kinase, adenylate cyclase, PTP1B phosphatase, and TEV protease) cannot be identified using sequence-based domain databases or visual inspection of protein structure (Supplementary Figs. 1–13). A striking example is GFP, which has a single domain comprising an 11-stranded β-barrel with an α-helix containing the covalently attached chromophore in the center, making the estimation of potential split sites by simple visual inspection of the structure challenging. The split energy profile surprisingly suggested the presence of two hidden subdomains, indicated by two separate energy wells, not detectable from the structure or contact numbers (Supplementary Fig. 1). We confirmed the presence of this multi-subdomain topology in GFP by discrete molecular dynamics simulations10,11. Heat-resistant residues, indicated by local minima in unfolding energy diagrams, suggested that the possible folding core residues are mostly located in the region between the N terminus and the loop at residues 128–133. The unfolding curve showed a sharp transition starting after this loop and extending to the C terminus, suggesting a region less stable than the N-lobe and a separation into two hidden subdomains (Supplementary Fig. 1). The contact number does not accurately predict the split sites (Supplementary Fig. 1).
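As a purely illustrative sketch of the split energy idea described above, the following Python snippet computes a profile from a toy matrix of pairwise residue energies. The pairwise-additive model, the function names, and the sign convention (native energy minus the summed energies of the two parts, chosen so that cuts through a tightly packed core appear as wells) are all assumptions made for illustration; the actual energy function is described in Methods and is not reproduced here.

```python
from typing import List


def total_energy(pair_energy: List[List[float]], start: int, stop: int) -> float:
    """Sum pairwise energies over residues start..stop-1, counting each pair once."""
    return float(sum(
        pair_energy[a][b]
        for a in range(start, stop)
        for b in range(a + 1, stop)
    ))


def split_energy_profile(pair_energy: List[List[float]]) -> List[float]:
    """Toy split energy for a cut between residues `cut` and `cut + 1` (0-based).

    Assumed convention: E_native - (E_Npart + E_Cpart). With favorable
    (negative) contacts, cuts that sever many contacts give low values.
    """
    n = len(pair_energy)
    e_native = total_energy(pair_energy, 0, n)
    profile = []
    for cut in range(n - 1):
        e_n_part = total_energy(pair_energy, 0, cut + 1)
        e_c_part = total_energy(pair_energy, cut + 1, n)
        profile.append(e_native - (e_n_part + e_c_part))
    return profile


def two_block_example() -> List[List[float]]:
    """Hypothetical 6-residue chain with two 3-residue 'subdomains'.

    Residues within the same block attract (-1.0); blocks do not interact.
    """
    n = 6
    energy = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            if (a < 3) == (b < 3):
                energy[a][b] = energy[b][a] = -1.0
    return energy
```

In this toy model, `split_energy_profile(two_block_example())` is lowest for cuts inside either block and highest for the cut between the two blocks, which mirrors the qualitative observation that successful split sites lie between energy wells.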

For the majority of benchmark proteins, successful sites were in positions between two split energy wells (Class I, Fig. 1b). In some proteins, split sites had been selected near a minor energy well (e.g., Cre recombinase, adenylate cyclase, GFP, and phosphatase, Fig. 1c, Class II). These were likely selected to reduce spontaneous folding, as was reported for Cre12. One exception was TEV protease, which has a split site at the major core; this split analog produced only 43% of wild-type activity13. One protein (N-anthranilate isomerase) did not show clearly defined cores (Class III). The split sites for this protein were close to the termini, distant from the broad region of low split energy14. In all three protein classes, successful split sites are at surface-exposed, evolutionarily non-conserved loops (Fig. 2b and Supplementary Figs. 1–13). Together, these observations suggested that the split energy can serve as an effective aid in finding split sites.
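The three classes above can be read, very roughly, as rules on the positions of wells in a split energy profile. The sketch below is a hypothetical interpretation only: the local-minimum definition of a well, the `near` window, and the function names are assumptions and do not come from the benchmark analysis itself.

```python
from typing import List


def find_wells(profile: List[float]) -> List[int]:
    """Return indices of interior local minima ('wells') in a profile."""
    wells = []
    for i in range(1, len(profile) - 1):
        if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]:
            wells.append(i)
    return wells


def site_class(profile: List[float], site: int, near: int = 1) -> str:
    """Assign a candidate site to a class using assumed, simplified rules.

    "III":   no clearly defined wells in the profile
    "II":    site lies within `near` positions of a minor (non-deepest) well
    "I":     site lies between two wells
    "other": anything else, e.g. a site at or beyond the major well
    """
    wells = find_wells(profile)
    if not wells:
        return "III"
    major = min(wells, key=lambda i: profile[i])
    minor = [w for w in wells if w != major]
    if any(abs(site - w) <= near for w in minor):
        return "II"
    if any(w < site for w in wells) and any(w > site for w in wells):
        return "I"
    return "other"
```

For a made-up profile such as `[0, -3, 0, 2, 0, -1, 0]`, position 3 sits between the major well at 1 and the minor well at 5, whereas position 4 sits next to the minor well.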

Fig. 2 Lyn SPELL. a The structure of Lyn with split sites shown in red. b Based on the SPELL algorithm, we selected sites to test, including the promising residues 268 and 279, substantially higher in split energy than the cores at 1 and 2. 393 is a previously reported split site. c Phosphotyrosine blot of cell lysates with Lyn analogs split at N-lobe of the kinase domain. d Phosphotyrosine blot of cell lysates, including Lyn analogs split at C-lobe of the kinase domain. GFP is fused to the C terminus of Lyn to show the expression of full-length Lyn or C-lobe split protein

Development of the SPELL algorithm

We improved our algorithm by including additional parameters: solvent accessibility, sequence conservation, and loop “tightness” (Methods). For convenience, we built a publicly accessible web server (spell.dokhlab.org) to predict split sites in a given protein (Supplementary Fig. 14). We consider only surface-exposed loops with low sequence conservation. Our goal is to create an algorithm that maximizes the number of predicted true split sites while keeping the number of false-positive predictions to a minimum. The algorithm that uses split energy predicts approximately three times fewer potential split sites than does an algorithm based solely on solvent accessibility and sequence conservation (Supplementary Table 2). It is not feasible to evaluate directly whether this difference reflects fewer false positives from the split energy-based algorithm, because most of the literature does not report failed split sites. However, both algorithms select the previously reported split sites with approximately the same efficiency (Supplementary Table 2), indicating that split energy-based identification narrows the list of candidates without missing known split sites.

To achieve effective reassembly, we rank the loops by “tightness” (a parameter reflecting the ability of a loop to connect two interacting structural units15) and by the absolute value of the split energy (Methods). Split sites with higher energy appear higher in the ranking. This ordering reflects intrinsic limitations of the split energy profile approach. While the split energy gives a first approximation of the change in free energy upon splitting, it does not take into account the entropic change, which results in a growing number of alternative pathways through which the protein can reassemble. When a split site is located deeper, more hydrophobic residues become exposed, resulting in increased spontaneous reassembly and alternative assembly pathways that lead to aggregation. The majority of split sites that have been tested experimentally were ranked by our algorithm as hits in the prediction table (Supplementary Table 2). In the other cases, the loops ranked as hits by our algorithm had not previously been tested experimentally.
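The filter-then-rank logic described in this section can be sketched as follows. This is a minimal illustration, not the implementation behind the web server: the `Loop` fields, the threshold values, the assumption that a larger `tightness` score is better, and the use of tightness as the primary sort key are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Loop:
    residue: int         # candidate split position
    split_energy: float  # value of the split energy profile at this position
    exposure: float      # solvent-accessible surface area (arbitrary units here)
    conservation: float  # 0 (variable) to 1 (fully conserved)
    tightness: float     # assumed: larger means a tighter loop


def rank_split_sites(
    loops: List[Loop],
    min_exposure: float = 30.0,     # hypothetical threshold
    max_conservation: float = 0.5,  # hypothetical threshold
) -> List[Loop]:
    """Keep exposed, non-conserved loops, then rank the survivors.

    Ranking (assumed): tighter loops first; among equally tight loops,
    higher split energy first.
    """
    eligible = [
        loop for loop in loops
        if loop.exposure >= min_exposure and loop.conservation <= max_conservation
    ]
    return sorted(
        eligible,
        key=lambda loop: (loop.tightness, loop.split_energy),
        reverse=True,
    )
```

Filtering before ranking keeps buried or conserved loops out of the list entirely, so a high split energy alone cannot promote such a site.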

Prevention of spontaneous assembly with engineered FKBP12

While our algorithm can identify split sites where a protein can be successfully reconstituted, it does not eliminate the possibility that the split parts spontaneously reassemble, or that they fail to reassemble effectively upon induction. Previous studies used the dimerization of the proteins FKBP12 and FRB, driven by the small-molecule rapamycin, to induce reassembly of split proteins; FKBP12 was appended onto one half of the protein and FRB onto the other. We previously showed that a version of FKBP12 missing the first 20 residues, denoted insertable FKBP12 or iFKBP, could be inserted into a protein to destabilize it until the iFKBP bound rapamycin16,17,18. For the reassembly of split proteins, we hypothesized that the use of this destabilizing version of FKBP12 could prevent spontaneous assembly, but also enable proper folding upon rapamycin-induced heterodimerization of iFKBP and FRB. Molecular dynamics simulations showed that iFKBP is substantially destabilized compared to FKBP12, which has a melting temperature of ~65 °C19, and the energy of iFKBP increased significantly upon refolding induced by rapamycin (Supplementary Fig. 15). Moreover, the iFKBP N terminus is less than ~7 Å from the C terminus of FRB in the rapamycin-bound FRB–iFKBP heterodimer (Supplementary Fig. 15), so iFKBP can be readily inserted into “tight” surface loops that connect two interacting structural units. This feature makes it possible to destabilize the split protein, which reduces spontaneous assembly and background activity, while still enabling effective reassembly, because the FRB terminus attached to the other split half is in close proximity to the iFKBP terminus (Supplementary Fig. 15). We tested the ability of iFKBP to prevent the spontaneous reassembly of a frequently used split protein, split-TEV13. 
To evaluate the activity of TEV analogs in living cells, we built a Förster resonance energy transfer (FRET) biosensor that produces a lower FRET signal in the presence of TEV activity (Supplementary Fig. 16). Split-TEV made using iFKBP rather than FKBP showed significantly lower activity in the absence of rapamycin. These experiments demonstrated that iFKBP can reduce spontaneous reassembly and led to an optimized split-TEV that we named TEV SPELL to denote use of the modified FKBP for optimized splitting and reassembly.

Designing Lyn SPELL

Successful prediction of split sites in previously published split proteins suggested that we might predict split sites in new protein targets. We produced SPELL protein analogs of diverse proteins, starting with the tyrosine kinase Lyn. A split analog of Lyn kinase was previously produced using the empirically identified split sites 393–39420. The split energy diagram of Lyn showed one major energy well (core) with two other shallow wells. Residue 393 was located in the vicinity of the core in the split energy profile, so we selected residues 268 and 279, which are surface exposed and not evolutionarily conserved (Fig. 2a, b). We hypothesized that this selection would make the split protein halves more stable, yet iFKBP alone would still eliminate background activity. Strikingly, in contrast to in vitro activity20, the published 393 site produced almost no activity in living cells upon rapamycin addition (Fig. 2c). The analogs split at 268, 279 (named Lyn SPELL), 458, and 466 showed substantial activation upon rapamycin addition (Fig. 2c, d). Importantly, we also selected three other non-conserved surface-exposed sites (312, 377, and 419) that are not favorable in the split energy profile because they lie in the wells. The analogs split at these sites did not produce any activity upon rapamycin addition. Furthermore, we tested the split site 484, which is both surface exposed and favorable in the split energy diagram, but is evolutionarily conserved. This analog did not produce any activity in the absence or presence of rapamycin, supporting the importance of analyzing sequence conservation (Fig. 2d). Overall, the successful split sites were among the top five sites predicted by our algorithm.

Designing GDI1 SPELL

We next applied SPELL to guanosine nucleotide dissociation inhibitor 1 (GDI1), a Rho GTPase family regulator which had not previously been targeted. The split energy profile indicated one major well and three shallow energy wells (Fig. 3a, b), similar to the profile of the Class II phosphatase (Fig. 1c). The two top-ranked predictions (residues 59 and 66) are located in the same loop, so we chose to test only residue 66. We also selected residue 84, which has an SSA of 27 Å², below the threshold defined in the algorithm. We expected that GDI split at this site would not produce activity upon reassembly (Fig. 3b). To test the efficacy of these split sites, we measured the ability of different split GDI analogs to inhibit activation of a Rac1 FRET biosensor (Rac1 FLARE DC1g21) in living cells. The split analog of GDI generated using residue 66 (GDI-66 SPELL) was fully activated with rapamycin, whereas the analog split at residue 84 did not provide full activity upon reassembly, as predicted (Fig. 3c).

Fig. 3 GDI SPELL. a The structure of RhoGDI bound to the GTPase Cdc42. b The SPELL algorithm indicated residue 66 as a split site. c The inhibitory activity of GDI SPELL (split at 66) was activated by rapamycin, whereas another design (split at 84) split at a small well did not display full activity with rapamycin. Error bars represent ± s.e.m. (n = 3) from three independent cell populations

Designing Vav2 SPELL

We next targeted the catalytic domain of the Rho family guanine nucleotide exchange factor (GEF) Vav2, a challenging target to split as it has a single subdomain architecture, indicated by the split energy profile (Fig. 4a, b). We tested residue 347 (L4), the top prediction of our algorithm. The C-lobe fused to FRB was expressed together with either the N-lobe alone or the N-lobe fused to FKBP12. When these constructs were tested using the Rac1 biosensor, the N-lobe produced significant background activity, indicating spontaneous reassembly (Fig. 4c). In contrast, no background activity was detected when iFKBP was used, further demonstrating the ability of iFKBP to destabilize the protein and prevent reassembly in the absence of rapamycin. Addition of rapamycin led to rapid and robust activation of these Vav2 SPELL analogs (Fig. 4d). Vav2 SPELL was also tested by monitoring its effects on the fluorescence spectra of cells expressing the Rac1 biosensor; the FRET emission intensity increased upon rapamycin addition (Supplementary Fig. 17). We similarly generated a SPELL analog of intersectin-1 (ITSN), a Cdc42 GEF, with a split site at E1398, the top prediction of our algorithm. Using a Cdc42 biosensor (Cdc42 FLARE DC1g21), we found that ITSN SPELL was rapidly activated by rapamycin (Supplementary Fig. 18).

Fig. 4 Activation of Vav2 SPELL leads to cell protrusion. a Split energy profile of the DH domain of Vav2. Green arrow shows the least destabilized region and chosen loop for splitting. cons = sequence conservation, saa = surface exposure. b A structural model of Vav2 SPELL. iFKBP (light gray) was fused to the C terminus of the N-lobe of the DH domain (green), and FRB (dark gray) was fused to the N terminus of the C-lobe of the DH domain (blue) in the presence of rapamycin (purple). c A dual chain Rac1 FRET sensor was used to test Vav2 analogs. The FRET ratio (reflecting the activity of Rac1) with respect to the amount of mCherry-labeled Vav2 proteins: spVav-FKBP12-FRB = split protein generated using FKBP rather than iFKBP (mCherry-DHN-FKBP12 and FRB-DHC-PH-ZnF), Vav2 SPELL (mCherry-DHN-iFKBP and FRB-DHC-PH-ZnF), spVav-FRB = split protein made with no FKBP (mCherry-DHN and FRB-DHC-PH-ZnF), and Vav2 (mCherry-DH-PH-ZnF). d Vav2 SPELL activated with rapamycin or caged rapamycin, assayed as in c. e A HeLa cell expressing Vav2 SPELL, showing protrusion (green) and retraction (red) 19 min after rapamycin addition (upper left). f, g, h Morphology parameters (area, protrusive activity, and polarization index) of cells expressing Vav2 SPELL (green, mean ± s.e.m., n = 19 cells) vs. cells expressing only membrane marker (black, mean ± s.e.m., n = 36 cells). Rapamycin was added at 30 min (red line)

We showed previously that the iFKBP–FRB interaction can be induced by light using a rapamycin whose activity is blocked by a photocleavable protecting group (caged rapamycin, or pRap15). Using pRap, Vav2 SPELL was fully activated within 1 min of irradiation using 365 nm light (Fig. 4d). Rapamycin-mediated activation of Vav2 SPELL in HeLa cells induced protrusions within minutes, as reflected in both increased area and spreading (Fig. 4e–h and Supplementary Movie 1). Cells reached maximum protrusive activity within 10 min. This observation was consistent with Vav2's known role in activating Rac1 to induce motility4,22.