Significance Twenty years after the discovery of knotted proteins, we found that some single-protein chains can form links, which have even more complex structures than knots. We derive conditions that proteins need to meet to form links. We search through the entire Protein Data Bank and identify several chains that form a Hopf link and a Solomon link. The link motif has not been recognized before; however, it is clearly of functional significance in proteins. In this article, we relate topological properties of proteins with links to their function and stability and show that the link topology is characteristic of eukaryotes only.

Abstract Twenty years after their discovery, knots in proteins are now quite well understood. They are believed to be functionally advantageous and provide extra stability to protein chains. In this work, we go one step further and search for links—entangled structures, more complex than knots, which consist of several components. We derive conditions that proteins need to meet to be able to form links. We search through the entire Protein Data Bank and identify several sequentially nonhomologous chains that form a Hopf link and a Solomon link. We relate topological properties of these proteins to their function and stability and show that the link topology is characteristic of eukaryotes only. We also explain how the presence of links affects the folding pathways of proteins. Finally, we define necessary conditions to form Borromean rings in proteins and show that no structure in the Protein Data Bank forms a link of this type.

Knotted proteins have been identified in all kingdoms of life, in organisms separated even by 1 billion years of evolution (1–5). High conservation (5) of knotted motifs and their location (usually) in enzymatic active sites indicate that knots are crucial for protein function. Over 1,300 knotted or slipknotted (shoelace-type) structures, including the trefoil ($3_1$), figure-eight ($4_1$), three-twist ($5_2$), and Stevedore’s ($6_1$) knots (4, 6, 7), have been deposited in the Protein Data Bank (PDB) to date according to KnotProt (8).

Mathematically, a knot is defined as an embedding of a circle into a 3D space. A link is a generalization of a knot, defined as an embedding of a finite set of circles. The simplest examples of links are the Hopf link and the Solomon link (Fig. 1, Center). Links have been found in DNA (9, 10) and have been produced by template synthesis (11, 12). In proteins, the first attempts to identify links were made by Mislow (13, 14). In his approach, however, the link-forming loops were defined either by including interaction with a metal ion (noncovalent loop) or by at least two disulfide bonds for each (covalent) loop. The links formed by covalent loops, each closed by one disulfide bridge only, were considered “unlikely to lead to knots or links” (ref. 14, p. 4202) by Mislow and therefore hardly examined. Moreover, all of the structures were scanned only by “visual examination of their 3D structures” (ref. 14, p. 4202). To date, the only known simple protein links are designed p53 protein catenanes (15) (with the backbones of both chains artificially closed, forming linked loops) and a thermophilic two-chain complex (16) (with linked loops formed by the backbones closed via disulfide bridges). However, the discovery of a wide class of complex lasso proteins (17, 18), in which a chain pierces a covalently closed loop (Fig. 1), opens a unique possibility of defining and identifying links. Such links (or more formally, pretzelanes) are defined using covalent loops closed by disulfide bridges in a single-protein chain (compare Fig. 1). Therefore, 20 years after the discovery of knotted proteins, it is time to reformulate Mansfield’s (1) question and ask, Are there links in proteins?

Fig. 1. Exemplary structures of proteins with links: negative Hopf link, positive Hopf link, and positive Solomon link [Protein Data Bank (PDB) codes 2LFK, 2KQA, and 4ASL, Top to Bottom]. Middle row shows the most general link type (without orientation). The orange stripes denote disulfide bridges. The N and C letters denote the N and C termini of the protein. The arrows denote the orientation from the N to the C terminus. For details on link orientation, see SI Appendix. In each panel, colors in the structure match the colors in the scheme at Right; the protein topology is presented as a solid (black or colored) line.

In this paper, we propose a general method to identify and classify links in proteins and discuss their biological role. The existence of proteins with stable links changes our view on the complexity of proteins and leads to many intriguing questions never asked before: Are links conserved evolutionarily to provide unique features of proteins? Do they exist in all kingdoms? How do they fold? In this paper, we answer these questions, and, in addition, we find relations between proteins with links based on comparing their evolutionary, sequential, and functional properties.

Search for Links To identify stable links in proteins, we analyzed their structure and used the method of spanning the (triangulated) minimal surface (17, 18). A segment of a protein chain forms a covalent loop if the ends of the segment are connected by a covalent bond (e.g., a disulfide bridge). Such a covalent loop can be pierced by a protein tail, thereby forming a complex lasso structure (17) (Fig. 2). A link is formed when the piercing tail is itself a part of another covalent loop. To identify links and their types, we analyze sequential numbers (indexes) of loop-forming cysteines and piercing residues (Fig. 2). The indexes of the cysteines are known from the protein structure, whereas indexes of loop-piercing residues can be determined from minimal surface analysis used in the classification of lasso proteins (17, 18). This method is general and can be applied to various intramolecular contacts.

Fig. 2. The method of identification of links. On each covalent loop a triangulated minimal surface is spanned (Left and Right). The necessary condition for the existence of a link is that surfaces pierce one another (Center) or, equivalently, each surface is pierced by the border of the second surface.

Using the above method, we performed a comprehensive survey of the more than 115,000 chains deposited in the PDB as of May 2016, taking into account all known covalent interactions (e.g., cysteine, amide, ester, thioester, or carbon–carbon bonds). We found that links are formed in as many as 159 structures, of which 129 form the Hopf link and 30 form the Solomon link. The classification of these proteins according to their topological complexity, sequence similarity, and biological function is shown in Datasets S1 and S2, and exemplary linked proteins are shown in Fig. 1. Below we discuss the conclusions drawn from this survey.
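The index-based test described in this section can be sketched in a few lines of Python. This is only an illustration of the necessary condition stated in the Fig. 2 caption (each surface is pierced by the border of the other loop), not the authors' implementation. The class name `CovalentLoop`, the helper names, and all residue indexes below are hypothetical; in practice the piercing indexes would come from the minimal surface analysis of refs. 17 and 18.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CovalentLoop:
    """A chain segment closed by a covalent bond (e.g., a disulfide bridge)."""
    first: int                  # index of the first loop-closing residue
    last: int                   # index of the second loop-closing residue
    piercings: Tuple[int, ...]  # residues piercing the surface spanned on this loop


def is_pierced_by(loop: CovalentLoop, other: CovalentLoop) -> bool:
    """True if a residue piercing `loop`'s surface lies on the `other` loop."""
    return any(other.first <= r <= other.last for r in loop.piercings)


def may_be_linked(a: CovalentLoop, b: CovalentLoop) -> bool:
    """Necessary (not sufficient) condition: each surface is pierced by the other loop."""
    return is_pierced_by(a, b) and is_pierced_by(b, a)


# Toy indexes, invented for illustration only (not taken from any PDB entry).
loop_a = CovalentLoop(first=10, last=30, piercings=(35,))
loop_b = CovalentLoop(first=33, last=60, piercings=(12,))
lasso_only = CovalentLoop(first=10, last=30, piercings=(70,))  # pierced by a free tail

print(may_be_linked(loop_a, loop_b))      # True
print(may_be_linked(lasso_only, loop_b))  # False
```

A loop pierced only by a free tail, as in `lasso_only`, corresponds to a complex lasso rather than a link. Because the condition is only necessary, a positive result would still need the link type to be determined separately.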

Conservation of Links and Artificial Structures To investigate the structural importance of links in proteins, we analyzed their conservation in clusters of 30% sequential homology. We found that links are strictly conserved for all homologs (representative structures are presented in SI Appendix, Figs. S1–S3). The nonconservation of topology in a homology cluster can therefore be viewed as a sign of a flawed structure. Indeed, proteins with nonconserved topologies either possess a large gap in the structure (nine cases; SI Appendix, section S9) or contain linked loops that are probably an artifact of the EM experiment (highly mobile loops in Envelope glycoprotein gp120, PDB code 3J70). Those structures were excluded from further analysis. On the other hand, the nonconservation of topology can stem from artificially introduced mutations, as for example in the glutamate receptor 2 (PDB code 3T93), in which two loop-closing cysteines were introduced, resulting in the formation of a Hopf link (SI Appendix, Fig. S6). This is the only nonnatural example of a protein with a link seen in our survey.

Proteins with the Hopf Link Most of the protein structures with links form Hopf links. After sequential clustering, we found 14 representative Hopf-link structures, presented in Table 1 with their structural details. The linear orientation of a protein chain introduces a chirality to the Hopf link, resulting in two topological types denoted ±, which differ by the piercing direction defined in refs. 17 and 18 (Fig. 1, Top row). This sign is also indicated in Table 1 (column “Orientation”). Moreover, in most cases proteins possess more than two covalent loops. The full set of representative protein structures with schemes of their disulfide bond arrangements is presented in SI Appendix, section S8. In particular, in two cases (PDB codes 1HD5 and 1WC2) one of the loops forming a Hopf link is contained in the larger one; in these cases, details for the larger covalent loop are given in Table 1.

Table 1. Structural data for representative protein chains with the Hopf link, along with proteins’ function

As can be seen from Table 1, proteins with a Hopf link vary in size from 57 aa to 820 aa. The size of their loops also varies greatly (17–387 aa). However, in most cases the covalent loop has fewer than 100 residues. Strikingly, in most cases the covalent loops are separated by only a few residues (in 9 of 14 cases the loops are separated by 2 or fewer residues). Thus, a large loop size and a large separation between the loops may imply that in such structures the link topology is not functional and is an accidental effect of the disulfide loop and chain arrangement (e.g., structures with PDB codes 1H30 and 4B56). Furthermore, closed loops seem to be inequivalent: for proteins with a positive Hopf link, the sequentially later loop is always larger, whereas the sequentially first loop is usually (but not always) larger in structures with a negative Hopf link.
There are similar numbers of positive (eight) and negative (six) Hopf links, indicating that there is no obvious preference for any chirality. Note that the separation between covalent loops is, in most cases, comparable to the average persistence length of a protein chain. This leads to a hypothesis that a small separation is actually the cause, not the effect, of a link topology. In fact, in the PDB database there are 104 representative structures with cysteine loops separated by exactly two residues; thus, almost 7% of them are linked. A small separation between the covalent loops can influence the structure in two ways. First, because of the persistence length and steric effects, cysteines located in the nearest vicinity are inhibited from bonding with one another (at least as long as they are separated by an even number of residues), which reduces the possibility of the formation of nonnative disulfide bridges. Second, the persistence length locally forces the chain arrangement, creating the possibility to control the direction in which the cysteines are mutually exposed and facilitating the formation of the correct (link) topology.

Despite a low sequence similarity, structures with the positive Hopf link seem to be collectively more similar to one another, as follows, e.g., from structural data in Table 1. To formalize this observation, we calculated the sequence identity and structural P value for each pair of structures (SI Appendix, Table S2) and found that the average sequence identity for proteins with the positive Hopf link (13%) is higher than for the negative Hopf link (4.5%). Similarly, the respective P value is lower (3.43E-02 and 8.76E-0, respectively, for positive and negative structures). This result suggests that proteins with the positive Hopf link are distantly related, in contrast to those with the negative one.

Function and Origin of Proteins with a Hopf Link.
The next intriguing question is why proteins conserve a complicated link topology despite naturally expected problems during folding. One possible answer is that links are functionally important. The function and origin of proteins with the Hopf link are shown in Table 2. The mutated glutamate receptor 2, as an unnatural protein, is excluded from this analysis.

Table 2. Function, orientation, molecule type, kingdom of origin organism, and cellular location for representative proteins with the Hopf link topology

As follows from Table 2, linked proteins fulfill different functions. However, all of the structures with the positive Hopf-link topology are classified into the same Barwin-like endoglucanases superfamily in the CATH (class, architecture, topology/fold homologous superfamily) and SCOP (structural classification of proteins) databases (19, 20) (3X2G was absent in both databases) and the Double Psi beta barrel glucanase (DPBB) in the PFAM (protein families) database (21) (1WC2 was not classified in PFAM). Taking into account different kingdoms of origin (animals, plants, fungi), this observation indicates that all proteins with the positive Hopf link originate from one ancestral, early eukaryotic, sugar-binding protein. Indeed, the affinity to sugars is conserved, although not always evident, in the positive Hopf-link proteins. For example, sugar-composed cellulose is a target for endoglucanases, whereas cerato-platanins and barwin domains are known to be sugar binders. The beta-expansin (PDB code 2HCZ) is a fertilization factor, selective for cell walls rich in glucuronoarabinoxylans and β-d-glucans. On the other hand, most of the negative Hopf-link structures are not classified in the common databases, and their functions are more varied. Nevertheless, it is worth noting that also in this case four of five such structures are animal proteins. A hallmark of linked proteins is that all of them are secreted or transmembrane proteins.
Disulfide bridges are known to introduce additional stability and in this case the stability is most evident. Cerato-platanins have been shown to be stable up to 76 °C (22). Fungal endoglucanase (1HD5) is stable after heating to 60 °C and at pH between 3.0 and 11.0, and it retains 45% of its activity after incubation for 5 min at 95 °C (23). The animal endoglucanase (PDB code 1WC2) withstands heating for 10 min at 100 °C without irreversible loss of activity (24). Possibly the reason for the exceptional stability lies in the topology of the proteins. In the case of proteins with the Hopf link, covalent loops cannot be separated, even at high temperatures, without breaking disulfide bridges. Moreover, such bridges are often buried deep inside the protein, additionally stabilizing the structure (Fig. 3).

Fig. 3. Crystal structure of cerato-platanin (PDB code 3SUK). (A) Covalent loops forming the Hopf link are depicted in red and blue. Cysteines closing the loops and cysteine bridges are marked in orange. (B) Solvent-exposed surface. Colors correspond to the crystal structure. Only one cysteine bridge (marked with a black circle) is partially exposed to the solvent.

To analyze the influence of links, we determined the energy barrier for unfolding in five models of 2LFK, the smallest protein with the Hopf link, differing only in their topology. The first model is the native structure with the Hopf link. In the second model the covalent loops are unlinked; the model is obtained by interchanging disulfide bonds (Cys-24 is paired with Cys-52 instead of Cys-51) without destroying surrounding contacts. Two additional models involve only one covalent loop (respectively red or blue in Fig. 4; i.e., with only one disulfide bridge), and the last model involves no closed loops (no disulfide bridge). During simulations the disulfide bridges are not allowed to break (which models oxidative conditions).
As an unfolding measure we define the time needed for the structure to reach an rmsd of 10 Å from the native structure. To focus on topological properties and to gather good statistics, we used a structure-based model (SBM). For each model, based on an unfolding rate constant calculated at seven different temperatures, we determined an approximate unfolding energy barrier (Fig. 4). As expected, the model with two native closed loops has an energy barrier an order of magnitude higher than models with one or no closed loops; the presence of two loops stabilizes the structure to a much higher degree than a single closed loop. However, the Hopf-link structure has an energy barrier over 20% higher than the topologically trivial structure. This purely topological effect may explain the biological advantage of links in proteins.

Fig. 4. The temperature dependence of unfolding rate constant for five models of TdPI protein, differing in topology. Inset shows the protein structure with native “red” and “blue” loop indicated. The stripes denote the loop closing pattern: orange and green for the (native) Hopf link and the trivial (nonnative) model. The fitted function is presented in the top right corner. On the right side the fitted values of $E_A/R$ are given. In the bottom right corner, the schemes of the Hopf link and the trivial model are presented.

Folding of Proteins with the Hopf Link. Proteins with the Hopf-link structure may fold via three different pathways (Fig. 5). In general, the probability of each pathway depends on conditions. Oxidative conditions facilitate the formation of disulfide bonds and, therefore, of covalent loops (Upper and Lower pathways in Fig. 5). Once the covalent loop is formed, threading is required to complete folding the chain. In general, the chain threading was shown to be a major topological barrier in the free-energy landscape (at least for knotted proteins) (25, 26).
To investigate this issue, we performed a series of coarse-grained (CG) simulations for the smallest protein with the Hopf link (tick-derived protease inhibitor TdPI with PDB code 2LFK). This protein has two covalent loops of length 17 residues and 27 residues. We performed folding and unfolding simulations in two different models. In each model, one loop-forming bridge was made persistent. It turned out that if the larger (red in Fig. 5) covalent loop is formed, the protein can both fold and unfold. The loop threading occurs via bending the chain in the vicinity of the bridge, which resembles the slipknotting mechanism known for knotted proteins (26–28) (Fig. 5, Lower pathway). Such a mechanism is possible because the piercing residue is located in the nearest vicinity of the bridge. In the case of the smaller loop (blue in Fig. 5), in which the piercing residue is located farther in the sequence, neither threading nor unthreading occurs. As a result, if the smaller covalent loop is formed, the protein can neither fold nor completely unfold. To approximate the probability for each pathway, we conducted a series of CG folding simulations without any bias from disulfide bonds and measured how often the pair Cys-24–Cys-51 (forming the larger, red covalent loop) comes within the native distance before the Cys-52–Cys-69 pair (forming the smaller, blue covalent loop). It turned out that the Cys-52–Cys-69 bond (smaller loop) is created approximately three times more often than the Cys-24–Cys-51 bond (larger loop), independent of the temperature (SI Appendix, Table S3), definitely hindering folding. There is, however, the third pathway in which first the “interior” of the link is “twisted” properly and the loops are closed in the last step, which can be a way to overcome the problem of loop threading (such a mechanism is impossible in the case of knotted proteins).
In the case of the TdPI protein, this mechanism is facilitated by the formation of a β strand and, in fact, it should be the most common pathway (SI Appendix, Table S3). Nevertheless, even in this pathway, the formation of native disulfide bridges can lead to a nonnative, trivial topology (topological trap). In the case of the TdPI protein such a misfolded structure can be very similar to the native one (compared, e.g., by the rmsd value), but covalent loops should be definitely more labile. In fact, the oxidative and reductive folding of this protein was studied previously (29). In particular, it was shown that in oxidative conditions the protein rarely achieves its native state, forming most often a nonnative structure with three (of four) cysteine bonds. The missing bond is Cys-38–Cys-58 joining the loops, which were very labile. The concentration of the misfolded product was reduced upon the addition of reducing agent (glutathione), which enabled the bond reshuffling. The bonds formed in the nonnative product can be nonnative as well. Thus, we used the simple CG model enriched with the Cys–Cys interaction, allowing for the creation of both native and nonnative pairs. In the case when the Cys–Cys bonds were allowed to break during the simulation (reflecting reducing conditions), ∼70% of the folding simulations finished in the native state, independent of temperature. If, however, the bonds were not allowed to break (oxidative conditions), ∼70% of the folding simulations finished in the nonnative, topologically trivial structure (SI Appendix, Table S4) with rmsd lower than 5 Å from the native structure. Moreover, this structure is not an artifact of the CG model, because an all-atom representation can be reconstructed from the CA trace [using the Modeller software (30)] (SI Appendix, Fig. S1). Therefore, it is highly probable that the folding trap observed in ref. 29 was the topologically trivial structure.

Fig. 5. Possible ways of folding of TdPI.
Folding can follow three different pathways, but formation of the small covalent loop as the first event blocks folding. Moreover, in the last folding step the protein can collapse to a topological trap (in red oval), characterized by trivial topology. Green oval denotes the native, Hopf-link structure.
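The barrier estimate used in the stability analysis above (unfolding rate constants at seven temperatures, with fitted values of $E_A/R$ reported in Fig. 4) can be illustrated with a short Python sketch. It assumes an Arrhenius form, k(T) = A exp(−E_A/(RT)), which the text suggests but does not state explicitly, since the fitted function itself appears only in the figure. The function name `fit_arrhenius` and the synthetic rate constants are hypothetical and are not data from the paper.

```python
import math


def fit_arrhenius(temperatures, rate_constants):
    """Least-squares fit of ln k = ln A - (E_A/R) * (1/T).

    Returns (E_A/R, A). Assumes an Arrhenius temperature dependence.
    """
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -E_A/R
    intercept = mean_y - slope * mean_x  # equals ln A
    return -slope, math.exp(intercept)


# Synthetic example: seven temperatures in reduced units, made-up parameters.
temps = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
true_ea_over_r, true_prefactor = 50.0, 2.0
rates = [true_prefactor * math.exp(-true_ea_over_r / t) for t in temps]

ea_over_r, prefactor = fit_arrhenius(temps, rates)
print(round(ea_over_r, 6), round(prefactor, 6))
```

With simulation data, each rate constant would first have to be estimated from the measured unfolding times at that temperature; that step is omitted here because the source does not specify how it was done.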

Other Links Although the majority of links found are Hopf links, we also detected other topologies, which are presented in Dataset S2. After clustering homologous sequences, we extracted two representative chains (Table 3) that form a Solomon link (Fig. 6). Both of them are fungal adhesive proteins, sharing only 21% sequence identity. Nevertheless, they represent two closely related families of flocculation (2XJP) and epithelial adhesion (4ASL) proteins (31) and have highly similar structure, with the mutual structural P value of 3.96E-14. This result shows that the structure and the topology are much more conserved than the sequence. Possibly, aside from stability (both proteins are secreted), this is the second manifestation of the role of topology: The presence of a link holds the chain together, allowing for only minor changes in structure, despite even a large number of point mutations. As the topology of the Solomon link cannot be detected based on piercings only, we conducted a simulation of thermal unfolding, extracting only the link-forming disulfide loops. As a result, the loops moved apart, which enables one to identify the topology simply “by eye” (Fig. 6).

Table 3. Structural data for representative protein chains with the Solomon link, along with proteins’ function

Fig. 6. Exemplary structures of proteins with the Solomon link and the core of Borromean rings. (Top, Left to Right) An exemplary protein with the Solomon link (PDB code 4ASL), the covalent loops after unfolding, and the scheme of the corresponding Solomon link.
(Middle, Left to Right) Structure of goat lactoperoxidase (PDB code 2E9E) forming the core of Borromean rings; the structure after smoothing: the blue surface is spanned on the main chain and on the disulfide bridge, and the red surface is spanned by the chain and is delimited by the blue surface (shown in cyan) and is pierced by the green part of the chain; the schematic view of the protein: the solid color lines form the core of Borromean rings, shown in Bottom, Left to Right with retention of colors. Bottom Left presents the Borromean rings.

It is interesting to note that proteins with the Solomon link create cell assemblies by recognizing and binding the carbohydrates, similar to proteins with the positive Hopf link. However, although Solomon-link proteins are approximately twice as large as Hopf-link proteins, a sequential and structural comparison of the domains constituting the Solomon-link and the positive Hopf-link proteins does not show any resemblance or clear sign of possible gene duplication.
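The text notes that the Solomon link cannot be detected from piercings alone and was identified by eye after simulated unfolding. As a complementary illustration, and explicitly not the authors' procedure, the sketch below computes the linking number of two closed polygonal loops by counting signed crossings in the xy-projection. The magnitude of the linking number is 1 for a Hopf link and 2 for a Solomon link, although the linking number alone does not determine the link type in general. The function name and the toy coordinates are hypothetical, and the loops are assumed to be in generic position (no crossing falls on a vertex).

```python
def linking_number(loop_a, loop_b):
    """Linking number of two closed polygons from signed xy-projection crossings."""
    total = 0
    for i in range(len(loop_a)):
        p1, p2 = loop_a[i], loop_a[(i + 1) % len(loop_a)]
        for j in range(len(loop_b)):
            q1, q2 = loop_b[j], loop_b[(j + 1) % len(loop_b)]
            ax, ay = p2[0] - p1[0], p2[1] - p1[1]
            bx, by = q2[0] - q1[0], q2[1] - q1[1]
            denom = ax * by - ay * bx  # z-component of (a x b)
            if abs(denom) < 1e-12:
                continue  # projected segments are parallel
            rx, ry = q1[0] - p1[0], q1[1] - p1[1]
            s = (rx * by - ry * bx) / denom  # position along segment of loop_a
            t = (rx * ay - ry * ax) / denom  # position along segment of loop_b
            if not (0.0 < s < 1.0 and 0.0 < t < 1.0):
                continue  # projections do not cross
            za = p1[2] + s * (p2[2] - p1[2])
            zb = q1[2] + t * (q2[2] - q1[2])
            sign = 1 if denom > 0 else -1
            # Crossing sign is the sign of (over-strand x under-strand)_z.
            total += sign if za > zb else -sign
    return total // 2  # every linking contributes two projected crossings


# Toy coordinates, invented for illustration (not protein data).
square = [(-1, -1, 0), (1, -1, 0), (1, 1, 0), (-1, 1, 0)]
once = [(0, 0.0, -1), (2, 0.2, -1), (2.1, 0.5, 1), (0.1, 0.3, 1)]
twice = [(1.5, -0.4, 0.5), (0.5, -0.3, 0.5), (0.5, -0.2, -0.5), (1.5, -0.1, -0.5),
         (1.5, 0.0, 0.5), (0.5, 0.1, 0.5), (0.5, 0.2, -0.5), (1.5, 0.3, -0.5)]
far = [(x + 10, y, z) for x, y, z in once]

print(abs(linking_number(square, once)))   # 1, Hopf-like
print(abs(linking_number(square, twice)))  # 2, Solomon-like
print(abs(linking_number(square, far)))    # 0, unlinked
```

For protein data, each loop would be the CA trace of a covalent loop closed through its disulfide bridge. The sign of the result depends on the orientation chosen for each loop, which is why only magnitudes are printed here.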

Brunnian Links The procedure for link identification described above is general; however, it cannot identify so-called Brunnian links, i.e., links that cannot be decomposed, but become trivial (i.e., form a set of unlinked circles) upon removing any component. The simplest example of such a structure is the Borromean rings (Fig. 6, Bottom Left). The overall complexity of such a link stems from the fact that each component is pierced by another component at least twice, so that an arc is formed between piercings. Through this arc, a part of the next component has to be threaded, as in Fig. 6, Bottom Right. In the case of the Borromean rings, all three components are mutually entangled in this manner. Such configurations also can be formed in complex lasso proteins. To identify them, one has to consider every covalent loop that is pierced more than once [e.g., twice in proteins with a double lasso, denoted $L_2$ (17)] and identify piercings through arcs formed by such a structure (Fig. 6, Bottom Right). With this approach, we scanned the PDB database and identified such a complex structure in the goat lactoperoxidase (e.g., PDB code 2E9E) shown in Fig. 6, Middle. Here one ring is closed by the cysteine bridge Cys-6–Cys-167. The surface spanning that ring is pierced three times by the chain (triple-lasso topology, $L_3$ type), via residues Gln-179, Met-352, and Ile-436 (Fig. 1, Bottom). The surface spanning the arc closed by Gln-179 and Met-352 is subsequently pierced by the next portion of the chain, via Asp-525 (Fig. 6, Middle). In the Borromean rings, however, this complex arrangement of the chains has to occur three times with three mutually piercing components. In the goat lactoperoxidase, this condition is fulfilled only once, and this protein (along with its homologs up to 30%) is the only protein structure with such an arrangement. We conclude that no protein Brunnian link exists in proteins deposited in the PDB.
Nevertheless, the influence of the complex arrangement of the chain on the goat lactoperoxidase’s properties is intriguing. We can, in fact, distinguish two levels of complexity. One level is the piercing through the surface spanned on the covalent loop, with the formation of an arc. The second level is the piercing of the surface spanned on this particular arc. Interestingly, it was shown that goat lactoperoxidase is highly stable in a wide range of pH (4–11) and it unfolds in a two-step process (32). In the first step the peripheral fragment is unfolded, leaving the core intact. This requires unthreading of the piece of the chain piercing the surface spanned on the arc (green part in Fig. 6, Bottom). In the second step, the core is unfolded, which requires unthreading of the surface spanned by the covalent loop (unthreading of the red piece of the chain through the blue surface). Possibly, the arc piercing fulfills an analogous role in biology to that in mathematics. In the Borromean rings it prevents the overall unlinked rings from falling apart. Perhaps evolution “used” the delicate topological properties to invent another mechanism for the stabilization of protein structures long before humans discovered Brunnian links.
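The two levels of complexity described for goat lactoperoxidase can be written down as a small bookkeeping sketch. It uses only the residue indexes quoted in the text (loop Cys-6–Cys-167; piercings at 179, 352, and 436; arc 179–352 pierced at 525). The helper names are hypothetical, and treating every pair of consecutive piercings as a candidate arc is an assumption made for illustration; the actual arc piercings would come from spanning a surface on each arc.

```python
def candidate_arcs(piercings):
    """Pairs of consecutive piercing residues; each pair delimits a candidate arc."""
    ordered = sorted(piercings)
    return list(zip(ordered, ordered[1:]))


def pierced_arcs(arcs, arc_piercings):
    """Arcs whose spanned surface is pierced by at least one residue."""
    return [arc for arc in arcs if arc_piercings.get(arc)]


# Level 1: the covalent loop and the residues piercing its surface (from the text).
loop = (6, 167)
loop_piercings = [179, 352, 436]

# Level 2: residues piercing the surface spanned on an arc (from the text).
arc_piercings = {(179, 352): [525]}

arcs = candidate_arcs(loop_piercings)
fulfilled = pierced_arcs(arcs, arc_piercings)

print(arcs)       # [(179, 352), (352, 436)]
print(fulfilled)  # [(179, 352)]
# Borromean rings would require this arrangement three times, with three
# mutually piercing components; here it occurs once.
print(len(fulfilled) >= 3)  # False
```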

Discussion It is known that links are crucial in many biological processes, e.g., in DNA replication, and their proper description requires tools from knot theory. The discovery of knots, slipknots, and lassos in proteins already has changed our view on proteins’ complexity. In this paper, we showed that this complexity is even more involved, and apart from knots, biology also designed stable links within single-protein chains. We described the structure of those links, as well as their possible evolution and biological significance. Topological links are formed by covalent, disulfide-based loops, and we identified three types of links in proteins: positive and negative Hopf links and the Solomon link. The link topology induces additional stabilization and this effect is independent of the stabilization introduced by disulfide bridges alone. This is especially important, because all proteins with links operate outside of the cell. Furthermore, the topology-induced stability provides an additional powerful tool for biotechnologists in designing new, extra-stable peptides, as suggested before for molecular knots (33). Despite low sequential similarity, all proteins with the positive Hopf link are carbohydrate-binding proteins, indicating that they have a common ancestor. The first positive Hopf-link proteins must have occurred in the first eukaryotic cells, before splitting the tree of life into animals, plants, and fungi, as the same topological motif can be found in all those kingdoms of life. This, on the other hand, indicates that topological constraints can allow great sequential variability. In fact, the only conserved fragment is located in the core of the link region, i.e., in the vicinity of cysteines closing the first and opening the second covalent loop (found by Clustal O; SI Appendix, Fig. S10). 
This, in turn, explains why there is no sign of gene duplication in proteins with Solomon links, although in principle the Solomon link could be obtained by a duplication of some parts of the Hopf-link structure and despite a similar affinity of Solomon-link proteins for carbohydrates. The folding of proteins with links is another interesting conundrum. In this work, we have taken a step in explaining possible pathways of folding of the smallest Hopf-link protein TdPI. Nevertheless, it is still puzzling that both in our simulations and in experiments, in oxidizing conditions (in which this secreted protein should fold) the TdPI folds toward a nonnative state, which is supposed to be a topologically trivial structure. However, our results can also support theories of knotted protein folding in which the threading of a chain through a twisted loop is known to be the rate-limiting step of folding. In particular, we showed that the loop consisting of 17 residues is probably too small to allow threading at all. On the other hand, the loop consisting of 27 residues is large enough to allow threading to occur with the slipknot mechanism, which was proposed earlier to solve the threading problem in the case of knotted proteins (26). This result shows that analyzing the folding of proteins with links or complex lassos can provide important clues about the folding processes of entangled proteins. Another issue discussed in this article is the possibility of the formation of Brunnian links in proteins. Although we have not identified any such link in the PDB, we found an interesting case of goat lactoperoxidase, which fulfills one of the geometric requirements of the Brunnian link. It seems also that the geometry of this protein is correlated with its two-step unfolding process. This shows, in general, how influential the topology can be and that studying topological effects in proteins may provide much insight about correlations between structure, function, and properties.
Let us conclude with a remark that similarly to different types of entanglement being used, e.g., in material science, the same concept can be used in biology to equip proteins with special properties. Our methods described in this work can be used to identify and analyze properties of linked protein chains and even more complex structures, e.g., macromolecular links in virus capsids, whose structures will start to be available based on the EM method. Proteins with links are collected in the LinkProt database (34).

Materials and Methods Protein Dataset. In this work, we analyzed all protein structures deposited in the PDB database as of May 2016. Gaps were modeled by straight-line segments; however, no additional crossings were introduced. Sequential clustering was done using BLAST. The structural (P value) and sequential homology were calculated using the jFATCAT algorithm. Protein Dynamics. Simulations of protein dynamics were performed using Gromacs with SMOG software (35); details are described in SI Appendix. The minimal surfaces were determined as in refs. 17 and 18.

Acknowledgments This work was supported by European Molecular Biology Organization Installation Grant 2057 and Polish Ministry for Science and Higher Education Grant 0003/ID3/2016/64 (to J.I.S.), University of Warsaw Young Researcher Fellowship Grant 120000-501/86-DSM-112 700 (to P.D.-T.), and Polish National Science Centre Preludium Grant 2016/21/N/NZ1/02848 (to P.D.-T.).

Footnotes Author contributions: P.D.-T. and J.I.S. designed research; P.D.-T. performed research; P.D.-T. and J.I.S. analyzed data; and P.D.-T. and J.I.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1615862114/-/DCSupplemental.