It has long been taught that proteins must be properly folded in order to perform their functions. This paradigm derives from work by Christian B. Anfinsen and coworkers. In the 1960s, they showed that RNase, when denatured so that 99% of its enzymatic activity was lost, could regain enzymatic activity within seconds when the denaturing agent was removed under proper conditions[1][2][3]. They concluded that the amino acid sequence is sufficient for a protein to fold into its functional, lowest-energy conformation. This work won the 1972 Nobel Prize, and was subsequently confirmed and extended by many researchers.

Beginning around 2000, it was recognized that not all proteins function in a folded state[4][5][6][7][8][9][10]. Some proteins must be unfolded or disordered in order to perform their functions, and others fold only in complex with target structures[11][12][13]. These are termed intrinsically disordered proteins (IDPs), intrinsically unstructured proteins (IUPs), or natively unfolded proteins.

By some estimates, about 10% of all proteins are fully disordered, and about 40% of eukaryotic proteins have at least one long (>50 amino acids) disordered loop[7]. Such sequences, under physiological conditions in vitro, display physicochemical characteristics resembling those of random coils. They possess little or no ordered structure, having instead an extended conformation with high intra-molecular flexibility, lacking any tightly packed core.

At left is an animation of a heat shock/chaperonin protein fragment. Residues 1-70 are disordered; 71-109 are alpha helical. This animates 20 models from an NMR experiment (2ljl). Charges are colored blue=positive and red=negative. A high charge density prevents folding. For comparison, at right is an animation of 20 NMR models of a protein of similar length that folds into a stable domain (2n5a).

Many crystallographic structures have missing loops, that is, ranges of amino acids with no atomic coordinates in the model. These "gaps" in the model are often thought to be artifacts of inadvertent disorder in the crystal. In some cases, these gaps may be alerting us to the presence of intrinsically disordered loops in an otherwise folded protein[14]. Such gaps are the basis for the DISOPRED2 disorder prediction server. FirstGlance in Jmol offers one method for locating and visualizing such gaps.
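The idea of a "gap" can be illustrated with a minimal sketch. It assumes that the residue numbers run contiguously from `first` to `last` in the sequence record, and that `observed` lists the residue numbers that have coordinates in the model. The function name and the inputs are hypothetical; this is not how FirstGlance in Jmol or DISOPRED2 is implemented.

```python
def find_gaps(observed, first, last):
    """Return (start, end) ranges of residue numbers in first..last
    that are absent from `observed` (residues with coordinates)."""
    present = set(observed)
    gaps = []
    start = None  # first residue of the gap currently being scanned
    for n in range(first, last + 1):
        if n not in present:
            if start is None:
                start = n
        elif start is not None:
            gaps.append((start, n - 1))
            start = None
    if start is not None:  # gap runs to the C-terminus
        gaps.append((start, last))
    return gaps


# Hypothetical chain of 35 residues with coordinates for 1-10 and 21-30 only.
observed = list(range(1, 11)) + list(range(21, 31))
print(find_gaps(observed, 1, 35))
```

A gap found this way only marks missing coordinates. As noted above, it may reflect intrinsic disorder or merely disorder in the crystal.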

Despite the existence of compelling evidence for IDPs and intrinsically disordered loops beginning in 1990[15][16][17], many current textbooks of biochemistry and even some monographs on protein structure fail to mention intrinsic disorder and its importance for protein function[18][19]. In 2011, Chouard provided a readable and informative overview of IDPs and how some of them function[20].





Examples of IDPs

Examples cover a wide variety of cellular systems, and it has been predicted that eukaryotes have more IDPs than other kingdoms[21]. Of course, there are no PDB codes for fully disordered proteins in isolation. However, there are some crystallographic results for IDPs that undergo a disorder-to-order transition when they form a complex with another folded protein domain, such as 1jsu, 1g3j, and 1oct[7]. Other examples are at Globular_Proteins. See further information about 1jsu and other cases below.

IDPs play roles in processes such as:

Cell signaling and cell cycle regulation, e.g. cyclin dependent kinase inhibitor p21Waf1/Cip1/Sdi1[22]

Tumor suppression, e.g. p53, which contains large unstructured regions in its native state[23]

Assembly of cytoskeletal proteins, e.g. Tau protein[24]

Protein folding: some intrinsically disordered regions function as chaperones[25].

Membrane fusion and membrane transport, e.g. isolated components of the SNARE complex[26]

DNA recognition molecules, e.g. the basic DNA-binding region of the leucine zipper protein, GCN4[15]

Transcriptional activation domains, e.g. NF-κB[29], Glucocorticoid receptor, 77-262 fragment[30]. There is "widespread importance of structural disorder in gene regulatory proteins", such as LacI/GalR and Hox[31].

Amyloid formation, e.g. prion protein, N-terminal part[32]; NACP, the precursor of the non-Aβ component of the amyloid plaque[33]

Evasion of immune responses by parasites. Highly flexible disordered proteins are poor antigens [34].

Molecular Shields

It appears that hundreds of IDPs that remain soluble after boiling protect folded proteins against heat denaturation, aggregation, and loss of activity from desiccation or organic solvents[35]. They also appear to suppress neurodegeneration and extend lifespan[35]. They have been termed "heat-resistant obscure" (Hero) proteins[35]. Their isoelectric points (pI) form a bimodal distribution, so that most are negatively or positively charged at neutral pH[35]. Examples include six human proteins that were studied in detail: SERF2 (length 59), C9orf16 (length 83), C19orf53 (length 99), BEX3 (length 111), C11orf58 (length 183), and SERBP1 (length 408)[35]. Estimated isoelectric points[36] are 10.5, 4.2, 11.6, 5.5, 4.7, and 8.6 respectively. In several test cases, scrambling the sequences of these proteins did not diminish their protective effects[35]. Their protective activity appears to depend on their high charge density and length, but not on a specific sequence.

Protein disorder predictors

Principles Used in Prediction

FoldIndex[37] output for three protein sequences: (a) Cat-Muscle Pyruvate Kinase, (b) the human p53 tumor suppressor protein, (c) Chicken gizzard caldesmon; green is folded and red is unfolded.

Content of order-promoting and disorder-promoting amino acids in the Drosophila proteome (black) and in the cytoplasmic domain of gliotactin that was shown to be IDP (gray)[38].

Led by the assumption that “since amino acid sequence determines 3-D structure, amino acid sequence should also determine lack of 3-D structure”[39], researchers have evaluated specific sequence features shared by IDPs and formulated algorithms for their identification.

The low hydrophobicity and high net charge of natively unfolded proteins result in a difference in amino acid composition between them and natively folded proteins [40].
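The combination of low hydrophobicity and high net charge can be turned into a single score. The following is a minimal sketch, not the code of any prediction server. It assumes the Kyte-Doolittle hydropathy scale rescaled to 0..1, a net charge that counts only K and R as positive and D and E as negative, and a linear boundary with the coefficients 2.785 and 1.151 reported for the charge-hydropathy plot. These coefficients, and the function names, are assumptions of this sketch and should be checked against the FoldIndex publication[37] before use.

```python
# Kyte-Doolittle hydropathy scale (values from -4.5 to +4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled so each residue is in 0..1."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge per residue (assumption: K, R = +1; D, E = -1)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def fold_score(seq):
    """Positive values suggest a folded sequence, negative values an unfolded one.
    The coefficients are an assumption of this sketch (see the text above)."""
    return 2.785 * mean_hydrophobicity(seq) - mean_net_charge(seq) - 1.151


# A highly charged, hydrophilic sequence versus a hydrophobic one.
print(fold_score("EEEEEEEEEE") < 0)        # charged: scores as unfolded
print(fold_score("ILVILVILVILVILV") > 0)   # hydrophobic: scores as folded
```

Scoring a whole sequence at once hides local variation. A server would typically apply such a score over a sliding window along the sequence.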

Compared to sequences of ordered proteins, disordered protein sequences are substantially depleted in I, L, V, W, F, Y, and C, which were therefore designated as “order promoting” amino acids, and enriched in E, K, R, G, Q, S, P, and A, which have been designated as “disorder promoting”. The underrepresentation of hydrophobic amino acids in a protein diminishes one of the basic thermodynamic forces known to be important for protein folding, namely, the hydrophobic interaction. Because a hydrophobic core does not form, such proteins have large hydrodynamic dimensions.
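The two residue classes listed above make a simple composition check possible. This is a minimal sketch, assuming a one-letter amino acid sequence as input. The function name is hypothetical, and the result is a descriptive statistic rather than a disorder prediction.

```python
ORDER_PROMOTING = set("ILVWFYC")
DISORDER_PROMOTING = set("EKRGQSPA")


def composition_bias(seq):
    """Return (fraction order-promoting, fraction disorder-promoting)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    order = sum(1 for aa in seq if aa in ORDER_PROMOTING)
    disorder = sum(1 for aa in seq if aa in DISORDER_PROMOTING)
    return order / len(seq), disorder / len(seq)


# Hypothetical six-residue example: three residues from each class.
print(composition_bias("ILVEKR"))
```

Residues in neither class (D, H, M, N, and T) count toward the length but toward neither fraction, so the two values need not sum to 1.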

Prediction Servers

The quality of predictions by various algorithms has been evaluated beginning in CASP5 (2002). The assessment of disorder predictions for CASP8 (2008) has been published[41].

Prediction Meta-Servers

Meta-servers gather the predictions from other servers into a single report.

MobiDB. The Structure section in entries at UniProt.Org offers MobiDB. MobiDB also includes manually curated disorder data along with derived and predicted data.

D2P2: "pre-computed disorder predictions on a large library of proteins from completely-sequenced genomes. ... statistical comparisons of the various prediction methods ...."

IDEAL: Intrinsically Disordered proteins with Extensive Annotations and Literature: " We have introduced a new category..which only has an evidence of "ordered" in a binding state, but is thought to be "disordered" in an isolated state by manual inspections and the results of several disorder-prediction tools (DICHOT, Mobi, d2p2 etc.)"





Single Algorithm Prediction Servers

DISOPRED2 (Jones Group, University College London, UK). "DISOPRED2 was trained on a set of around 750 non-redundant sequences with high resolution X-ray structures. Disorder was identified with those residues that appear in the sequence records but with coordinates missing from the electron density map. This is an imperfect means for identifying disordered residues as missing co-ordinates can also arise as an artifact of the crystalization process. False assignment of order can also occur as a result of stabilizing interactions by ligands or other macromolecules in the complex. However, this is the simplest means for defining disorder in the absence of further experimental investigation of the protein." (Quoted from the DISOPRED2 website.)

FoldIndex[37] (Sussman Group, Weizmann Institute, Rehovot, Israel). FoldIndex makes predictions based on the observation that IDPs occupy the low hydrophobicity/ high net-charge portion of charge-hydrophobicity phase space. (See Figure above.)

IUPred (Dosztányi, Csizmók, Tompa and Simon: Budapest, Hungary). "IUPred recognized intrinsically unstructured regions from the amino acid sequence based on the estimated pairwise energy content. The underlying assumption is that globular proteins are composed of amino acids which have the potential to form a large number of favorable interactions, whereas intrinsically disorered proteins (IDPs) adopt no stable structure because their amino acid composition does not allow sufficient favorable interactions to form." (Quoted from the IUPred website.)

PONDR (Dunker Group, Indiana University and Molecular Kinetics, Inc., Indianapolis IN USA; Obradovic Group, Temple Univ., Philadelphia PA USA). "PONDR® functions from primary sequence data alone. The predictors are feedforward neural networks that use sequence information from windows of generally 21 amino acids. Attributes, such as the fractional composition of particular amino acids or hydropathy, are calculated over this window, and these values are used as inputs for the predictor. The neural network, which has been trained on a specific set of ordered and disordered sequences, then outputs a value for the central amino acid in the window. The predictions are then smoothed over a sliding window of 9 amino acids. If a residue value exceeds a threshold of 0.5 (the threshold used for training) the residue is considered disordered." (Quoted from the PONDR website.)

RONN (Esnouf Group, University of Oxford, UK). "We have developed the regional order neural network (RONN) software as an application of our recently developed ‘bio- basis function neural network’ pattern recognition algorithm to the detection of natively disordered regions in proteins. The results of blind-testing a panel of nine disorder prediction tools (including RONN) against 80 protein sequences derived from the Protein Data Bank shows that, based on the probability excess measure, RONN performed the best."[42]

WinDiso[43] (Grishin Lab, Dallas, Texas USA). "WinDiso is a linear, sequence- and alignment-based predictor of disordered/unfolded regions in proteins. It has the capability of adjusting for the increased tendency for disorder at protein termini. The simple weighted window-based algorithm and careful optimization technique make this a good predictor to use when trying to avoid bias toward special cases." (Quoted from the Grishin lab website.)

The above list is incomplete. Addition of other servers is welcome, and summaries of methods, pros and cons for each server would be useful.
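The post-processing described in the PONDR quotation above (smoothing over a sliding window of 9 residues, then applying a threshold of 0.5) can be sketched as follows. The per-residue input scores stand in for the output of a predictor and are hypothetical; the neural network itself is not implemented here. Shortening the window at the termini is an assumption of this sketch, not a documented behavior of PONDR.

```python
def smooth(scores, window=9):
    """Average each score over a centered window, truncated at the termini."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def call_disorder(scores, window=9, threshold=0.5):
    """True where the smoothed score exceeds the threshold."""
    return [s > threshold for s in smooth(scores, window)]


# Hypothetical raw scores: a disordered N-terminal half, an ordered C-terminal half.
raw = [1.0] * 10 + [0.0] * 10
calls = call_disorder(raw)
print("".join("D" if c else "-" for c in calls))
```

Smoothing suppresses isolated single-residue calls, at the cost of blurring the exact boundary between ordered and disordered regions.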





Curated Collections

Because their very nature makes them difficult to categorize and study by standard means, several groups have set up curated listings of intrinsically disordered proteins and intrinsically disordered regions.

MobiDB3.0: More annotations for intrinsic disorder, conformational diversity and interactions in proteins - MobiDB also includes manually curated disorder data along with derived and predicted data highlighted above.

PED: Protein Ensemble Database: "an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution." The database of conformational ensembles describing flexible proteins. This database seems to have stopped taking submissions in 2016.

IDEAL: Intrinsically Disordered proteins with Extensive Annotations and Literature: "a collection of knowledge on experimentally verified intrinsically disordered proteins. IDEAL contains manual annotations by curators on intrinsically disordered regions..."

DisProt: "a database of experimental evidences of disorder manually collected from literature."





Biological implications of IDPs

It was proposed that the unfolded nature of IDPs provides them with advantages in recognition and binding. Although their large hydrodynamic dimensions slow down diffusion, their size provides a large target for initial molecular collisions, and the lack of rigid binding pockets permits multiple approach orientations for a binding partner, which may increase the probability of productive interactions[44][39]. In addition, IDPs allow molecular plasticity by adopting more than one conformation, and binding diversity by binding to several proteins; thus many of the known hub proteins are IDPs. The rapid turnover of IDPs in the cell allows their tight regulation, as is often needed in cell signaling and cell cycle control.

Evolution of IDPs

In p53, the folded DNA-binding domain is conserved, while the intrinsically disordered regions display a higher rate of mutations[45].

Many IDPs undergo disorder-order transition

Binding of natural ligands such as a variety of small molecules, substrates, cofactors, other proteins, nucleic acids or membranes may induce unstructured proteins to adopt stable structures bound to the partner, or even a secondary structure bound to the partner. In addition to the cases detailed below, other examples include 1g3j, 1oct[7], and the Lac repressor.

Some IDP sequences are able to bind to multiple partners that have <25% sequence identity, and in some cases even different folds[46]. For example, the C-terminal portion of p53 is known to bind to four different protein partners each with different folds[46]; and the N-terminus of histone H3 binds to nine different protein partners with distinct folds[46].

The human p27Kip1 kinase inhibitory domain[47]

The cyclin-dependent kinases (CDKs) have a central role in coordinating the eukaryotic cell division cycle. CDKs are controlled through several different processes involving the binding of activating cyclin subunits. The resulting cyclin-CDK complexes are inhibited by other proteins termed, in general, cyclin-CDK inhibitors (CKIs). One example of a CKI is p27Kip1. p27Kip1 is an IDP, and it binds to the phosphorylated cyclin A-CDK2 complex, interacting with both cyclin A and CDK2 (1jsu). On cyclin A, it binds in a groove formed by conserved cyclin box residues. On CDK2, it binds and rearranges the amino-terminal lobe and also inserts into the catalytic cleft, mimicking ATP.

The transcriptional activator GCN4[48]

The structure of GCN4 bound to a DNA fragment contains the perfectly symmetrical binding site (1dgc). A homodimer of parallel alpha-helices forms an interhelix coiled-coil region via the leucine zipper, and the two N-terminal basic regions fit into the major groove of half sites on opposite sides of the DNA double helix.

The yeast transcriptional activator GCN4 belongs to a large family of eukaryotic transcription factors including Fos, Jun and CREB. All family members have a DNA recognition motif that consists of a coiled-coil dimerization element, the leucine zipper, and an adjoining basic region, which mediates DNA binding. This basic region is largely unstructured in the absence of DNA. Addition of DNA containing a GCN4 binding site induces the transition of this region from unstructured to α-helical[49].

Practical Implications of IDPs

There is evidence that large intrinsically unstructured regions interfere with crystallization[14]. Oldfield et al., 2013[14], concluded:

The limited amount of intrinsic disorder present as missing density regions agrees with the idea that intrinsically disordered regions, particularly long disordered regions, inhibits successful determination of crystal structures, and suggests that avoiding or tailoring disordered proteins may aid in the determination of crystal structures.