Derek Lowe's commentary on drug discovery and the pharma industry. An editorially independent blog from the publishers of Science Translational Medicine. All content is Derek’s own, and he does not in any way speak for his employer.

So how many proteins are there in a living cell? Not how many different proteins, although that’s a pretty good question all by itself, one that’s been investigated pretty thoroughly. But how many actual protein molecules are there of each of those?

This new paper is the latest attempt to answer that one, from the Brown group at Toronto. They go over 21 previous protein-abundance studies on yeast cells (which is where a lot of heavy-duty work on that question has been done) and normalize these to the less-used (but arguably more intuitive) “molecules per cell” numbers. A few studies have done this in the past, but this one certainly seems like the most comprehensive. (Those previous estimates were used as part of the data flow in this one.)
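For readers who want a feel for what “normalizing to molecules per cell” involves, here is one simple way to do it: rescale each study so that its abundances sum to an assumed whole-cell total, then take the per-protein median across studies. This is a sketch under those assumptions, not necessarily the procedure the Toronto group used; the protein names, the raw values, and the 5 × 10⁷ total are all hypothetical.

```python
from statistics import median

# Hypothetical raw abundances, each study in its own arbitrary units.
studies = [
    {"A": 2.0, "B": 6.0, "C": 2.0},
    {"A": 100.0, "B": 250.0, "C": 150.0},
    {"A": 1.0, "B": 2.0, "C": 2.0},
]

# Assumed whole-cell protein total used to anchor every study (hypothetical).
ASSUMED_TOTAL_MOLECULES = 5e7


def to_molecules_per_cell(raw, total=ASSUMED_TOTAL_MOLECULES):
    """Rescale one study so its abundances sum to `total` molecules per cell."""
    scale = total / sum(raw.values())
    return {protein: value * scale for protein, value in raw.items()}


normalized = [to_molecules_per_cell(s) for s in studies]

# Per-protein median across every study that reports that protein.
proteins = sorted({p for s in normalized for p in s})
unified = {p: median(s[p] for s in normalized if p in s) for p in proteins}

for p, n in unified.items():
    print(f"{p}: {n:.2e} molecules per cell")
```

Taking the median rather than the mean keeps a single wildly off study from dragging the consensus value around, which matters when the individual datasets disagree most at the extremes.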

There are, by best estimates, 5858 different proteins in a yeast cell. Almost all of these were detected in the various studies, with 467 left undetected. These seem to be involved in specific states like sporulation that the earlier studies may not have caught (or deliberately avoided), but that still leaves over 90% of the proteome available to estimate. As for the variations between the published studies, the numbers are loosest at the low and high ends of the abundance scale, and tighten up in the middle. Mass spec estimates were the most sensitive, but also had the highest variability (two behaviors that often go hand-in-hand).
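The “over 90%” coverage figure follows directly from the two counts in the paragraph above; a quick check:

```python
TOTAL_PROTEINS = 5858  # estimated number of distinct yeast proteins
UNDETECTED = 467       # proteins not detected in any of the studies

detected = TOTAL_PROTEINS - UNDETECTED
coverage = detected / TOTAL_PROTEINS
print(f"{detected} of {TOTAL_PROTEINS} proteins detected ({coverage:.1%})")
# prints: 5391 of 5858 proteins detected (92.0%)
```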

When you look at expression levels by functional class, the most heavily represented ones tend to be involved in translation – lots of protein is needed just to make more protein. The rarest ones tend to be things like DNA-damage-stimulated repair proteins, but (interestingly) proteins involved in ubiquitination were also way down there, which I wouldn’t have expected. Overall, though, there’s definitely a longer (and more populated) tail towards the more-abundant side of the distribution.

In general, the range of “molecules per cell” is huge. There are proteins that appear to be present in single-digit copies only, and many that only have a dozen or two copies at best. At the other end of the scale, there are some with over 100,000 molecules per cell! Here are some general numbers:

An estimate of yeast cell protein content can be derived from the cellular protein mass per unit volume and the mass of the average protein (Milo, 2013). Using a density of 1.1029 g/mL (Bryan et al., 2010), a water content of 60.4% (Illmer et al., 1999), and a protein fraction of dry mass of 39.6% (Yamada and Sgarbieri, 2005), typical of yeast in standard growth conditions, we calculate 0.17 g of protein per mL. With an average protein mass of 54,580 Da, and mean logarithmic phase cell volume of 42 μm³ (Jorgensen et al., 2002), we calculate 7.9 × 10⁷ protein molecules per cell. Adding the median abundances of all detected proteins in our unified abundance dataset, we arrive at a total of 4.2 × 10⁷ protein molecules per yeast cell, or 0.53 of the calculated estimate. Total protein content estimates derived from individual studies agree well with our estimate (4.5 × 10⁷ [Ghaemmaghami et al., 2003], 5.3 × 10⁷ [von der Haar, 2008], and 5 × 10⁷ [Futcher et al., 1999]), and also tend to be lower than the calculated estimate of 7.9 × 10⁷ molecules per cell. We infer that our aggregate abundance estimates are likely accurate within 2-fold, on average.
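The back-of-the-envelope calculation in that passage can be reproduced in a few lines using only the numbers it quotes. The variable names are illustrative; the protein concentration is rounded to 0.17 g/mL before continuing, as the passage does (carrying the unrounded value through gives about 8.0 × 10⁷ instead).

```python
AVOGADRO = 6.02214076e23  # molecules per mole

density_g_per_ml = 1.1029        # whole-cell density
water_fraction = 0.604           # fraction of cell mass that is water
protein_fraction_of_dry = 0.396  # protein share of dry mass
mean_protein_mass_da = 54_580    # Da, i.e. g/mol
cell_volume_um3 = 42             # mean log-phase cell volume

# Protein mass per mL of cell = density x dry fraction x protein fraction.
protein_g_per_ml = density_g_per_ml * (1 - water_fraction) * protein_fraction_of_dry
protein_g_per_ml_rounded = round(protein_g_per_ml, 2)  # 0.17, as quoted

cell_volume_ml = cell_volume_um3 * 1e-12  # 1 um^3 = 1e-12 mL
protein_mass_g = protein_g_per_ml_rounded * cell_volume_ml

# grams -> moles -> molecules
molecules = protein_mass_g / mean_protein_mass_da * AVOGADRO

summed_medians = 4.2e7  # total from the unified dataset, as quoted
ratio = summed_medians / molecules

print(f"{protein_g_per_ml:.3f} g/mL protein (unrounded)")  # 0.173 g/mL protein (unrounded)
print(f"{molecules:.1e} molecules per cell")               # 7.9e+07 molecules per cell
print(f"{ratio:.2f} of the calculated estimate")           # 0.53 of the calculated estimate
```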

The relationship between transcription, translation, and protein abundance is pretty complex, as we already knew, and this analysis just highlights that. There are several different ways to arrive at similar protein levels, and all of them show up in various cases when you look closer. Interestingly, some of the outliers in these plots cluster by function as well (cytosolic translation machinery and glucose metabolism, for example), which suggests that these things are not accidental, not after a couple of billion years of evolution, anyway.
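To see how “several different ways” can land on the same number, consider the simplest textbook model of steady-state protein level, dP/dt = k_tl·m − k_deg·P, which settles at P = k_tl·m / k_deg. This is a deliberate simplification, not the model used in the paper, and the parameter sets below are invented so that all three reach the same level.

```python
def steady_state_protein(mrna_copies, translation_rate, degradation_rate):
    """Steady state of dP/dt = k_tl * m - k_deg * P, i.e. P = k_tl * m / k_deg.

    translation_rate: proteins made per mRNA per hour (k_tl)
    degradation_rate: first-order loss rate per hour (k_deg), taken here to
        lump together degradation and dilution by cell growth
    """
    return translation_rate * mrna_copies / degradation_rate


# Hypothetical parameter sets: (mRNA copies, k_tl, k_deg).
strategies = {
    "many mRNAs, modest translation": (20, 50.0, 0.1),
    "few mRNAs, fast translation": (5, 200.0, 0.1),
    "few mRNAs, very stable protein": (5, 50.0, 0.025),
}

levels = {name: steady_state_protein(*params) for name, params in strategies.items()}
for name, level in levels.items():
    print(f"{name}: ~{level:,.0f} molecules per cell")
```

Measuring protein abundance alone can't distinguish these cases; you need the mRNA levels and the turnover rates as well, which is part of why the mRNA/protein comparisons throw up the outliers they do.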

These estimates also provide a way to get at a key difficulty in chemical biology: what happens when you artificially tag a protein? Some of the data sets were collected by fusing a GFP (green fluorescent protein) tag onto the native proteins, and comparing those to the mass-spec measurements is always informative. There are over 700 outlier proteins in that comparison, with the great majority being expressed lower than expected when tagged. C-terminal GFP seems to be a particular offender (281 proteins with lower expression), with occasional readings ten- to fiftyfold lower than you’d expect from the mass-spec readouts. But on the other hand, some of those were also lower when tagged at the N-terminus, or with TAP instead of GFP, so they just plain don’t like being modified. And to be sure, there are 259 proteins where GFP-tagging gave more protein than expected, up to 67-fold in one case. Tags, as anyone working in this field should know, are not silent, especially when they’re the size of a whole functional GFP. (It should also be noted that the lower limit for GFP protein detection is somewhere around 1400 molecules per cell, and there are an awful lot of proteins that are expressed at lower levels than that.)
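Flagging tag-induced outliers comes down to comparing the tagged readout with an independent (here, mass-spec) estimate and calling anything beyond some fold-change cutoff. A minimal sketch: the protein names and numbers are invented, the 4-fold cutoff is a hypothetical choice rather than the paper’s criterion, and the ~1400-molecule GFP floor is the figure mentioned above.

```python
# Hypothetical paired estimates in molecules per cell: (GFP-tagged, mass spec).
paired = {
    "PROT_A": (2_000, 40_000),   # tag reads 20x low
    "PROT_B": (50_000, 45_000),  # roughly agrees
    "PROT_C": (67_000, 1_000),   # tag reads 67x high
    "PROT_D": (900, 3_000),      # tagged readout under the detection floor
}

GFP_DETECTION_FLOOR = 1400  # approximate GFP lower limit, molecules per cell
OUTLIER_FOLD = 4.0          # hypothetical cutoff, not the paper's criterion


def classify(gfp, ms, fold=OUTLIER_FOLD, floor=GFP_DETECTION_FLOOR):
    """Label one protein by how its tagged readout compares with mass spec."""
    if gfp < floor:
        return "below GFP floor"  # tagged value can't be trusted at this level
    ratio = gfp / ms
    if ratio <= 1 / fold:
        return "lower when tagged"
    if ratio >= fold:
        return "higher when tagged"
    return "consistent"


for name, (gfp, ms) in paired.items():
    print(f"{name}: {classify(gfp, ms)}")
```

Checking the detection floor first keeps a protein that simply fell under the GFP limit from being miscounted as one that “doesn’t like being tagged.”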

Many of these estimates are subject to change under different conditions – position in the cell cycle, environmental stress, and so on. The majority of proteins don’t change their abundance in response to various stress conditions (temperature, pH, starvation, oxidative stress, etc.), but the ones that do can vary up to a hundred-fold. And there appear to be only a handful of proteins that are “universal stress responders” – the rest of them are more particular to certain types of trouble.
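The “universal responder” idea is essentially a set intersection: find the proteins that change meaningfully under each stress, then keep only those that show up under every one. In this sketch the fold-change values, protein names, and the 2-fold threshold are all hypothetical.

```python
# Hypothetical fold-changes (stressed / unstressed abundance) per condition.
responses = {
    "heat":       {"P1": 12.0, "P2": 1.1, "P3": 0.05, "P4": 3.5},
    "low_pH":     {"P1": 8.0,  "P2": 0.9, "P3": 1.0,  "P4": 4.2},
    "starvation": {"P1": 20.0, "P2": 1.0, "P3": 0.2,  "P4": 1.2},
    "oxidative":  {"P1": 5.0,  "P2": 1.3, "P3": 0.9,  "P4": 6.0},
}

THRESHOLD = 2.0  # hypothetical: "changed" means at least 2-fold up or down


def changed(fold_change, threshold=THRESHOLD):
    return fold_change >= threshold or fold_change <= 1 / threshold


# Proteins that respond under each individual condition.
responders = {
    condition: {p for p, fc in values.items() if changed(fc)}
    for condition, values in responses.items()
}

# Universal responders change under every condition.
universal = set.intersection(*responders.values())

for condition, hits in responders.items():
    print(f"{condition}: {sorted(hits)}")
print("universal:", sorted(universal))
```

Here P1 responds to everything, P3 and P4 respond only to particular stresses, and P2 stays put, which mirrors the pattern described above in miniature.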

It’s a mess down there, is the main take-home of this work (and the papers it builds on). The environment of the cell, at the protein level, is highly differentiated and covers a wide range indeed. Whenever I read this sort of thing, I’m amazed that our small molecules work at all – it’s simultaneously exhilarating and terrifying to think about all that’s going on. For example, do you want to go after a target that has only six copies per cell? Just finding it might be a challenge – but on the other hand, if you do hit it, something is very likely to happen. . .
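Copy numbers become more tangible for a medicinal chemist once converted to concentrations. A rough sketch, assuming the 42 μm³ (42 fL) log-phase cell volume from the quoted passage and treating the whole cell as one well-mixed compartment (it isn’t; nuclei, vacuoles, and local clustering will change the real numbers):

```python
AVOGADRO = 6.02214076e23  # molecules per mole
CELL_VOLUME_L = 42e-15    # 42 um^3 = 42 fL; assumed single well-mixed compartment


def copies_to_nanomolar(copies, volume_l=CELL_VOLUME_L):
    """Convert molecules per cell to an average concentration in nM."""
    moles = copies / AVOGADRO
    return moles / volume_l * 1e9


for copies in (6, 1_400, 100_000):
    print(f"{copies:>7,} copies per cell ~ {copies_to_nanomolar(copies):,.2f} nM")
```

On those assumptions, six copies per cell works out to roughly a quarter of a nanomolar, while 100,000 copies is around 4 μM.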