If you've read enough of the reporting on the completion of genomes, you'll invariably come across a science writer who has compared the genome to the operating system of the cell. Apparently, a team of researchers from Yale decided to take the metaphor seriously. They built a call graph of the Linux kernel, and compared that to the gene regulation network of the gut bacterium E. coli. Given that the two serve radically different purposes, it should come as no surprise that the layouts look radically different—but the real surprise may be that there are so many intriguing points of comparison.

We'll take a look at each of the two systems in turn. To create a graph of the E. coli gene regulatory network, the authors divided up genes into three categories. Some genes don't do any regulation; they perform a structural or metabolic function and only receive input from the regulators. These were defined as workhorses, and placed at the foundation of the graph. Other genes participate in regulatory networks, receiving input from their peers, and controlling both workhorses and other regulators—these were termed middle managers. Finally, a few master regulators sit on top of the hierarchy and only regulate other genes.

This is obviously a bit of a simplistic analysis. Even the master regulators wouldn't get produced if there weren't some basic machinery around to transcribe genes. Some regulatory relationships rely on more than proteins—the proteins won't work unless a small molecule signal is around. And, finally, transcription of genes isn't the only level at which regulation takes place, as there are lots of biochemical steps between there and a functional protein.

Although these are significant caveats, the pattern the authors saw is pretty obvious: bacteria have a very bottom-heavy, workhorse-rich gene hierarchy. Most of the genes make proteins that go off and do other things; the middle managers and master regulators account for less than five percent each of the total gene complement.

Bacteria (left) have a workhorse-rich regulatory hierarchy, while the Linux kernel (right) is filled with upper and middle management.

Image courtesy of study author Koon-Kiu Yan.

Shifting gears to the Linux kernel, we see that the functions in its call graph fall into a similar set of roles. Some functions get work done without calling any other function; these sit at the workhorse level. Middle managers both call other functions to do their task and get called by a different set of functions in return. Finally, the master regulators are called by no one, although they require calls to other functions to get their work done.

Based on these criteria, Linux is very management-heavy. Over 80 percent of the functions in the call graph appear in the two upper layers.
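The three-level sorting described above is easy to sketch for any directed graph, whether an edge means "regulates" or "calls." What follows is a minimal illustration, not the authors' code: the function name `classify_nodes`, the toy edge list, and the choice to ignore self-loops (a gene regulating itself, a recursive function) are all assumptions made for the example.

```python
from collections import Counter


def classify_nodes(edges):
    """Sort nodes into the article's three levels.

    edges: iterable of (source, target) pairs, read as
    "source regulates target" or "source calls target".
    """
    out_degree = Counter()
    in_degree = Counter()
    nodes = set()
    for source, target in edges:
        nodes.update((source, target))
        if source == target:
            continue  # assumption: self-loops do not count as control
        out_degree[source] += 1
        in_degree[target] += 1

    roles = {}
    for node in nodes:
        if out_degree[node] == 0:
            roles[node] = "workhorse"         # controls or calls nothing
        elif in_degree[node] == 0:
            roles[node] = "master regulator"  # controlled or called by no one
        else:
            roles[node] = "middle manager"    # both directions
    return roles


# Hypothetical toy graph; the names are placeholders, not real genes or kernel functions.
toy_edges = [
    ("top", "mid"),
    ("top", "leaf_a"),
    ("mid", "leaf_a"),
    ("mid", "leaf_b"),
]

roles = classify_nodes(toy_edges)
# Share of nodes in the two upper layers, the figure quoted for Linux above.
management_share = sum(role != "workhorse" for role in roles.values()) / len(roles)
```

On this toy graph, half of the nodes land in the two management layers. The same calculation on a real call graph or regulatory network would give the kind of percentages quoted in the article.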

The authors then performed several comparisons. For starters, they defined modules as regulatory networks in which control flows from the master regulators down to workhorses. Because of its paucity of upper management, the E. coli genome only has 64 of these; Linux has over 3,600. Bacterial modules also tend to operate in isolation. Less than five percent of them overlap, and only 15.6 percent of the genes get reused in multiple modules. The equivalent numbers for Linux functions are 81 and 88 percent.
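To make the overlap and reuse figures concrete, here is a rough sketch of how they could be computed. It rests on assumptions the article does not confirm: that each module is everything reachable from one master regulator, that "overlap" means sharing at least one node with another module, and that "reuse" means appearing in more than one module. The paper's exact definitions may differ, and the names `build_modules` and `reuse_stats` are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_modules(edges):
    """Return {master: set of nodes reachable from it, master included}."""
    children = defaultdict(set)
    has_parent = set()
    for source, target in edges:
        if source != target:  # assumption: ignore self-loops
            children[source].add(target)
            has_parent.add(target)

    # Masters have at least one target and are the target of no one.
    masters = [node for node in children if node not in has_parent]

    modules = {}
    for master in masters:
        seen = {master}
        stack = [master]
        while stack:  # depth-first walk down the hierarchy
            current = stack.pop()
            for child in children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        modules[master] = seen
    return modules


def reuse_stats(modules):
    """Return (fraction of overlapping modules, fraction of reused nodes)."""
    if not modules:
        return 0.0, 0.0
    membership = Counter(n for members in modules.values() for n in members)
    reused = sum(1 for count in membership.values() if count > 1)
    overlapping = set()
    for (a, nodes_a), (b, nodes_b) in combinations(modules.items(), 2):
        if nodes_a & nodes_b:
            overlapping.update((a, b))
    return len(overlapping) / len(modules), reused / len(membership)


# Hypothetical toy hierarchy: m1 and m2 share one workhorse, m3 stands alone.
toy_edges = [
    ("m1", "a"), ("m1", "shared"),
    ("m2", "shared"), ("m2", "b"),
    ("m3", "c"),
]
modules = build_modules(toy_edges)
overlap_fraction, reuse_fraction = reuse_stats(modules)
```

In the toy hierarchy, two of the three modules overlap and one of the seven nodes is reused. A bacterial network would sit near the low end of both measures, and the Linux call graph near the high end.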

(There's an intriguing parallel with the most frequently reused item. For Linux, it's printk, which writes messages to the kernel log and console, and shows up in over 90 percent of the modules. For E. coli, it's a protein that controls the flow of metabolites across the cell's membrane. That may be central, but it still shows up in only 20 percent of the modules.)

They then took an evolutionary view. Persistent genes were identified based on conservation across 200 bacterial species. Persistent functions are the ones that have been around since the first version of the Linux kernel.

For the bacteria, almost all of the persistent genes are workhorses, and very few of the persistent workhorses appear in multiple regulatory modules. For Linux, there's a bias towards persistence in management functions, and those functions that have stayed around are more likely to be used by several modules.

The authors argue that these differences are the product of different forms of selection. For E. coli, the pressure that keeps the regulatory modules separate is the need for robustness. If one of the genes involved gets damaged, only a small portion of the cell's network is inactivated, so problems don't propagate. In contrast, Linux is constrained by a combination of computer hardware, user needs, and developer planning. The first two can keep functions around for a while, even if they're not widely useful. The last places a premium on code reuse, which can help drive the tendency towards overlapping modules.

All of that seems like a neat and tidy explanation, which tends to make me mistrust it (they also called Linux "popular," which didn't help either). But the authors say they plan to extend the analysis to eukaryotes, which are much more complex and seem to have a larger fraction of genes acting in the management levels. Development also produces many organs by redeploying the same module in new contexts, such as triggering bone development in both the ribs and fingers. If the authors are right, then the eukaryotes may look a lot more like Linux.

PNAS, 2010. DOI: 10.1073/pnas.0914771107.