As in just about every other field, computers have become an essential part of biological research. Complicated algorithms and analyses that once took months of work by specialists are now available as Web services, and whole areas of study, such as genomics, can be pursued entirely in silico. But, even though most biologists know how to plug in their data and act on the output of computational tools, precious few understand the math that's going on behind the scenes, as most bioscience degree programs don't require computer science or any math more advanced than calculus.

Two papers in the latest issue of Science argue that that's a bad thing. One focuses on the ability to represent the behavior of biological systems through algebraic notation, an area that's badly neglected in both science and math education. The second focuses generally on the incorporation of biology-specific math and computer science into the education system. Both assume that the lack of a math background is a serious problem.

In general, as someone who has done a small bit of bioinformatics and a lot of biology, I'm the perfect target audience for this argument. But in reading the papers, I came away with the sense that the authors have lumped different arguments together in a way that confuses the real issues. So what follows is my attempt to separate them out and evaluate each issue separately. The first problem arises in the paper from Pevzner and Shamir, which treats the terms computational biology and bioinformatics as two names for the same discipline. That may be how things are commonly understood but, to me at least, these are two separate endeavors.

Bioinformatics, as its name suggests, is primarily focused on the computer-aided analysis of data generated in biological systems, such as genome and gene expression array analysis. We'll get back to that later. Computational biology involves the attempt to model biological systems in silico. These models are informed by the biology, but they don't necessarily require any biological data to be fed to them in order to run.

Obviously, anyone performing computational biology had better have a really good grip on both biology and math/computer science, or they won't be able to tell whether the models are valid or fix them if they're not. The same really doesn't apply to bioinformatics. Since there's always real, underlying biological data there, the computation and analysis can be separated: a bioinformatician can simply turn to a biologist and have them sanity-check the results.

Fundamental, tool, or service

So, if we accept that everyone doing computational biology had better know both math and biology, that's still not evidence that regular biologists need math. Most regular biologists will end up using bioinformatics tools to align DNA sequences, pick primers, etc. So do they need to know the math behind the tools? I think to answer that, you have to understand where bioinformatics sits on what I'd call the fundamental/tool/service spectrum.

For biologists, fundamentals are things like organic chemistry. All of biology ultimately depends on it, and every biologist should really know something about it—even field biologists, who will have to consider things like how diet and environmental chemicals affect the organisms they study. Bioinformatics really isn't a fundamental; knowing how certain calculations are performed won't necessarily tell you anything about biology.

In fact, it's somewhere between a tool and a service. A tool is something that an average biologist will wind up using that has some biology behind it. So, for example, it's possible to use PCR to amplify DNA samples without knowing anything about what's going into the tubes used for the reactions. But it's much better if a biologist does know; the reactions behind PCR illustrate biological principles, and are essential knowledge for troubleshooting the procedure when it goes wrong (as it inevitably does). In contrast, DNA sequencing, which used to be a tool, has become a service. You put your DNA sample in the mail, and download the sequence data from an FTP account a few days later. The precise details of the actual sequencing reaction that was performed don't really matter.

For the most part, bioinformatics programs like those for sequence search and alignment are analogous to a service: the computer spits out a useful result, and you really don't care how it got there. If you can't get a decent result, your first response isn't to look for someone who knows math; it's to look for someone who's more proficient with the service and knows how to tweak the input parameters. Knowing the math behind things might help with the tweaking or with appreciating the underlying biology, but it just as well might not; empirical experience can be more useful in many cases.

In a worst-case scenario, of course, biologists can always resort to contacting someone who has training in bioinformatics, in much the same way as a biochemist might contact an immunologist if they needed to know more about that field.

That's supposed to be helpful?

If bioinformatics is a service, why isn't knowing how to use it as a service good enough? The authors simply assert that it isn't, without providing an explanation. "For example, biologists sometimes use bioinformatics tools in the same way that an uninformed mathematician might use a polymerase chain reaction (PCR) kit," they write, "without knowing how PCR works and without any background in biology." Presumably, we're supposed to view that as problematic, although the authors never explain why it is.

The second paper, from Robeva and Laubenbacher, isn't brilliant about supporting its position, either. It's a sort of plea for education in algebraic modeling, which can apparently be used to represent biological systems. The authors make their argument by using a textbook case: the Lac operon, a gene regulation system that appears multiple times in a typical biologist's educational history, probably starting at AP bio in high school. In modeling terms, however, the Lac operon needs three equations to be described, one of which takes the form:

$$\dot{L} = k_L \beta_L(L_e)\,\beta_G(G_e)\,Q - 2\phi_M \mathcal{M}(L)\,B - \gamma_L L$$

They point out that presenting it in Boolean terms leads to a simplified diagram that still captures the essential features of the system. Even when simplified, however, it's not obvious that the model is any more informative than the standard textbook description, which refers directly to the biology. And I'm skeptical that knowing the model would actually improve a biologist's ability to perform biological research.
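To give a sense of what a Boolean treatment of the Lac operon looks like in practice, here is a minimal sketch in Python. The three update rules are my own simplification of the textbook description (glucose shuts the operon off; lactose, external or internal, turns it on) and are an assumption for illustration, not the equations from the Robeva and Laubenbacher paper. The function and variable names are likewise hypothetical.

```python
# Illustrative Boolean sketch of the Lac operon (assumed rules, not the paper's model).
# State variables: m = lac mRNA present, e = lac enzymes present, l = internal lactose present.
# Inputs: lactose_ext = lactose available outside the cell, glucose_ext = glucose available.


def step(state, lactose_ext, glucose_ext):
    """Apply one synchronous update to the (m, e, l) state."""
    m, e, l = state
    # mRNA is made only without glucose and with some lactose signal.
    m_next = bool((not glucose_ext) and (l or lactose_ext))
    # Enzymes (permease, beta-galactosidase) follow the mRNA with a one-step delay.
    e_next = bool(m)
    # Internal lactose needs external lactose, the permease, and no glucose.
    l_next = bool((not glucose_ext) and e and lactose_ext)
    return (m_next, e_next, l_next)


def steady_state(state, lactose_ext, glucose_ext, max_steps=16):
    """Iterate the update until the state stops changing."""
    for _ in range(max_steps):
        nxt = step(state, lactose_ext, glucose_ext)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached within max_steps")


if __name__ == "__main__":
    off = (False, False, False)
    on = (True, True, True)
    print("lactose only:       ", steady_state(off, True, False))
    print("lactose and glucose:", steady_state(on, True, True))
    print("neither sugar:      ", steady_state(on, False, False))
```

Under these assumed rules, the operon settles into the fully "on" state only when lactose is present and glucose is absent, which is exactly what the textbook description already says in words.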

This probably comes across as overly harsh—to a certain extent, the authors have a valid point: the more biologists know about the tools and services that they rely on, the better off biology as a whole will be. Informed researchers are more likely to notice anomalous results and squeeze more information out of their data by better deploying existing tools. And the authors' suggestion that we design mathematics courses that will prepare biologists to solve the problems they'll ultimately face would undoubtedly produce a more appealing math education.

But the same sorts of things can be said about biostatistics and physical chemistry, and it's rare to see either of those made a requirement for undergraduate degrees or doctoral programs. (The former would have been very useful at several points in my research career, and even more useful now.)

If the argument is going to be made that biologists should learn more math and computer science, then those advancing it need to do a better job of explaining what, precisely, biologists need to understand about the computational tools, and why simply knowing how to use the tool isn't good enough. There's also a practical issue at play; the authors argue that these additional computation courses be added to educational programs that are already loaded with required courses. That's pretty difficult to justify, especially given the other deserving topics that are already omitted from most program requirements.

In the end, these papers avoid the key questions: what, specifically, do biologists need to learn, and how will it help them perform their primary function, namely biological research? Without that information, it's going to be impossible to actually design a course that might improve anything.

Science, 2009. DOI: 10.1126/science.1173876

Science, 2009. DOI: 10.1126/science.1176016