Lior Pachter, biological networks, and the future of science

There’s been more drama in bioinformatics this past month: Lior Pachter, with his student Nicolas Bray, published a series of blog posts eviscerating, among other work, Feizi et al.’s recent paper on network deconvolution. Lior alleges that the paper’s methods are not reproducible; that its mathematical techniques are both different from and less generally useful than what the authors claim; and that a supplementary figure was deceptively and silently changed.

The community has, unsurprisingly, reacted. My Twitter feed lit up with links, cheers, and boos; the authors responded; Pachter and Bray responded in turn; the posts accumulated dozens of comments; and many people are still talking.

What should we make of all this?

That’s a hard question, so I’ll start with an easier one: what should I make of all this? I might be a relevant case study here, if we’re thinking about how people read Lior. My undergraduate degree is in math, my graduate studies are in ancient philosophy, and I’m self-taught in computer stuff. I also spent some time as a math textbook editor while I played poker semiprofessionally. I was surprised, and a bit humbled, to learn that I could do productive work that advanced bioinformatics. It wasn’t so much that I could contribute but that I could do so scientifically: After a crash course in biology and some time playing with SAMtools and other popular software, my training in math, statistics, and epistemology put me in a position to teach biology Ph.D.s something–and teach them not just about math but about biology.

What’s important is that this isn’t happening because I’m some gifted polymath. It’s happening because the field of bioinformatics is a certain way. Although it’s common to express excitement about the interdisciplinary nature of the field (perhaps in the first thirty seconds of a conference presentation), I’m not sure we properly appreciate the extent of the field’s diversity, or its consequences.

One such consequence brings us back to Lior. So much of the commentary has been about Lior’s manners: are he and Nicolas too brash? Was it appropriate for them to allege fraud? Shouldn’t they be gentler–or, perhaps, is an aggressive stance appropriate for a gadfly?

Surely some of this discussion is due to the approach Lior chose. I’d like to suggest that much of it, however, is happening because almost nobody really understands all of the issues involved. I certainly don’t: I’ve got some of the relevant graph theory chops and training in algorithms and software design; a solid background in epistemology helps too. But I only know the basics of biological regulatory networks, and I note with shame that my linear algebra is a little rusty. I’m simply not qualified to fully judge Pachter and Bray’s critique–or the original papers. Most of my colleagues (both here at Seven Bridges and elsewhere in the field) are similarly positioned: the experts in regulation would need to do work in statistics or graph theory or both. Perhaps we could do a good review of the situation if we all worked together, but none of us working alone, or even casually consulting our colleagues, could thoroughly assess the situation. The scientific community is assessing manners, I think, because so many of us are in a better position to comment on those than on the science.

Our time in science

It’s fitting that this is happening as math proofs too big for any human to check are also in the news, and we ought to wonder whether this is simply how future science will be. (I was sensitized to these issues by Tyler Cowen, the economist and futurist from George Mason, who discussed the future inscrutability of science in Average Is Over.) This is an under-appreciated lesson of the debate: it is being shaped by the fact that the paper, unlike a calculus test or even the analysis of a western blot, is something that almost none of us can independently assess.

The culture of bioinformatics, and specifically the practices surrounding the assessment of research, will adjust to its unprecedented multidisciplinarity and relative inscrutability. This is both inevitable and healthy. It would be a shame, though, if these adjustments limited us to debates about manners. We ought to cultivate the skill of using our expertise to evaluate research to the extent we can. Although good scientists are trained not to judge what they don’t understand, neither should they be over-modest by remaining totally silent whenever they understand something only partially. Given how often each of us will be in a state of partial understanding, that sort of modesty will increasingly force one into long silences.

From a more corporate point of view, this state of affairs affects how my colleagues and I build things at Seven Bridges. For a while I’ve championed tools that facilitate transparency and reproducibility. Current events reaffirm these beliefs but also make me appreciate the facilitation of division of labor. Of course, transparency and reproducibility themselves encourage division of labor, as they allow a community of diverse experts to assess and replicate (or not) a given finding–but having a broad array of tools and data available in a cloud environment encourages one to think about scientific research in a modular way. It makes it easy to test and assess those tools and data individually or in small combinations–or, indeed, to defer such assessments to those who know more than you do about a specific subject.

It isn’t strictly necessary to use a cloud platform–ours or anyone else’s–to achieve these valuable effects in genomic and bioinformatic research. But in an era when the Broad Institute has trouble running Cufflinks, I find myself thrilled (if a little daunted) by the future of the field, and increasingly confident that by doing the work I do, I’m on the right side of history.