Does Peer Review Really Work? Leila Agha investigates a pillar of science funding

Leila Agha, an assistant professor at BU’s Questrom School of Business in the markets, public policy, and law department, co-authored a report on the peer-review system. Photo courtesy of Leila Agha

The National Institutes of Health (NIH) is the major funder of biomedical research in the United States, distributing some $30 billion to scientists each year. To decide who gets money, the NIH subjects grant proposals to a rigorous system of peer review: each proposal is assigned to a committee of scientists familiar with the area of research. The committee, called a study section, reviews the proposal and gives it a score. The NIH funds proposals in order of their score until the budget for that year runs out.
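The allocation rule described above, "fund in order of score until the budget runs out," can be sketched in a few lines of Python. This is a deliberately simplified illustration, not NIH's actual procedure. The `Proposal` class, the `fund_by_score` function, and all scores and dollar amounts are hypothetical. The sketch assumes that a higher score is better and that funding stops at the first proposal the remaining budget cannot cover. NIH's own impact scores run the other way, with lower numbers better, so the sort would be flipped for that scale.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A hypothetical grant proposal: an ID, a review score, and a requested budget."""
    proposal_id: str
    score: float      # assumption: higher means better in this sketch
    requested: float  # requested amount, in dollars


def fund_by_score(proposals: list[Proposal], budget: float) -> list[str]:
    """Fund proposals in order of score until the budget runs out.

    Stops at the first proposal the remaining budget cannot cover,
    so every proposal ranked below that point also goes unfunded.
    """
    funded: list[str] = []
    remaining = budget
    # Rank from best to worst score; sorted() is stable, so ties keep input order.
    for p in sorted(proposals, key=lambda p: p.score, reverse=True):
        if p.requested > remaining:
            break  # budget exhausted: this is where the funding line falls
        funded.append(p.proposal_id)
        remaining -= p.requested
    return funded


if __name__ == "__main__":
    pool = [
        Proposal("A", score=8.1, requested=400_000),
        Proposal("B", score=9.3, requested=500_000),
        Proposal("C", score=7.2, requested=300_000),
        Proposal("D", score=8.8, requested=250_000),
    ]
    print(fund_by_score(pool, budget=1_000_000))
```

Running it prints `['B', 'D']`. Proposal A ranks third but asks for more than the $250,000 left after B and D are funded, so the funding line falls there, and C, ranked below A, is not funded either.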

The concept of peer review is central to NIH funding and to science itself—journals choose articles for publication only after they are scrutinized by fellow scientists. The idea is to weed out weak research and ensure that only the strongest science goes forward. But does it work?

That’s what Leila Agha, an assistant professor at Boston University’s Questrom School of Business in the markets, public policy, and law department, set out to investigate. In a report published in the journal Science, Agha and her co-author, Harvard Business School’s Danielle Li, picked peer review apart, trying to see if the system really rewarded the best proposals, or if it simply favored “rock star” scientists from big-name institutions.

BU Research: Why did you choose to study NIH peer review?

Agha: There has been considerable debate in recent years about how successfully the NIH is allocating its resources, particularly as the budget has become tighter and funding has become more and more competitive.

What is the debate about?

There have been a few critiques of peer review. One says maybe peer review can weed out weak proposals, but it’s not very good at identifying the really path-breaking research—maybe it’s unintentionally weeding out those risky projects that have the potential to really change the field of research.

I’ve heard a lot of scientists say that—that the more conservative proposals get funded.

That’s right. So that was one issue we wanted to investigate. Another critique that you sometimes hear is that the review committee is not reading the details of the proposals to figure out which are the most promising. The concern is that proposals associated with star scientists or elite institutions will get funded, and that this has little to do with the content of the science and a lot to do with how important or famous the person already is.

That’s another common complaint among scientists.

Exactly, particularly among early-career investigators. We tracked 130,000 grants funded by the NIH between 1980 and 2008, and looked at the number of publications that came from that research, the number of citations to those publications, and whether there were follow-on patents. We matched this data to a lot of information about each investigator, like: What is their institutional affiliation? How many citations and publications have they had in the past? How many of those have been “big hits” (highly cited)? Have they been successful at getting NIH grants in the past? How experienced are they? When did they receive their MD or PhD? And by controlling for all those factors, we’re able to refine our measure of what committees are doing.

One of the things you controlled for actually has a name, the Matthew effect. What is that?

It’s a sociological idea that the rich get richer and the poor get poorer; it’s named for a biblical reference. In this context, it is not about money per se, but about the idea that someone who’s already famous might receive a better score on their grant application and garner a lot of citations even if their research isn’t necessarily better. We wanted to investigate whether committees were generating insight about the quality of the proposed research rather than solely rewarding past successes.

It seems so difficult to control for all those things: the prestige of the person, the status of the institution.

We start out not controlling for anything, and simply ask: What’s the relationship between the score and the research outcome? Then we say, okay, let’s statistically control for the field of study, because different fields have very different citation patterns. That’s straightforward. Next we control for the publication history of the principal investigator: how well published was he or she before submitting the grant? Then we add controls for career characteristics, controls for grantsmanship skill (did he or she have NIH grants in the past?), and finally controls for the type of institution, meaning how elite it is. What we show is that adding these successive controls doesn’t attenuate the relationship between scores and grant outcomes very much.
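The successive-controls exercise described above can be summarized as a sequence of regressions. This is a stylized sketch, not the specification from the Science paper. The symbols are our own, and the sketch assumes a simple linear model. In it, the outcome $Y_i$ of grant $i$ (for example, its publication or citation count) is regressed on its peer-review score $S_i$ plus a growing set of control vectors $\mathbf{X}_i^{(j)}$. These are, in order, $j = 1$ field of study, $j = 2$ publication history, $j = 3$ career characteristics, $j = 4$ prior NIH grants, and $j = 5$ institution type:

$$
Y_i = \alpha_k + \beta_k S_i + \sum_{j=1}^{k} \boldsymbol{\gamma}_{jk}^{\top} \mathbf{X}_i^{(j)} + \varepsilon_{ik}, \qquad k = 0, 1, \dots, 5
$$

For $k = 0$ the sum is empty, which gives the uncontrolled relationship. If reviewers were only echoing observable track record, the magnitude of $\beta_k$ would shrink toward zero as controls are added. The finding that the controls do not attenuate the relationship much corresponds to $\beta_5$ remaining close to $\beta_0$.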

So in lay terms, that means peer review seems to work?

Peer reviewers seem to be contributing expertise that rewards high-impact science, and this insight couldn’t be predicted solely from the investigator’s publication history, grant history, or other quantitative measures of past performance. So I’m not trying to say it’s the best of all possible methods, but I think it does refute some of the starker critiques: for example, that reviewers completely fail to reward high-impact research, or that reviewers are just reacting to, say, the institution the PI is at or how successful he or she has been at publishing in the past.

They say democracy is the worst form of government until you look at the other ones. People complain about peer review, but have there been alternative ideas?

Peer review is really the central model, which is why it’s so important that we understand how well it works. There are slight variations, but the fundamental idea of having a peer review committee allocate funding is used not only by the NIH but also by the National Science Foundation (NSF) and the European Research Council. It’s really the fundamental mechanism through which public money gets funneled to external researchers.

It’s also the fundamental way science is published, in general.

You’re exactly right. There are things that are special about the structure of NIH peer review committees, but I think studying them can give us insight into what peer review can and can’t do. And I think there has actually been relatively little research on it.

Were you surprised by the results?

I think there has been real concern, as it becomes more competitive to get funding, that valuable research is being weeded out. And certainly I’m not arguing that doesn’t occur. However, it does seem that even among very well-scored grants, it’s still true that the grants in the top 1 percent or 2 percent are more likely to produce a high number of citations or “hit” publications than grants that are just slightly lower scored—for example, scored in the top 10 percent. And so even among the set of very, very strong applications, the committees are still able to discern some dimension of research potential that’s predictive of publication and patenting outcomes. So these results are interesting and encouraging for how we evaluate scientific work.

What do you hope will come out of this research?

It’s valuable to know that the process is, on some level, successful at identifying promising proposals. What we are not saying is that the peer review committees are in some sense infallible, or that they never make mistakes, or that this is the best possible allocation mechanism. There’s no other system that we were able to investigate and compare it to. It’s encouraging that peer review generates insight about research potential, but that doesn’t mean it couldn’t be improved.