Peer review is intended to act as a gatekeeper in science. If working researchers deem a paper fit to be published, it should mean that the research is sound, rigorous, and accurate. But an experimental analysis of peer review suggests that the process might also end up rejecting high-quality material. The analysis points to high levels of competition as the source of the problem.

Because peer review is a vastly complex system that can function quite differently in various disciplines, researchers Stefano Balietti, Robert L. Goldstone, and Dirk Helbing constructed an experimental game designed to mimic some of the primary features of peer review. Participants were divided into 16 groups of nine people each and tasked with creating a piece of “art” on a computer interface. The pieces could then be submitted to one of three “art exhibitions.”

Each participant was then given three pieces of other people's art to review; pieces that averaged a score higher than five out of ten were accepted into the exhibition. Each group played 30 rounds of the game.

In the non-competitive condition, reviewers could accept all of the art that was submitted, and everyone who had a piece accepted received a small payout. In the competitive condition, a fixed payout was divided between everyone who had a piece accepted, giving the reviewers (who had their own pieces under review) an incentive to judge others' pieces more harshly.
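The incentive can be made concrete with a small sketch. This is only an illustration of the rules as described above, not the researchers' software: the payout amounts (`reward`, `pool`), the participant names, and the scores are invented, and the three exhibitions are collapsed into one for simplicity.

```python
from statistics import mean

ACCEPT_THRESHOLD = 5.0  # from the article: average score above five out of ten


def is_accepted(scores):
    """Return True if the mean of the review scores exceeds the threshold."""
    return mean(scores) > ACCEPT_THRESHOLD


def payouts(reviews, competitive, reward=1.0, pool=3.0):
    """Map each accepted author to a payout.

    reviews: dict of author -> list of review scores (0 to 10).
    reward and pool are hypothetical amounts, not the study's values.
    """
    accepted = [author for author, scores in reviews.items() if is_accepted(scores)]
    if not accepted:
        return {}
    if competitive:
        # A fixed pool is split among everyone accepted.
        share = pool / len(accepted)
    else:
        # Each accepted author gets the same small payout.
        share = reward
    return {author: share for author in accepted}


# Hypothetical scores for three participants.
reviews = {"ann": [7, 6, 8], "bob": [5, 6, 7], "cat": [3, 4, 5]}
print(payouts(reviews, competitive=False))  # {'ann': 1.0, 'bob': 1.0}
print(payouts(reviews, competitive=True))   # {'ann': 1.5, 'bob': 1.5}

# One reviewer marks bob's piece down from 7 to 3, pushing it below the threshold.
harsher = dict(reviews, bob=[5, 6, 3])
print(payouts(harsher, competitive=True))   # {'ann': 3.0}
```

In this toy version, a single harsh score in the competitive condition doubles the remaining author's payout, while in the non-competitive condition it would change nothing for anyone else.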

After the experiment, the artwork was given to 620 Mechanical Turk workers to judge, giving the researchers an independent assessment to compare with the results of the peer review.

Of course, there are some very important differences between this system and actual peer review. Most significantly, science has more objective criteria for review than art does. However, the authors have created a system where the members are also the gatekeepers, so it’s a pretty close imitation.

The results echo a lot of the arguments in the current debates about the state of scientific research.

First, the competitive condition seemed to boost innovation. Where participants in the non-competitive condition started showing signs of social imitation, with the same ideas bouncing around within a group, the competitive participants had work that was fiercely differentiated from others in their group. It's possible that this reflects issues like publication bias and lack of incentives to replicate others’ research.

This suggests that the competitiveness of peer review has one of its intended effects, at least partly. Although competitive participants didn’t imitate each other’s work much, they did start to copy their own earlier work much more than non-competitive participants did.

But innovation alone is not enough: the innovation has to see the light of day. The peer-review system didn’t do very well with this. Comparisons with the Mechanical Turk ratings suggested that peer reviewers in the competitive condition were unfairly giving lower scores to their peers, presumably because this gave their own work a better shot at the exhibition and the resultant payout. Competitive reviewers also saw their opinions diverge so much that agreement between them fell below the level of random chance.

Even more disconcertingly, the competitive condition kept some of the higher-quality art out of the exhibitions. This gels with the finding that some of the most-cited work in medical sciences was rejected by top-tier journals, going on to be published in less prestigious outlets.

There’s a tradeoff here, though. While the competitive condition rejected around 34 percent more of the higher-quality art than the non-competitive condition did, it also rejected around 40 percent more low-quality art. So there may be some misfires, but the process does do its job in keeping out the riffraff. Whether it’s better to risk rejecting good stuff to keep out the sloppy work or let enough sloppy work through to publish everything good is a tough decision. It's likely to depend on the nature of the field and problem at hand, the authors write.

None of this means that peer review should be chucked out of the window—right now it’s pretty much the best method we have to impose quality control on science. However, that doesn't mean it couldn't use a bit of scrutiny and change. This research shouldn't be viewed as exhaustive or conclusive; future work will need to assess how close this imitation comes to actual peer review and explore ways to balance competitiveness with quality control.

Still, the new study is likely to give ammunition to many researchers calling for reform to the scientific system. Their ideas include measures such as pre-registration, post-publication peer review, and insistence on open data and methods in an effort to reduce many of the current problems.

Questioning and revising the incentives in science goes along with that, the authors write. “The results of our study suggest a redesign of the scientific incentive system such that sustainable forms of competition are promoted.” Finding out what sustainable competition looks like, and whether it will work in science, is likely to play an important role in fixing some of what's not working well.

PNAS, 2016. DOI: 10.1073/pnas.1603723113.