Peer review is an essential process that subjects new research to the scrutiny of other experts in the same field. Today’s top Machine Learning (ML) conferences are heavily reliant on peer review as it allows them to gauge submitted academic papers’ quality and suitability. However, a series of unsettling incidents and heated discussions on social media have now put the peer review process itself under scrutiny.

Lack of mechanisms for reproducing experiment results

The annual Computer Vision and Pattern Recognition (CVPR) Conference is one of the world’s top three academic gatherings in the field of computer vision (along with ICCV and ECCV). A paper accepted to CVPR 2018 recently came under question when a Reddit user claimed the authors’ proposed method could not achieve the accuracy promised.

“The idea described in Perturbative Neural Networks is to replace 3×3 convolution with 1×1 convolution, with some noise applied to the input. It was claimed to perform just as well. To me, this did not make much sense, so I decided to test it. The authors conveniently provided their code, but on closer inspection, turns out they calculated test accuracy incorrectly, which invalidates all their results.”

The paper’s lead author Felix Juefei-Xu promptly responded: “We are now re-running all our experiments. We will update our arxiv paper and github repository with the updated results. And, if the analysis suggests that our results are indeed far worse than those reported in the CVPR version, we will retract the paper.”
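The idea described in the quote can be sketched in a few lines. The snippet below is a rough NumPy illustration, not the authors’ implementation: each input channel is perturbed with a fixed random noise mask created at initialization, a nonlinearity is applied, and the channels are then mixed with a 1×1 convolution, which at every pixel is just a linear combination of channels. All names and shapes are illustrative.

```python
import numpy as np

def perturbative_layer(x, noise_masks, weights):
    """Rough sketch of a PNN-style layer (illustrative, not the paper's code).

    x:           input feature map, shape (C_in, H, W)
    noise_masks: fixed random masks, shape (C_in, H, W), drawn once at init
    weights:     1x1 convolution weights, shape (C_out, C_in)
    """
    perturbed = np.maximum(x + noise_masks, 0.0)  # add fixed noise, then ReLU
    # A 1x1 convolution is a per-pixel linear mix of channels.
    return np.einsum("oc,chw->ohw", weights, perturbed)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))          # 3-channel input
masks = 0.1 * rng.standard_normal((3, 8, 8))  # fixed after initialization
w = rng.standard_normal((16, 3))             # maps 3 channels to 16
y = perturbative_layer(x, masks, w)
print(y.shape)  # (16, 8, 8)
```

The claim under dispute was that this construction can match the accuracy of ordinary 3×3 convolutions, which is exactly the kind of empirical claim that only re-running the experiments can settle.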

The Reddit poster’s challenge shed light on an often overlooked issue. Reviewers don’t necessarily invest their own time and resources in running code and reproducing experiment results as part of the peer review process; rather, they tend to rely on the honesty and competence of the authors.
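Evaluation bugs of the kind alleged here can be remarkably subtle. As a toy illustration (not the actual bug in the paper’s code), averaging per-batch accuracies gives the wrong answer whenever the last batch is smaller than the rest, while counting individual predictions does not:

```python
# Illustrative only: one common way test accuracy goes wrong.
# Averaging per-batch accuracies over-weights a smaller final batch;
# the correct figure counts individual predictions.

def batched(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]

preds  = [1, 1, 0, 0, 1, 1, 0, 1, 1, 1]   # 10 test examples
labels = [1, 0, 0, 0, 1, 1, 1, 1, 1, 0]

batch_accs, correct, total = [], 0, 0
for p_batch, l_batch in zip(batched(preds, 4), batched(labels, 4)):
    hits = sum(p == l for p, l in zip(p_batch, l_batch))
    batch_accs.append(hits / len(p_batch))   # per-batch accuracy
    correct += hits
    total += len(p_batch)

buggy_acc   = sum(batch_accs) / len(batch_accs)  # naive mean of batch means
correct_acc = correct / total                    # count-based, the right way
print(buggy_acc, correct_acc)  # 0.666... vs 0.7
```

A reviewer who never runs the code has no way of catching a discrepancy like this, which is precisely why reproducibility advocates want evaluation scripts checked as part of review.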

Some ML conferences have begun looking for practical solutions. In 2017, researchers from the Montreal Institute for Learning Algorithms, Google Brain, and McGill University organized an International Conference on Learning Representations (ICLR) workshop with a strong focus on issues of reproducibility and replication of results in the ML community. Last year, the workshop launched a reproducibility challenge and called for an investigation into the reproducibility of empirical results submitted to ICLR 2018.

Lack of qualified reviewers

In a story covered by Synced earlier this year, a Reddit user who identified as a predoctoral student posted that they had been selected as a NIPS reviewer, and needed advice on how to properly write paper reviews:

“I’m starting graduate school in the fall so I’ve never submitted or reviewed papers for this conference before. How do I chose papers to review? Should I start reading old NIPS papers to get an idea? Most importantly, how do I write a good review?”

Many commenters questioned the original poster’s suitability as a NIPS peer reviewer. Bar-Ilan University Senior Lecturer and well-known natural language processing (NLP) expert Yoav Goldberg tweeted sarcastically, “yup. It’s ‘peer review’, not ‘person who did 5 TensorFlow tutorials review’.”

A record-high 3,240 papers were submitted to NIPS 2017, and this year’s total is expected to approach 5,000. Someone has to read all these submissions, and NIPS organizers have little choice but to expand their pool of reviewers to cope with the growing volume. But for AI researchers whose months or even years of work may be distilled into a single paper, serious concern over possibly unqualified reviewers reading for a top AI conference is only natural.

The term “peer reviewer” traditionally indicated a review by someone with a similar level of competence and experience. But the surge in ML research means qualified reviewers are now unsurprisingly in short supply. Carnegie Mellon University Assistant Professor Dr. Zachary Lipton tweeted “all that it takes to destroy a field is for it to become popular. If ML goes from 4k to 20k submissions per conference, then we will go from 50% to 10% qualified reviewers. At that point there will effectively be no peer review.”

How double-blind peer review can be sabotaged

Studies have suggested that reviewers are likely to be influenced by a paper’s authors and institutions. And so top ML conferences are now turning to double-blind peer review, wherein a submitted paper’s authors are not revealed to reviewers.

But that doesn’t mean double-blind peer review is a perfect solution. A paper submitted to ICLR 2019 recently received a large number of positive comments on the public platform OpenReview (“interesting work” and “promising results” etc.), prompting suspicion from one commenter, who wrote “there are few papers with comments by now when yours already got seven, especially when they are all one-line positive replies. I don’t think I need to say clearer, right? Everyone knows what you did. Please stop doing this. It will not bring you any benefits.”

Synced checked in today on the OpenReview comments in question, and found most had been deleted.

The purpose of double-blind peer review is to minimize bias. But beyond author names and institutions, public commentary, whether positive or negative, can also sway reviewers’ judgement. ICLR, for example, does not forbid authors from posting their papers on arXiv or any other public forum. Recently the paper Large Scale GAN Training for High Fidelity Natural Image Synthesis stirred a heated discussion on social media. The paper presents a Generative Adversarial Network model capable of generating strikingly realistic images with both high fidelity and variety. However, some in the community are concerned that the flood of positive public comments will give the paper an advantage during review.

Is peer review creating troubling trends in ML papers?

At ICML 2018, CMU’s Dr. Lipton and Stanford PhD student Jacob Steinhardt published Troubling Trends in Machine Learning Scholarship, which takes aim at ML academic papers for their “speculation guised in explanation,” “mathiness,” and “obfuscation.” The paper asks whether these problems are mitigated or exacerbated by peer review.

Some researchers argue that the peer review process is adopting bad practices and trending towards an unscientific path, and that this is negatively influencing how authors write and present their papers. Google Researcher Dr. Ian Goodfellow pointed out that reviewers tend to skip over the complicated math equations in submitted ML papers and are easily convinced by explanations of new methods regardless of their plausibility. He suggested that some authors were tailoring their papers to reviewers in order to “sneak science in the door.”

“Peer review is a good idea in principle, but it’s important to get the implementation right in practice,” Goodfellow tweeted.

Facebook Chief AI Scientist Yann LeCun meanwhile looked at the big picture, writing in his blog that “Our current system, with its emphasis on highly-selective conferences, is highly biased against innovative ideas and favors incremental tweaks on well-established methods.”

Any solutions?

A 2017 Wired magazine story revealed that Elsevier, one of the world’s largest publishers of scientific research, has developed an AI program, EVISE, designed to aid in peer review. The program can link a manuscript with plagiarism-checking software; select appropriate reviewers to avoid conflicts of interest; suggest reviewers based on content; and even send thank-you letters to reviewers. There is no evidence so far of major ML conferences adopting any such software programs.

Some ML researchers meanwhile are proposing methods to revamp publication models. LeCun introduced a stock exchange-like publication model where papers play the role of securities and reviewers are investors. All papers and reviews are open to the public. Reviewers are expected to review papers to the highest standards and make informed and insightful comments. The idea is that reviewers’ own reputations would improve along with the quality of the papers.

Zoubin Ghahramani, Professor of Information Engineering at the University of Cambridge, suggests conferences and journals should limit the number of submitted papers assigned to each reviewer. If a paper does not attract any reviewers, its authors can withdraw it or submit it elsewhere.

It’s clear that traditional peer review solutions are lagging as ML research produces an ever-increasing number of academic papers. With NIPS approaching and a number of other major AI and ML conferences on the horizon, a new approach is needed if paper quality standards are to be maintained. Our brainy research community will just have to work together to come up with a practical and effective one.