Making science accessible for all is a wonderful and enriching proposal. Unfortunately the execution is not that easy — and nobody knows this better than the team that pioneered arXiv, the world’s largest free scientific paper repository.

Launched in 1991, arXiv has become an indispensable platform providing free and open access to research for the machine learning community and beyond. Now, arXiv has announced plans to alpha test its next-generation “arXiv-NG” submission system in the first quarter of 2020. The system is a significant part of the growing arXiv-NG initiative that aims to improve core service infrastructure through an incremental and modular renewal of the existing arXiv system.

The arXiv team has already taken initial steps to improve the overall accessibility of the repository’s user interfaces, through both behind-the-scenes structural improvements and user-facing changes such as support for mobile-friendly abstract pages.

ArXiv is owned and operated by Cornell University, a private not-for-profit educational institution, with funding from Cornell, the Simons Foundation and other member institutions. In 2019, arXiv received 155,866 paper submissions, an 11 percent increase from 2018, and saw about 260 million global downloads.

Despite its relatively small target audience, scientific publishing is a very lucrative business dominated by a tight circle of subscription-based journals. The Guardian reported in 2017 that the industry generated total global revenue of more than £19 billion (US$24.3 billion). For years, publishing in a respected scientific journal was the only way for scientists to get their new research out there, but many have grown increasingly critical of a system they believe both profits from and restricts access to their work.

The machine learning community is at the forefront of an ongoing movement for free and open access publishing. Last March the University of California system halted all further subscriptions with Elsevier, one of the world’s largest scholarly publishers, after failing to reach an agreement on securing universal open access for UC research on the platform. The move was applauded by Turing Award winner and Facebook Chief AI Scientist Yann LeCun on Facebook.

ArXiv’s expenses for 2019 totalled only around US$2 million — a ridiculously low price for the value. But is such an open-access publishing scheme sustainable? A tweet from University of Glasgow human-computer interaction lecturer and VP of Publications at ACM SIGCHI Julie Williamson warns the costs per paper may escalate as arXiv enters this new NG phase.

Williamson recently published an article on the ACM SIGCHI website that looks into the economics of open access publishing. The ACM SIGCHI (Association for Computing Machinery’s Special Interest Group on Computer-Human Interaction) is an international society for professionals, academics and students interested in human-computer interaction.

Many of the papers published by non-profit organizations like the ACM and IEEE are put behind paywalls. Williamson says the ACM could transition to universal gold open access and make all content in the ACM Digital Library freely available, but certain tradeoffs would have to be made in order to do that.

University of San Francisco research scientist Jeremy Howard also tweeted on the topic, noting that while arXiv handles its high volume of submissions and downloads for just a couple of million dollars, the ACM spends $10 million on publications and IEEE spends $193 million: “something stinks.”