

Citations are the backbone of scholarly knowledge. They help researchers verify information, build on existing knowledge, and create opportunities for new discoveries.

Citations are not only relevant to academia. They are the foundation for how we know what we know.

Until recently, efforts to create a freely accessible repository of open citation data (that is, data representing how scholarly works cite each other) have been hampered by restrictive and inconsistent licenses and by the lack of machine-readable reference data.

Today, we are proud to announce a key milestone toward unlocking the potential of open citation data.

———

The Wikimedia Foundation, in collaboration with 29 publishers and a network of organizations, including the Public Library of Science (PLOS), the Internet Archive, Mozilla, the Bill & Melinda Gates Foundation, the Wellcome Trust, and many others, announced the Initiative for Open Citations (I4OC), which aims to make citation data freely available for anyone to access.

Scholarly publishers deposit the bibliographic record and raw metadata for their publications with Crossref. Thanks to a growing list of publishers participating in I4OC, reference metadata for nearly 15 million scholarly papers in Crossref’s database will become available to the public without copyright restriction.[1] This data includes bibliographic information (such as the title of a paper, its authors, and its publication date), machine-readable identifiers such as DOIs (Digital Object Identifiers, a common way to identify scholarly works), and data on how papers reference one another. It will help draw connections within scientific research, find and surface relevant information, and enrich knowledge in places like Wikipedia and Wikidata.
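To make the shape of this data concrete, here is a minimal Python sketch of turning one work record into citation links. It assumes the record follows the JSON layout returned by the Crossref REST API for a work: a `DOI` field plus a `reference` list whose entries may carry a `DOI` of their own. The helper name `citation_edges`, the sample record, and its DOIs are invented for illustration and do not describe real papers.

```python
def citation_edges(work: dict) -> list[tuple[str, str]]:
    """Return (citing DOI, cited DOI) pairs for one Crossref-style record.

    References without a DOI are skipped, since they cannot be linked to
    another work without further matching.
    """
    citing = work.get("DOI", "").lower()
    if not citing:
        return []
    edges = []
    for ref in work.get("reference", []):
        cited = ref.get("DOI")
        if cited:
            # DOIs are case-insensitive, so normalise before storing.
            edges.append((citing, cited.lower()))
    return edges


# Hypothetical record; the DOIs below are placeholders, not real papers.
sample_work = {
    "DOI": "10.1234/example.2017.001",
    "title": ["An example paper"],
    "reference": [
        {"key": "ref1", "DOI": "10.1234/EXAMPLE.1970.042"},
        {"key": "ref2", "unstructured": "A reference with no DOI deposited."},
    ],
}

edges = citation_edges(sample_work)
print(edges)
```

Only the first reference yields a link here, because the second has no identifier. That is one reason machine-readable identifiers matter for building a citation graph.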

Citation data are not subject to copyright in the same way that scholarly articles may be. They typically rest in the public domain, free for anyone to access. Until recently, however, much of the citation data in the scientific research world has been difficult to find, surface, and access. “It is a scandal,” wrote David Shotton in Nature in 2013, “that reference lists from journal articles—core elements of scholarly communication that permit the attribution of credit and integrate our independent research endeavours—are not readily and freely available.”

Before I4OC started, publishers releasing references in the open accounted for just 1% of the publications registered with Crossref. As of the launch of I4OC, references for more than 40% of these publications have become freely available.

As of March 2017, the fraction of publications with open references has grown from 1% to more than 40% of the nearly 35 million articles with references deposited with Crossref to date. Image by Dario Taraborelli, public domain/CC0.

Like sources cited within a Wikipedia article, references cited within a scholarly article can help build powerful discovery tools and a stronger foundation for open knowledge.

Volunteer contributors and software developers in the Wikimedia movement have been curating and incorporating scholarly citations into the Wikimedia projects for quite some time. The GeneWiki project has been linking reference sources to information about genes, proteins, and diseases in Wikipedia and Wikidata. Initiatives like WikiCite aim to create a bibliographic database in Wikidata to serve all Wikimedia projects. The LibraryBase project is building tools to better understand how information in Wikipedia is referenced and guide how editors identify and use references on Wikipedia. The WikiFactMine project is helping connect Wikidata statements in the field of biomedical sciences to scholarly literature. Programmatic initiatives such as 1lib1ref are engaging librarians to add missing citations to Wikipedia, and services like Citoid are simplifying the discoverability and creation of citations for free knowledge.

These projects depend on the availability of open bibliographic and citation data. We expect I4OC will substantially contribute to all these initiatives.

Example of a partial citation graph for Laemmli (1970), one of the most cited scholarly journal articles of all time. Graph generated from open citation data in Wikidata via a SPARQL query. Image by Dario Taraborelli, public domain/CC0.
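For readers curious what such a query might look like, the Python sketch below builds a SPARQL query that lists works citing a given Wikidata item. It is not the query used for the figure. It assumes Wikidata's "cites work" property (P2860) and the prefixes predefined by the Wikidata Query Service. The function name `citing_works_query` is hypothetical, and the item ID is a placeholder rather than the identifier of the Laemmli article.

```python
def citing_works_query(item_id: str, limit: int = 100) -> str:
    """Build a SPARQL query listing works that cite the given Wikidata item."""
    # Accept only IDs of the form Q<digits> so nothing else ends up in the query.
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return (
        "SELECT ?citing ?citingLabel WHERE {\n"
        # P2860 is Wikidata's "cites work" property.
        f"  ?citing wdt:P2860 wd:{item_id} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Placeholder item ID, for illustration only.
query = citing_works_query("Q1234567", limit=10)
print(query)
```

The resulting string can be pasted into the Wikidata Query Service. Each row returned corresponds to one edge pointing at the chosen work in the citation graph.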

Over the coming months, the organizations involved in I4OC will be working with different stakeholders to raise awareness of the availability of open citation data and evaluate how it can be reused, analyzed, and built upon. We will provide regular updates on the growth of the public citations corpus, how the data is being used, additional stakeholders and participating publishers, and new services that are being developed.

Any publisher can freely license and share their reference data by enabling reference distribution via Crossref. For more information and details on how to get involved, please visit the I4OC website: https://i4oc.org or follow @i4oc_org on Twitter.

A joint press release about the announcement is available on the I4OC website.

Dario Taraborelli, Director, Head of Research, Wikimedia Foundation

Jonathan Dugan, WikiCite organizing committee

[1] As of March 2017, nearly 35 million articles with references have been registered with Crossref. Citation data from the Crossref REST API will be made available shortly after the announcement.

Founders

OpenCitations

Wikimedia Foundation

PLOS

eLife

DataCite

Centre for Culture and Technology, Curtin University

Participating publishers

American Geophysical Union

Association for Computing Machinery

BMJ

Co-Action Publishing

Cambridge University Press

Cold Spring Harbor Laboratory Press

Copernicus GmbH

eLife

EMBO Press

Faculty of 1000, Ltd.

Frontiers Media SA

Geological Society of London

Hamad bin Khalifa University Press (HBKU Press)

Hindawi

International Union of Crystallography

Leibniz Institute for Psychology Information

MIT Press

PeerJ

Pensoft Publishers

Portland Press

Public Library of Science

Royal Society of Chemistry

SAGE Publishing

Springer Nature

Taylor & Francis Group

The Rockefeller University Press

The Royal Society

Ubiquity Press, Ltd.

Wiley

Stakeholders

Alfred P. Sloan Foundation

Altmetric

Association of Research Libraries

Authorea

Bill & Melinda Gates Foundation

California Digital Library

Center for Open Science

Coko Foundation

Confederation of Open Access Repositories

ContentMine

Data Carpentry

Dataverse

dblp: computer science bibliography

Department of Computer Science and Engineering, University of Bologna

Dryad

Figshare

Hypothes.is

ImpactStory

Internet Archive

Knowledge Lab

Max Planck Digital Library

Mozilla

Open Knowledge International

OpenAIRE

Overleaf

Project Jupyter

rOpenSci

Science Sandbox

Wellcome Trust

Wiki Education Foundation

Wikimedia Deutschland

Wikimedia UK

Zotero