This blog post introduces the first four ontologies of SPAR, the Semantic Publishing and Referencing Ontologies, an integrated ecosystem of generic ontologies shown in the ‘flower’ diagram below (Figure 1). The ontologies can be used either individually or in conjunction, as need dictates. Each is encoded in the Web Ontology Language OWL 2.0. Together, they can describe far more than bibliographic entities such as books and journal articles: they enable RDF metadata to be created that relates these entities to reference citations, to bibliographic records, to the component parts of documents, and to various aspects of the scholarly publication process.

Figure 1: The flower diagram, created by Benjamin O’Steen, showing the component ontologies of SPAR.

The first four ontologies, FaBiO, CiTO, BiRO and C4O, which are now available for inspection, comment and use, are useful for describing bibliographic objects, bibliographic records and references, citations, citation counts, citation contexts and their relationships to relevant sections of cited papers, and the organization of bibliographic records and references into bibliographies, ordered reference lists and library catalogues.

Four additional ontologies, DoCO, PRO, PSO and PWO, are in preparation to provide structured controlled vocabularies for document components, publishing roles, publishing status and publishing workflows. A simple architectural diagram of the eight SPAR ontologies is shown in Figure 2.

Figure 2: A simple architectural diagram, created by Silvio Peroni, showing the interactions and dependencies between the component ontologies of SPAR. Four ontologies (DoCO, PRO, PSO and PWO), some of which will import FOAF (http://xmlns.com/foaf/spec/20100809.rdf), are shown with a faint outline because they are still under development.

The original motivation for creating the first of these ontologies, the Citation Typing Ontology CiTO, was provided by the semantic publishing work undertaken in 2008, described in [3]. Version 1.6 of the original CiTO ontology developed from that work is described in [4].

Since that publication, as part of a harmonization activity with the SWAN ontologies (http://swan.mindinformatics.org/ontology.html) described in [5], we have separated out from CiTO those aspects describing bibliographic entities into FaBiO, the FRBR-aligned Bibliographic Ontology, and those aspects describing the quantification of citations into C4O, the Citation Counting and Context Characterization Ontology, leaving the current version of CiTO (v2.0) with the sole role of describing the nature and character of the citations themselves.

Where appropriate, the SPAR ontologies, specifically FaBiO and BiRO, the Bibliographic Reference Ontology, employ the FRBR (Functional Requirements for Bibliographic Records) classification model, a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions (IFLA) as a “generalized view of the bibliographic universe, intended to be independent of any cataloging code or implementation” [1, 2]. FRBR distinguishes Works, Expressions, Manifestations and Items.

In FRBR, a Work is a distinct intellectual or artistic creation, an abstract concept recognized through its various expressions (for example, your latest research paper); an Expression is the specific form that a Work takes each time it is ‘realized’ in physical or electronic form (for example, as a journal article); a Manifestation of an Expression of a Work defines its particular physical or electronic embodiment (for example online, print or PDF); and an Item is a particular copy of a Manifestation that you might own (for example the print copy of a journal issue on your desk). FRBR is widely recognized as a sound fundamental model for bibliographic records, and permits a clarity of description that is lacking when using ‘flat’ ontologies and vocabularies that do not employ the FRBR data model.
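The four FRBR levels can be pictured as a chain of objects, each pointing to the level above it. The following is a minimal Python sketch of that idea only. The class names mirror the FRBR terms, but the field names (realization_of, embodiment_of, exemplar_of) and the helper work_of are hypothetical names chosen for this illustration and are not taken from FRBR or from any SPAR ontology.

```python
from dataclasses import dataclass

# Illustrative classes only: field names are assumptions made for this sketch.

@dataclass(frozen=True)
class Work:
    title: str                    # the abstract intellectual creation


@dataclass(frozen=True)
class Expression:
    realization_of: Work
    form: str                     # e.g. "journal article"


@dataclass(frozen=True)
class Manifestation:
    embodiment_of: Expression
    medium: str                   # e.g. "online", "print", "PDF"


@dataclass(frozen=True)
class Item:
    exemplar_of: Manifestation
    location: str                 # where this particular copy lives


def work_of(item: Item) -> Work:
    """Walk from a single copy back up to the abstract Work."""
    return item.exemplar_of.embodiment_of.realization_of


paper = Work("Your latest research paper")
article = Expression(paper, "journal article")
pdf = Manifestation(article, "PDF")
printed = Manifestation(article, "print")
desk_copy = Item(printed, "on your desk")
```

Here the PDF and the print version are two Manifestations of the same Expression, and work_of(desk_copy) leads back to the one Work they both ultimately embody. A ‘flat’ model would have to describe all of these with a single record.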

While the individual ontologies will be described in greater detail in subsequent blog posts and papers, their characteristics and benefits can be summarized as follows:

SPAR, the Semantic Publishing and Referencing Ontologies

An integrated ecosystem of independent and reusable ontology modules that can be used to create comprehensive machine-readable RDF metadata for semantic publishing and referencing, comprising FaBiO, CiTO, BiRO, C4O, DoCO, PRO, PSO and PWO.

FaBiO, the FRBR-aligned Bibliographic Ontology (version 1.0; http://purl.org/spar/fabio/)

An ontology, structured according to the FRBR data model, to permit the description of bibliographic entities.

Comprehensive coverage of publication entity types, including born-digital entities.

Imports the FRBR Core ontology.

Uses PRISM terminology.

Extends the FRBR data model by the provision of new properties linking Works and Manifestations (fabio:hasManifestation and fabio:isManifestationOf), Works and Items (fabio:hasPortrayal and fabio:isPortrayedBy), and Expressions and Items (fabio:hasRepresentation and fabio:isRepresentedBy).
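To illustrate what such a shortcut property saves, the Python sketch below derives direct Work to Manifestation links from the two-step FRBR path that runs through an Expression. It is an illustration under stated assumptions, not a description of how FaBiO itself is axiomatized: the predicate names frbr:realization and frbr:embodiment are assumed here to be the FRBR Core terms for those two steps, the ex: resources are made up, and the derivation rule is this sketch's own.

```python
# Triples are plain (subject, predicate, object) tuples of prefixed names.
# The ex: resources are hypothetical; frbr:realization and frbr:embodiment
# are assumed to be the FRBR Core property names for these links.
triples = {
    ("ex:paper", "frbr:realization", "ex:article"),
    ("ex:article", "frbr:embodiment", "ex:article-pdf"),
    ("ex:article", "frbr:embodiment", "ex:article-print"),
}


def work_manifestation_shortcuts(graph):
    """Derive direct Work <-> Manifestation links from Work -> Expression -> Manifestation paths."""
    derived = set()
    for work, p1, expression in graph:
        if p1 != "frbr:realization":
            continue
        for subject, p2, manifestation in graph:
            if p2 == "frbr:embodiment" and subject == expression:
                derived.add((work, "fabio:hasManifestation", manifestation))
                derived.add((manifestation, "fabio:isManifestationOf", work))
    return derived


shortcuts = work_manifestation_shortcuts(triples)
```

With the shortcut in place, a query for all the Manifestations of a Work no longer needs to know about the intermediate Expression.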

Harmonized with the SWAN ontologies (http://swan.mindinformatics.org/ontology.html), with the SWAN Citations Module deprecated in favour of using FaBiO to describe bibliographic entities.

RDF mappings of BIBO classes and properties to FaBiO in preparation.

RDF mappings of BibTeX entities to FaBiO to follow.

CiTO, the Citation Typing Ontology (version 2.0; http://purl.org/spar/cito/)

An ontology to permit the characterization of the type or nature of citations, both factual (e.g. cito:citesAsMetadataDocument; cito:sharesAuthorsWith) and rhetorical (e.g. cito:confirms, cito:qualifies), able to deal with both direct and explicit, and indirect and implicit citations.

Integrated with the SWAN Scientific Discourse Relationships Module (http://swan.mindinformatics.org/spec/1.2/scientificdiscourse.html).
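A typed citation is, in RDF terms, a single statement whose predicate is a CiTO property. The Python sketch below writes one such statement as a line of N-Triples. The namespace is the CiTO URL given above and cito:confirms is one of the properties named above, but the two example.org paper URLs and the helper typed_citation are hypothetical and exist only for this illustration.

```python
CITO = "http://purl.org/spar/cito/"   # namespace as given in this post


def typed_citation(citing: str, relation: str, cited: str) -> str:
    """Return one N-Triples line stating that `citing` cites `cited` with the given CiTO relation."""
    return f"<{citing}> <{CITO}{relation}> <{cited}> ."


# Hypothetical papers: paper-a confirms the findings of paper-b.
line = typed_citation(
    "http://example.org/paper-a",
    "confirms",
    "http://example.org/paper-b",
)
```

The resulting line reads <http://example.org/paper-a> <http://purl.org/spar/cito/confirms> <http://example.org/paper-b> . and can be loaded by any RDF tool that accepts N-Triples.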

BiRO, the Bibliographic Reference Ontology (version 1.0; http://purl.org/spar/biro/)

An ontology, structured according to the FRBR data model, to define bibliographic records (as subclasses of frbr:Work) and bibliographic references (as subclasses of frbr:Expression), and their compilation into bibliographic collections and bibliographic lists, respectively.

Imports the FRBR Core Ontology (http://purl.org/vocab/frbr/core).

Imports the SWAN Collections Ontology (http://swan.mindinformatics.org/ontologies/1.2/collections.owl) to permit the description of ordered lists.

Provides a logical system for relating an individual bibliographic reference, such as appears in the reference list of a published article (which may lack the title of the cited article, the full names of the listed authors, or indeed the full list of authors):

to the full bibliographic record for that cited article, which in addition to the fields missing from the reference may also include the name of the publisher, and the ISSN or ISBN of the publication; to collections of bibliographic records; and to bibliographic lists, such as reference lists and library catalogues.

Has the ability, used in conjunction with the SWAN Collections Ontology, to specify ordered lists:

of authors, of references, of all the in-text reference pointers within an article, and of those in-text reference pointers specific for a single reference.
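The distinction between a reference and a record, and the role of ordering, can be sketched in a few lines of Python. This is only an analogy using dictionaries and lists, not BiRO itself: every field name and value below is a hypothetical placeholder, and the two helper functions are invented for this illustration.

```python
# The full bibliographic record for a cited article (placeholder values).
record = {
    "authors": ["Author A", "Author B", "Author C"],
    "title": "An example article",
    "journal": "Example Journal",
    "year": 2010,
    "publisher": "Example Press",
    "issn": "0000-0000",
}

# The same article as it appears in a reference list: abbreviated authors, no title.
reference = {
    "authors": ["Author A", "et al."],
    "journal": "Example Journal",
    "year": 2010,
}


def fields_missing_from_reference(reference: dict, record: dict) -> list:
    """List the record fields that the printed reference does not carry."""
    return sorted(set(record) - set(reference))


# A reference list is ordered, so position is part of the data.
other_reference = {"authors": ["Author D"], "journal": "Another Journal", "year": 2009}
reference_list = [other_reference, reference]


def position_in_list(reference: dict, reference_list: list) -> int:
    """Return the 1-based position of a reference within an ordered reference list."""
    return reference_list.index(reference) + 1


missing = fields_missing_from_reference(reference, record)
position = position_in_list(reference, reference_list)
```

In this sketch the reference is a partial view of the record, and its number in the reference list comes from the list order rather than from the reference itself.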

C4O, the Citation Counting and Context Characterization Ontology (version 1.0; http://purl.org/spar/c4o/)

An ontology that permits the characterization of bibliographic citations in terms of their number and their context.

Imports BiRO, and thus indirectly imports the FRBR Core Ontology and the SWAN Collections Ontology.

Provides the ontological structures to permit the number of citations a cited entity has received globally to be recorded, as determined by a bibliographic information resource such as Google Scholar, Scopus or Web of Knowledge on a particular date.

Provides the ontological structures to permit recording of the number of in-text citations of a cited source (i.e. the number of in-text reference pointers to a single reference in the citing article’s reference list).

Enables ontological descriptions of the context within the citing document in which an in-text reference pointer appears.

Permits that context to be related to relevant textual passages in the cited document.
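The two kinds of count that C4O distinguishes can be sketched in Python as follows. This is an illustration of the concepts rather than of the ontology's actual classes and properties: the class GlobalCitationCount, its field names, the function in_text_pointer_counts, the sample text and the figure of 12 citations are all hypothetical. The pointer pattern also handles only simple pointers such as [3], not grouped ones such as [1, 2].

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GlobalCitationCount:
    """A global count is only meaningful together with its source and date."""
    cited_entity: str
    count: int
    source: str       # e.g. "Google Scholar", "Scopus"
    as_of: date


def in_text_pointer_counts(text: str) -> Counter:
    """Count in-text reference pointers of the form [n], keyed by reference number."""
    return Counter(int(n) for n in re.findall(r"\[(\d+)\]", text))


# Made-up citing text containing three in-text reference pointers.
text = "Earlier work [3] introduced the idea. CiTO [4] built on it, as [3] anticipated."
counts = in_text_pointer_counts(text)

# Made-up global count for a hypothetical cited article.
global_count = GlobalCitationCount(
    cited_entity="http://example.org/paper-b",
    count=12,
    source="Google Scholar",
    as_of=date(2010, 10, 1),
)
```

Here reference 3 has two in-text reference pointers and reference 4 has one, while the global count records who reported the number and on what date.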

N.B. The following four ontologies are under development, and will be published shortly.

DoCO, the Document Components Ontology

An ontology for the characterization of the component parts of a bibliographic document.

Provides a structured vocabulary of document components (e.g. Introduction, Discussion, Acknowledgements, Reference List, Figures, Appendix) in OWL, enabling these to be described in RDF.

PRO, the Publication Roles Ontology

An ontology for the characterization of the roles of agents (people, corporate bodies and computational agents; e.g. author, editor, reviewer, publisher, librarian) in the publication process, as they relate to bibliographic entities.

Permits the recording of time/date information about when roles are held.

PSO, the Publications Status Ontology

An ontology for the characterization of the status of a document and other bibliographic entities at various stages in the publication process (e.g. submitted manuscript, rejected manuscript, accepted manuscript, proof, Version of Record, catalogued book).

PWO, the Publications Workflow Ontology

An ontology for the characterization of the main stages in the workflow associated with the publication of a document (e.g. under review, XML capture, page design, publication to Web).

Further blog posts will describe each of the SPAR ontologies in greater detail, will give examples of their use in encoding bibliographic and referencing information, and will describe the mapping to FaBiO of other bibliographic metadata systems that do not employ the FRBR data model, specifically BIBO, the non-FRBR bibliographic ontology, and BibTeX terminologies.

We invite community feedback and engagement on the four published SPAR ontologies, their improvement and their application.

This work forms part of the JISC Open Citations Project described in this blog.

The relevant hashtags when referring to this post are #jiscopencite and #spar.

David Shotton and Silvio Peroni

University of Oxford, October 2010

References

[1] Saur KG: FRBR (Functional Requirements for Bibliographic Records) Final Report. International Federation of Library Associations and Institutions; 1998. http://www.ifla.org/files/cataloguing/frbr/frbr_2008.pdf.

[2] Tillett B: What is FRBR? A Conceptual Model for the Bibliographic Universe. Washington DC, USA: Library of Congress, Cataloguing Distribution Service; 2003. http://www.loc.gov/cds/downloads/FRBR.PDF.

[3] Shotton D, Portwin K, Klyne G, Miles A: Adventures in semantic publishing: exemplar semantic enhancements of a research article. PLoS Comput Biol 2009, 5:e1000361. http://dx.doi.org/10.1371/journal.pcbi.1000361.

[4] Shotton D: CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics 2010, 1 (Suppl. 1): S6. http://dx.doi.org/10.1186/2041-1480-1-S1-S6.

[5] Ciccarese P, Shotton D, Peroni S, Clark T: CiTO + SWAN: The web semantics of bibliographic records, citations, evidence and discourse relationships. (Submitted for publication).