Crosswalks

To convert one data format (or scheme) to another, a crosswalk is used. A crosswalk is a set of mappings between equivalent properties and entry types in different formats (Pierre & LaPlant, 1998). First of all, for most properties a simple mapping suffices: title in BibTeX refers to the same concept as it does in BibJSON and CSL-JSON. On top of that, the property coincides with TI in RIS. This mapping could come in the form of a JSON Linked Data (JSON-LD) context, as done by CodeMeta (Jones et al., 2017), as eXtensible Stylesheet Language Transformations (XSLT), as discussed by Godby, Smith & Childress (2003).

Second, some mappings are context-dependent. For example, consider the CSL-JSON properties author and reviewed-author in relation to the RIS properties author (AU) and reviewer (usually C4). Normally, AU maps to author. However, if the entry being converted has the review entry type, AU maps to reviewed-author while author maps to C4.

Third, the data format of the values needs to be converted. While title in BibTeX can have formatting in the form of TeX, title in CSL-JSON uses a subset of HTML for formatting, and TI in RIS does not have formatting at all. Properties can also have different data types. In CSL-JSON, author is a list of objects, while authors in BibTeX, which describes the same concept, is serialized text delimited by ” and ”.

Finally, there might not be a one-to-one mapping between properties. For instance, page in CSL-JSON maps to both start and end in Citation File Format (CFF) (Druskat et al., 2018). Similarly, page, issue, volume and ISSN are all top-level properties in CSL-JSON, while the corresponding properties in Wikidata are proposed to be nested in the journal property.

Since the last two aspects can lead to information loss, crosswalks often need to be one-directional converters between two formats. To not have to create crosswalks between every possible combination of supported formats, one could define a central format, similar to the “interoperable core” in the ”long translation path” proposed by Godby, Smith & Childress (2003). It is however important that the central format can hold as much information as should be represented in any of the output formats, as to prevent information loss when converting between two formats.