Distribution of 465 unique Indigenous language codes in the Australian National Bibliographic Database

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Stephen Hearn of the University of Minnesota, Melanie Wacker of Columbia University, and John Riemer of the University of California, Los Angeles. Historically, librarians designed authorized access points and MARC bibliographic and authority records for use in browse indexes and card catalogs. The use of MARC data in online catalogs opened new possibilities: generating downloadable citations, hyperlinking to external resources, using ISBNs to fetch book cover images, offering labeled displays as alternatives to ISBD, automatic search redirection, current awareness services, and so on. At the same time, the trend away from browse indexes and toward keyword search has been a major disruption in the use of traditional authority data to aid search navigation. As browse indexes have been eclipsed, the formal patterns they used to ensure an overarching semantic organization of access terms have in many cases simply been lost. On the other hand, libraries may be able to learn from other communities' practices and from new technologies to enhance our own data.

New uses of cataloging metadata: The shared and consistent use of MARC fields supports new applications. Metadata managers reported using identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps showing where to locate resources in a specific classification range (such as OCLC's integration with StackMap). Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can support richer analysis and visualization of collections. MARC metadata is connecting scholars with the bibliographic data they need for their projects, and applications such as Yewno can use it to generate relationships to related resources. MARC metadata is also being used to inform institutional output measures and affiliation tracking, and to serve as a source for building organizational histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data has enabled some institutions to supply language codes missing from their MARC records. MARC data has also supported generating subject maps that reveal relationships not otherwise explicit in the cataloging metadata.
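Fetching cover images by ISBN depends on normalizing the identifier as it appears in a record's 020 field. The following is a minimal Python sketch, assuming the Open Library Covers API URL pattern (`https://covers.openlibrary.org/b/isbn/{isbn}-{S|M|L}.jpg`) as the image source; the function names `to_isbn13` and `cover_url` are hypothetical, and a production system would also validate input check digits and handle missing covers.

```python
import re

# Assumed image source: the Open Library Covers API URL pattern.
# Other cover services use different patterns.
COVER_URL = "https://covers.openlibrary.org/b/isbn/{isbn}-{size}.jpg"


def to_isbn13(raw: str) -> str:
    """Normalize an ISBN-10 or ISBN-13 string (e.g. from MARC 020 $a) to 13 digits.

    Qualifiers such as "(pbk.)" are dropped by keeping only the first token.
    This sketch does not validate the check digit of the input.
    """
    parts = raw.split()
    if not parts:
        raise ValueError("empty ISBN")
    digits = re.sub(r"[^0-9Xx]", "", parts[0]).upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        core = "978" + digits[:9]  # drop the ISBN-10 check digit
        # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    raise ValueError(f"unrecognized ISBN: {raw!r}")


def cover_url(raw_isbn: str, size: str = "M") -> str:
    """Build a cover image URL for an ISBN; size is S, M, or L."""
    if size not in {"S", "M", "L"}:
        raise ValueError("size must be S, M, or L")
    return COVER_URL.format(isbn=to_isbn13(raw_isbn), size=size)


print(cover_url("0-306-40615-2 (pbk.)"))
# https://covers.openlibrary.org/b/isbn/9780306406157-M.jpg
```

Converting everything to ISBN-13 first means that records carrying either form of the identifier resolve to the same image request.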

One striking use of visualization comes from the Auslang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages. The image featured above shows the results: a map of the 465 Indigenous languages tagged in the Australian National Bibliographic Database as a result of the code-a-thon. It is also an example of involving the community to enhance bibliographic metadata.

Lamentable demise of authority browse and cross-references: A dominant theme in our discussions was the inability to browse authority data as part of catalog searches, and the resulting absence of the cross-references that authority data provides. Workarounds included embedding variant and related terms in bibliographic records alongside the controlled terms, so that keyword searches also match the variants; adding a search box suggester ("Did you mean…?"); and leveraging the ontological structure of controlled vocabularies behind the scenes to build relationships into the catalog data that would otherwise not be apparent. Authority records are indexed, but not displayed. Having alternate terms from authority data hidden from users can be a problem: in some systems, a subject search for "tree" also retrieves Thomas Hardy resources because of his name variant "Author of Under the Greenwood Tree." Without a browse index, the scope notes that disambiguate common names and terms used in different subject headings are also lost (how does a user know what the keyword "Indian" means in the catalog?). As a remedy, OCLC's FAST (Faceted Application of Subject Terminology) provides hyperlinks and facets that can be used to explore related topics in cataloging records. WorldCat Discovery will start using the desired cross-references as part of searching in June 2020.
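The workaround of folding variant terms into keyword access, and the Hardy side effect it can produce, can be illustrated with a toy inverted index. In this minimal Python sketch the records, field names, and functions are all hypothetical simplifications of MARC data (an authority with a preferred 1XX-style heading plus 4XX-style variants, linked from bibliographic records); no real catalog system is modeled.

```python
import re
from collections import defaultdict

# Hypothetical, heavily simplified records: an authority has a preferred
# heading (like a MARC 1XX) and variant forms (like MARC 4XX fields).
AUTHORITIES = {
    "hardy": {
        "preferred": "Hardy, Thomas, 1840-1928",
        "variants": ["Author of Under the Greenwood Tree"],
    },
}

# Hypothetical bibliographic records linked to authorities by id.
BIBS = {
    "b1": {"title": "Tess of the d'Urbervilles", "authority_ids": ["hardy"]},
    "b2": {"title": "The tree book", "authority_ids": []},
}


def tokens(text):
    """Lowercase keyword tokens; no stemming, so "trees" will not match "tree"."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(bibs, authorities, include_variants=True):
    """Map each keyword to the ids of bibliographic records containing it."""
    index = defaultdict(set)
    for bib_id, bib in bibs.items():
        fields = [bib["title"]]
        for auth_id in bib["authority_ids"]:
            auth = authorities[auth_id]
            fields.append(auth["preferred"])
            if include_variants:
                # The workaround: fold authority variants into the bib record's keywords.
                fields.extend(auth["variants"])
        for field in fields:
            for tok in tokens(field):
                index[tok].add(bib_id)
    return index


def search(index, query):
    """Return ids of records matching every keyword in the query (AND search)."""
    hits = None
    for tok in tokens(query):
        matches = index.get(tok, set())
        hits = matches if hits is None else hits & matches
    return sorted(hits or set())


with_variants = build_index(BIBS, AUTHORITIES)
print(search(with_variants, "greenwood"))  # ['b1']: the variant gives keyword access
print(search(with_variants, "tree"))       # ['b1', 'b2']: "tree" now also retrieves Hardy
```

Because the variant is indexed but never displayed, a user who gets record `b1` back for "tree" sees no visible reason for the match, which is the problem the metadata managers described.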

Many institutions are looking to better represent Indigenous peoples in their catalogs. The University of Calgary has worked to update how tribal names display, but the underlying data is still there under the hood of the catalog, so outdated tribal names are still returned in cross-references. The bulk of the literature related to Indigenous peoples in our catalogs is still indexed under a pejorative term, which may be mitigated by providing "see also" notes. "You do have to deal with the fact that researchers are looking for all the material, under different possible names; you're almost forced to have to provide cross-references."

Non-library use cases that leverage library cataloging and authority data: Libraries have been sharing their metadata with collective digital repositories such as HathiTrust. Library metadata is also being used to generate bibliometrics, the statistical analysis of books, articles, and other publications. Using library metadata for digital humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright, and UCLA noted that other researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

Hachette UK's "River of Authors," generated from the British Library's catalog metadata

A novel use of cataloging metadata came from Hachette UK, the UK's second-largest book publisher, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story. To do so, the British Library was asked for every author and book title published by those nine houses over a span of 250 years. The British Library provided a list of over 55,000 authors, from which the 5,000 most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (For more images of this mural, see Hachette's River of Authors.)

Potential to enhance library metadata with other sources: Metadata managers had a long list of non-MARC sources that could be used to enhance library bibliographic or authority data, providing context and improving access to library resources. These included identifiers from VIAF (Virtual International Authority File), id.loc.gov, ISNI (International Standard Name Identifier), ORCID (Open Researcher and Contributor ID), and Scopus ID; AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, geospatial data, Goodreads, IMDb (Internet Movie Database), Internet Archive, LibraryThing, LinkedIn, MusicBrainz, ONIX, and Open Library. Wikidata and Wikipedia led the list: Wikidata is viewed as an important source for expanding language coverage and providing multilingual metadata, and references to Wikipedia articles are being used to enrich the discovery of digital collections. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources, a document from the Program for Cooperative Cataloging's Task Group on URIs in MARC, provides valuable guidance for collecting data from these other sources.
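Collecting data from these sources usually starts by turning a bare identifier into a URI that can be recorded in MARC (for example, in a $0 or $1 subfield). The sketch below is a hypothetical helper, not taken from the PCC guide: the URI patterns are commonly used forms for each source, and the identifier clean-up (removing spaces from LCCNs and ISNIs, uppercasing Wikidata Q-identifiers) is deliberately simplified. Full LCCN normalization, for instance, has additional rules.

```python
# Hypothetical URI patterns; confirm against the PCC guide and each source's
# own documentation before recording URIs in production data.
URI_PATTERNS = {
    "lcnaf": "http://id.loc.gov/authorities/names/{id}",
    "viaf": "http://viaf.org/viaf/{id}",
    "isni": "http://isni.org/isni/{id}",
    "orcid": "https://orcid.org/{id}",
    "wikidata": "http://www.wikidata.org/entity/{id}",
}


def formulate_uri(source: str, identifier: str) -> str:
    """Turn a bare identifier from a known source into a URI string."""
    if source not in URI_PATTERNS:
        raise ValueError(f"no URI pattern for source {source!r}")
    ident = identifier.strip()
    if source in ("lcnaf", "isni"):
        # Simplified clean-up: drop internal spaces, e.g. "n  79021164" -> "n79021164".
        ident = "".join(ident.split())
    if source in ("isni", "wikidata"):
        # ISNI check characters may be "X"; Wikidata Q-identifiers use a capital Q.
        ident = ident.upper()
    return URI_PATTERNS[source].format(id=ident)


for source, ident in [("lcnaf", "n  79021164"), ("wikidata", "q42")]:
    print(formulate_uri(source, ident))
# http://id.loc.gov/authorities/names/n79021164
# http://www.wikidata.org/entity/Q42
```

Keeping the patterns in one table makes it easy to adjust a single source's URI form without touching the clean-up logic.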

More focus on entities rather than collections of resources: Metadata managers have been frustrated by the limitations of their library systems, especially their reliance on keyword searching. We cannot leverage the value in authority data when it is surfaced only in bibliographic record access points. We need a better way to represent the resources in our catalogs, one that lets us look at the entities and relationships represented in those resources. The challenge is to present easily understandable context and semantics of the data to our user populations, a challenge that OCLC is addressing. Searching a large database of entities that link to resources related to those entities may further facilitate and expand the community's uses of our cataloging and authority data.