Making Libraries Visible on the Web | The Digital Shift

In Library: An Unquiet History, historian and curatorial fellow for Harvard’s metaLAB Matthew Battles describes Melvil Dewey’s impatience with inefficiency in library work in the 1870s. “To Dewey, local interests and special needs were less important than the efficient movement of books into the hands of readers,” he writes.

That crisp statement of purpose should be an inspiration to the current discussions around making library collections and programs visible and available on the web.

A visitor to Libraryland looking at project websites, reading journals, and listening to conference presentations might think that our only goal is to build new databases that use RDF triples and semantic web ontologies and express our frustration with some guy named Marc. There is clear passion for change, but we’d have to explain to our visitor that all of this technical activity has a genuine outcome in mind: to connect readers to the wide variety of collections and services that libraries offer, even when the user is starting from a search engine on the open web. The work to replace outdated methods of managing library metadata will allow more readers to connect to library collections and services more often, wherever they are searching.

Rachel Fewell, central library administrator for the Denver Public Library, describes it this way: “We are in an in-between world where we have two groups of people: [the] ones who already go to the library and the ones who never think about the library.”

Making libraries more visible on the web has two benefits: improving the service for the ones who are already committed to the library—they use search engines, too—and giving libraries the opportunity to reach those who never—or only sometimes—think about the library.

Declaring the outcomes we want matters, because the stakes are high. Commercial search engines have been wildly successful at connecting searchers to content because they have prioritized convenience, ease of use, and relevance of results over uniformity of presentation. Libraries now have an opportunity to understand the rules of the web and to integrate their collections and services into these convenient interfaces. Failing to do so would be a missed opportunity and a risk to the relevance of libraries.

To rise to the challenge, libraries must be explicit that the convenience of the user is paramount and that following the rules of the web is critical to reaching people in this in-between world.

How did we get here?

How libraries drifted from a focus on the convenience of the reader can be traced through the development of library catalogs and inventory systems. Cataloging historian Dorothy May Norris tells us that the first known library catalog was written directly on the walls of the library of Edfu in Upper Egypt between 237 and 57 BCE. It was “just a bare list of books.” While the ongoing maintenance of a painted catalog doesn’t fit with today’s rapidly changing collections, if the goal is to broadcast the contents of the holdings to readers in the library, it works; the library’s assets are right there on the wall.

From there, catalogs evolved into codices: hand-crafted volumes that recorded collections for the librarian and the select few who had access to the codex. The early years of the modern era brought the card catalog, which first appeared in the 1790s in Vienna. In the 19th century, Dewey made a number of recommendations toward the standardization of library equipment and internal processes, all with the goal of more efficiently getting books into the hands of the reader. The 20th century brought mainframe computers that could store the details of millions of books and journals, increase the efficiency of library transactions, and reduce the cost of sharing cataloging data. So began the focus on the efficiency of library inventory management and catalog maintenance. The literature of library science shifted to information studies and library operations, and the language of librarianship shifted to training library users how to use these new systems.

How the web works

In the late 20th century, search engines appeared on the scene. These were built by technology companies totally focused on the convenience of searchers as a means to reach their business goals. They revolutionized information retrieval by exclusively addressing what the searcher wants: highly relevant information on the first screen of results. Over time, those search engines have developed preferred methods for the presentation of data on the web to make it available for harvesting and indexing. While libraries were early adopters of web-based catalogs, they have not widely adopted the preferred methods for allowing their data to be consumed by search engines and ranked in search results.

The good news for libraries is that the technology for integrating with search engines is not a secret art accessible only to businesses with large technology budgets. Because well-structured websites improve their results, search engine companies do not keep the best practices of website design and data structure a secret. Of course, they don’t share the precise algorithms they use (those are trade secrets that distinguish one search engine from another), but they are clear about how they want websites to be structured and what they value in data setup and content. Let’s take Google Search as an example. It has three areas of search results: traditional results, the list of web pages and documents that it is famous for; the Knowledge Card, which includes information about people, books, places, and events such as sports or movie times; and sponsored links. Each of these has rules that must be followed for prominent placement in results.

1. Traditional results. The most important of the dozens of best practices are to keep your website unblocked from crawlers, include video and pictures, maintain high word frequency for keywords, put keywords in the URL, maintain a high degree of adjacency of the keywords, use synonyms for the keywords throughout the page, and manage the overall quality of the pages, including frequent updates. Provide a page for each element of inventory (titles, services, and programs for libraries) that has a durable link structure that can be used by other websites as a reference. Finally, PageRank matters: the number of times a page is linked to from other pages is counted, following the principle that if something is referred to more often than something else, it is probably more relevant. When calculating relevance, search engines look at all of these rules. The sites and pages that follow the rules most closely are rewarded with better positions in the results.

2. The Knowledge Card. Follow the rules of the semantic web. Use universal identifiers for “things” like people, works, objects, and places. Provide complete descriptions of library locations, branches, and services in the structured data stores that the search engines draw from, such as Wikidata. Richard Wallis, an expert on the semantic web for libraries, gives this advice: “To get your content into the Knowledge Card, semantic properties will prove more fruitful and effective than simple words.” That means using linked data and the same HTTP URIs (uniform resource identifiers) that are used in other libraries and on the web.

3. Sponsored links. Payment for placement is the rule here. Payment is a legitimate strategy in content marketing for commercial and noncommercial organizations, including libraries.
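The structured-data rules above can be made concrete with a small example. The sketch below builds schema.org JSON-LD markup of the kind a library website could embed in a page so search engines can read its branch and holdings data; all names, addresses, and URLs are invented for illustration, and a real deployment would follow the search engines' current structured-data documentation.

```python
import json

# Hypothetical schema.org JSON-LD for a library branch and one title.
# A site would embed the serialized output in a
# <script type="application/ld+json"> tag on the relevant page.
branch = {
    "@context": "https://schema.org",
    "@type": "Library",
    "name": "Example Central Library",
    "url": "https://library.example.org/",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "100 Main St",
        "addressLocality": "Exampleville",
    },
    "openingHours": "Mo-Sa 09:00-18:00",
}

book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Library: An Unquiet History",
    "author": {"@type": "Person", "name": "Matthew Battles"},
    # A durable, linkable page for this title, per the best practices above.
    "url": "https://library.example.org/title/unquiet-history",
}

# Serialize both entities for embedding in a web page.
markup = json.dumps([branch, book], indent=2)
print(markup)
```

Note the durable URL for the individual title: that single page, with stable structured data, is what gives other sites something to link to and gives crawlers something to index.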

The world of commercial websites is well aware of the rules described above and significant energy goes into management of the structure of commercial websites and the presentation of their data. Libraries have the opportunity to improve their relevance on the web by observing those rules, monitoring the capabilities and compliance of their web-based catalogs, and finding all of the opportunities they can to use services that follow the rules.

What’s going on today?

Libraries have not been ignoring their generally poor results in search engine relevance, but much of the energy goes into discussions of changing the model for storing bibliographic data—that is, replacing the MARC21 standard for data exchange with something more web-friendly and transforming metadata operations into processes that convert existing MARC21 data or create new data in new models.

The Library of Congress (LC) has spearheaded that effort with the Bibliographic Framework Initiative or BIBFRAME. Beacher Wiggins, the director for acquisitions and bibliographic access at LC, says web visibility is “one of the topmost desires of BIBFRAME.”

Linked Data for Libraries, an Andrew W. Mellon Foundation–funded project of the top research libraries in the United States, has taken on the task of developing a shared database of BIBFRAME data to experiment with entirely BIBFRAME-based cataloging work flows. Phillip Schreur of Stanford University, one of the project leaders, says, “MARC was designed to represent cards, not web pages” and “in the future we’ll be working on the web.”
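The shift Schreur describes can be sketched in miniature: instead of a MARC record's tagged fields, a title becomes a set of subject-predicate-object triples identified by HTTP URIs. The example below is a simplified illustration, not actual BIBFRAME cataloging output; the example.org URIs are invented, and in full BIBFRAME the title would itself be a resource rather than a plain string.

```python
# One bibliographic "thing," expressed as linked-data triples.
# Each element is an HTTP URI (or a quoted literal), so any other
# dataset on the web can point at the same work.
work = "<https://example.org/work/unquiet-history>"

triples = [
    (work, "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
     "<http://id.loc.gov/ontologies/bibframe/Work>"),
    (work, "<http://id.loc.gov/ontologies/bibframe/title>",
     '"Library: An Unquiet History"'),  # simplified: really a Title resource
    (work, "<http://id.loc.gov/ontologies/bibframe/creator>",
     "<https://example.org/person/matthew-battles>"),
]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

print(to_ntriples(triples))
```

Because the work and the creator are URIs rather than text strings, a search engine (or another library) can merge these statements with its own data about the same entities, which is exactly the universal-identifier goal the shared-database experiments are pursuing.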

Service organizations have also made contributions. OCLC Research has used its big data tools and scientists to experiment with creating linked data representing works and persons mined from the massive WorldCat and Virtual International Authority File datasets. These trials could prove useful in the effort to use universal identifiers across all libraries.

The commercial firm Zepheira has taken a more straightforward approach to improve library visibility on the web. Its founder and president Eric Miller says, “The promise of moving library assets to become visible on the web is exciting. It is also a move that will be most successful with planning and a comprehensive view of the library’s assets: service locations, books, articles, events, programs, services, and people.”

To do this, the company is offering a service called the Library.Link Network (see Library.Link Builds Open Web Visibility), which converts traditional library data about all of the library’s collections and services into what Zepheira’s Jeff Penka calls “the meaningful vocabularies on the web.” It does this on a web infrastructure that surfaces the data in structures that search engines understand and can crawl. Zepheira’s goal is to create a network that allows a library to follow the rules of the web without having to manage the technical infrastructure itself.

Ebook and audiobook provider OverDrive has had tremendous success making library fulfillment options available in the search engine Bing’s version of the Knowledge Card. OverDrive CEO Steve Potash refers to visibility on the web as “content marketing for libraries,” and he rates it a highly effective technique for the promotion of library services.

OverDrive has achieved this success by following the rules and assiduously observing the “fabric and tools of the web.” This is in contrast to an approach that relies on standards created within the library field. Its goal is product promotion for OverDrive, but the result is a channel for the visibility of libraries on the web. Potash considers it a “working relationship that enables the sharing, reading, sampling, and ultimately marketing of library content to effectively promote library services at scale.”

What’s next?

Most of the discussion around making library collections and services more available on the web has been dominated by exhortations to replace MARC21 or create new data in new models such as BIBFRAME. These are important tactics, but the discussions rarely refer to an outcome: improvement to the perceived value of libraries and the efficient movement of books and services into the hands of readers. Failing to declare the desired outcome risks focusing on process and technical detail over long-term goals.

The initiatives that are building the future of libraries on the web should be clear in relation to the outcomes we want, optimistic about our opportunities, and cognizant that following the rules of the web will reap big rewards.

Ted Fons is a principal consultant for Third Chapter Partners. A librarian for more than 20 years, he was Executive Director, Data Services & WorldCat Quality, OCLC, and Director, Customer Services, Innovative Interfaces.