Discovery patterns and practices are changing steadily, as workflows adjust to new services and work around a variety of barriers. The best data about discovery practices are held by content providers, who are able to analyze the variety of sources that researchers use to reach their platforms. But while many content providers analyze their own traffic sources, they tend not to share these data with one another or publicly, making it impossible for them to know whether their patterns are typical or particular. And while libraries may know that the use of one or another discovery service they provide is growing or shrinking, they do not have access to the black box of web and academic search engines like Google and Google Scholar, making their view especially blindered. Given the limited availability of data about actual discovery patterns, all but the most sophisticated are forced to turn to the next best thing, which is survey findings.

I run Ithaka S+R’s US Faculty Survey and its companion, the UK Survey of Academics, which we conduct in partnership with Jisc and RLUK. Both surveys touch on discovery issues, and both will release findings from their latest cycles this spring. I know the great value of survey research and also the cost of undertaking broad sector-wide surveys with rigor, and I am always frustrated to be unable to explore a single topic such as discovery with the depth I might wish in a single survey.

It was therefore with great interest that I learned of the publication this month of Tracy Gardner’s and Simon Inger’s How Readers Discover Content in Scholarly Publications. This is one of the broadest surveys ever conducted on discovery. It is international in scope, goes beyond the academic sector to include corporate, government, and medical users, and draws on some 40,000 responses.

Before looking at the findings, I must say a few words about method. The population surveyed is derived from the contact lists of a wide group of publishers. It was drawn from lists of authors, reviewers, and society members, as well as users who registered online for alerts, for mobile device pairing, and for other purposes. This type of convenience sample is, by definition, not representative of any single population, and yet the researchers normalized the responses to match the demographics of the respondents in the previous cycle of this survey. In addition, the size of the sample is not indicated, presumably because several open calls for participation were used in addition to targeted invitations, so while the response rate is estimated at between 1% and 3%, it is impossible to establish conclusively. I am sympathetic to the tradeoffs that must sometimes be made in order to conduct research in a timely fashion and within the available budget, but each of these is a serious methodological shortcoming. To their credit, the researchers were transparent about the project limitations (and very helpful in discussing aspects of the project with me upon my inquiry). But in interpreting the findings and considering potential business responses, scholarly information professionals should practice caution.

There are findings from this survey that make good sense and deserve attention on their own merits. Perhaps the most significant of these, which is a theme explored across the report, is that discovery patterns and practices vary across different sectors such as academic, corporate, and medical, different countries and levels of national income, and different fields and disciplines. A content provider with a global footprint, or with a list that cuts across fields, therefore has a much more challenging task before it than just optimizing for a single set of needs, and some publishers are aggressively developing their channels for discoverability as a result. A library cannot just follow the “best practices” of another institution whose user base may be rather different, and some libraries that have the resources to do so are developing services designed with their own researcher population in mind. Examining which set of needs is of greatest importance for your own user communities, and how best to serve those needs, is essential for any information services organization.

At the same time, one of the most apparently important findings is simply not, in my view, credible. The report finds a significant overall increase in the publisher website as a starting point for searches for journal articles on a specific subject (figure 4). Within the academic sector, it finds that the importance of the publisher website as a starting point has grown substantially in every field (figure 10). These findings are contradicted internally within the report, and they are also at odds with the reality being experienced by many publishers. Within the report, there is a finding of steadily declining satisfaction with an array of publisher platform features, including “Searching” (figure 49). And among content providers, I am aware of many that have experienced an overall trend of fewer and fewer searches on-platform, with a growing share of traffic sourced from third parties such as web search. I fear that with this particular set of findings, the convenience sample, drawn from those with an existing close connection with content providers, is skewing the results.

Search is important, of course, but it is by no means the only way that researchers discover scholarly content. The report finds that search accounts for approximately 40-45% of discovery of the last journal article the respondent accessed, a figure that varies only slightly by sector (figure 30). While the report emphasizes that “search is dominant,” for me, the headline finding here is that the other means of discovery specified (everything from personal recommendations and social media to alerts and citations) collectively drive more traffic than search. This will vary quite a bit by content provider, but it emphasizes the importance of not treating Google Scholar alone as one’s discoverability strategy.

The report distinguishes firmly between discovery and delivery, even though there are notorious challenges in establishing this distinction in online workflows. We have also seen in other research how difficult it is for academics to distinguish between free resources and those that are licensed by the university library on their behalf and made available to them at no personal cost. So, we should be cautious in responding to the report’s finding that, in high income countries, academics source less than 40% of their journal articles from paid/licensed sources and the remainder from a variety of free/open sources (figure 38). Certainly, the share of access provided elsewhere may well be growing, but the idea that institutional repositories alone account for fully one quarter of article access is just not believable in light of the small amount of journal content and usage in many repositories.

The researchers asked respondents about “your favourite online journals,” but I wonder if this is still the right formulation. In light of changes in the discovery environment, fewer scholars are keeping up with current scholarship in their field by browsing current issues or receiving alerts regarding new journal issues. Scientists such as chemists have reported that they face a “deluge” of relevant articles, are struggling to maintain current awareness in their field, and are therefore looking for improved current awareness mechanisms rather than keeping up with a selection of favorite journals. Indeed, one of the report’s most important, consistent, and credible findings is the drop-off in importance of new issue or topic alerts (figures 26, 27, 34, 35, 49).

In addition to journals, the researchers examined the most important starting point when searching for scholarly books. While they did not break the findings out just for humanists, which would have been intriguing from a university press perspective, their findings for academia overall show some meaningful differences as compared with journals. In academia, library web pages, discovery tools, and search engines barely edged out general web search engines for first position, with online bookshops such as Google Books and Amazon Kindle (not Amazon writ large) in third position (figure 31).

Ultimately, it is impressive that Gardner and Inger have taken on such a large-scale study, which adds context to our understanding of discovery, especially in its diversity. In a future cycle of this project, it is my hope that some of the methodological shortcomings can be addressed so that the study can provide stronger guidance for service models and business practices.