With the health of local journalism becoming more salient, more and more researchers and scholars have turned to studying local news ecosystems and news deserts — even Facebook recently announced a new initiative.

When I was working with Phil Napoli’s News Measures Research Project back in 2015, we had no idea how prescient our research questions about the critical information needs of communities and news ecosystem analysis would become. The good news is that with so much activity in this space, the goal is less to invent a new method or approach than to improve the ones that already exist.

This is the second post in a series about a research project Jesse Holcomb and I are undertaking to map the local news ecosystems of New Jersey. This project aims to be the most comprehensive, yet scalable, local news ecosystem mapping project to date, one that can be replicated across the United States. It builds explicitly on the methodology and execution of Napoli’s News Measures Research Project, but with some important tweaks.

The aim of this series is to document our progress in this endeavor, which employs some new methodological approaches to address urgent questions about the landscape of local news in the digital age. In the first post, I discussed the balance between depth and scale in news ecosystem studies. Here I talk about the use of databases to build a “census” of local news providers for a given area.

Understanding any local news ecosystem is impossible without first understanding the depth and breadth of news production there. It sounds simple, but the reality of answering that question with data is much more complex. Quite obviously, producers of local news operate in several media: print, broadcast, and online. Failing to account for any one medium will produce an inaccurate picture of the local news landscape in question. The problem is, no comprehensive list or database exists that covers local news producers across all media.

Researchers have tackled this problem in a number of ways, using different combinations of manual and programmatic data-gathering techniques. We think the best approach to building a census of local news outlets is by triangulation, using a combination of databases that allows for comprehensiveness in terms of medium, and involves as little manual searching as possible.

As researchers have done more work in this area, several databases have risen to the top as the most reliable and comprehensive for what they do. For example, Editor & Publisher began as a newspaper database but now includes some digital-native news sites and makes a point to include ethnic media outlets as well. E&P sells access to their database of local news publishers on an annual basis for around $1,000. The list of New Jersey outlets — before cleaning — numbered 344.

Another popular database for discovering local news outlets is Cision. Cision is a “public relations and earned media software company and services provider;” it is in their interest to have as current and comprehensive a list of local news publishers as possible. Access to this list requires a paid subscription, but many universities subscribe to Cision, which is how we obtained our lists. Cision’s list of New Jersey outlets, before cleaning, numbered 319.

BIA/Kelsey, a hybrid advertising-focused company, is another popular source for its database of local media outlets. Like Cision, it is a for-profit company that sells its lists to organizations that are able to pay top dollar, which prevents more widespread use among researchers. Unlike Cision and Editor & Publisher, BIA/Kelsey focuses equally on television and radio (in addition to print-based outlets), including maps of coverage areas. We bought access only to their television and radio lists, which for New Jersey totaled 84 outlets.

Another valuable (and free!) list is ABYZ News Links. Its purpose and origin are unclear, but it includes 269 local news outlets for New Jersey, along with similar lists for the other 49 states and many other countries. We also used Michele’s List, the New Jersey Press Association, the National Newspaper Association, Onlinenewspapers.com, MondoTimes.com, Library of Congress, and LION’s list of publishers (none of which charges a fee).

Every outlet in each of these lists needed to be entered into a spreadsheet, or checked against existing entries. Most lists were not available in Microsoft Excel format, so entry needed to be done manually. On the one hand this is good, as it breeds familiarity with the data; on the other it is time-intensive and tedious.
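For readers who would rather script part of this step, here is a minimal sketch of checking new entries against existing ones. It is not the process we used (ours was manual). The outlet names are invented, and the matching rule (lowercase, strip punctuation, drop a leading "the") is an assumption that would still need human review for near-duplicates.

```python
import re


def normalize_name(name):
    """Build a rough matching key: lowercase, punctuation to spaces, no leading 'the'."""
    key = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    return re.sub(r"^the ", "", key)


def add_list(census, source, names):
    """Add one source list to the census, recording which lists name each outlet."""
    for name in names:
        key = normalize_name(name)
        # Keep the first spelling seen as the display name.
        entry = census.setdefault(key, {"name": name, "sources": set()})
        entry["sources"].add(source)
    return census


# Invented outlet names, for illustration only.
census = {}
add_list(census, "Editor & Publisher", ["The Example Gazette", "Sample Town Times"])
add_list(census, "Cision", ["Example Gazette", "Hypothetical Herald"])

for key, entry in sorted(census.items()):
    print(entry["name"], sorted(entry["sources"]))
```

A rule this simple will miss outlets that changed names or are listed under a parent company, so it can reduce the manual checking but not replace it.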

(One quick note: Major technology companies and platforms like Google, Facebook, and others have treasure troves of data about local news, including lists of publishers, their geographic locations, and news flow within communities, which until very recently have been kept behind closed doors. Recent indications of their willingness to share this data with researchers are most welcome!)

One complicating factor for our project is the importance of out-of-state markets. New Jersey sits between two major metropolises, New York City and Philadelphia — both of which have outlets that produce local New Jersey news. Therefore, we also had to acquire the lists for New York and Pennsylvania, then manually extract the outlets that include New Jersey in their local coverage area (at the same time we tried to discern whether they actually produced local New Jersey news, but confirming this needs to wait until the content analysis).
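If a list records coverage areas in a structured way, the extraction step could be sketched roughly as follows. The field names (`state`, `coverage_states`) and the outlets are hypothetical, since each database formats this information differently, and the `local_news_confirmed` flag simply marks that the content analysis has not yet verified the outlet.

```python
def outlets_covering(outlets, target_state):
    """Return outlets whose coverage area includes target_state, flagging out-of-state ones."""
    selected = []
    for outlet in outlets:
        if target_state in outlet["coverage_states"]:
            flagged = dict(outlet)  # copy so the source list is left untouched
            flagged["out_of_state"] = outlet["state"] != target_state
            flagged["local_news_confirmed"] = False  # pending content analysis
            selected.append(flagged)
    return selected


# Invented records, for illustration only.
sample = [
    {"name": "Example City Radio", "state": "NY", "coverage_states": ["NY", "NJ"]},
    {"name": "Upstate Sample Weekly", "state": "NY", "coverage_states": ["NY"]},
    {"name": "Hypothetical Herald", "state": "NJ", "coverage_states": ["NJ"]},
]

nj_outlets = outlets_covering(sample, "NJ")
print([o["name"] for o in nj_outlets])
```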

We are nearing completion of our master list of outlets; the final N looks to be somewhere around 700. Comparing that figure with the size of each list discussed above, one can see that there is not a huge amount of overlap between the lists, though our final N is also higher because of the inclusion of outlets from New York City and Philadelphia.

For each outlet, Jesse and I noted every list on which it appeared. We are currently writing a paper on the use of these large databases in journalism research, for which we will analyze the comprehensiveness of each list, including any systematic blind spots (e.g. by type of outlet). Our goal is to reduce the amount of work for future researchers, who may be able to use fewer lists depending on the specific features of their project.
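To give a sense of what that analysis might look like, here is a rough sketch that takes a record of which lists each outlet appears on and reports, for each list, how many outlets it contains, what share of the master list that represents, and how many outlets appear on no other list. The list names and data are placeholders, not our results, and the function is one possible approach rather than the method of the forthcoming paper.

```python
from collections import Counter


def list_coverage(appearances):
    """Summarize each list: outlet count, share of the master list, outlets found nowhere else."""
    total = len(appearances)
    counts = Counter()
    unique = Counter()
    for sources in appearances.values():
        counts.update(sources)
        if len(sources) == 1:
            unique[next(iter(sources))] += 1
    return {
        source: {
            "outlets": counts[source],
            "share": counts[source] / total,
            "unique": unique[source],
        }
        for source in counts
    }


# Placeholder data: outlet key -> set of lists it appears on.
appearances = {
    "outlet a": {"List X", "List Y"},
    "outlet b": {"List X"},
    "outlet c": {"List Y"},
    "outlet d": {"List X"},
}

summary = list_coverage(appearances)
for source in sorted(summary):
    print(source, summary[source])
```

A list with a high "unique" count is one that future researchers probably cannot drop; grouping the same counts by outlet type would be one way to look for the blind spots mentioned above.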

It’s fairly easy to envision some sort of meta-database that would dump the results of these different lists into one place, de-duplicating and updating them regularly. Doing this is currently beyond our pay grades, but it would be an invaluable resource to the journalism research community. We estimate that the acquisition and entry of the various databases we’ve used so far has taken at least 100 hours of work, probably more. By dissecting the process, we hope to establish a more streamlined method for similar efforts going forward.

Our next step in mapping the local news ecosystems of New Jersey will be to scrape the websites of the 700-some outlets that produce local NJ news and perform a content analysis to determine how much local and accountability reporting is actually produced (the “quality” piece). While this is happening, we’ll be working on a public-facing website that will contain a map of these local news ecosystems, a database of local news outlets, and other important features of the project.

Our hope is not only to better understand the local news landscape of New Jersey, but also to document the process by which we do so, in part so that we can replicate this effort in other states.

Stay tuned!