Sunlight has been interested in the federal rule-making process for quite a while: we sponsored the app contest that led to the current incarnation of federalregister.gov, which lists federal regulations as they are published, and kick-started an effort to map regulations to the laws that authorize them during a hackathon late last year. We also have extensive experience in the analysis of corporate influence on the political process, having launched several prominent influence-related projects under the Influence Explorer banner. During the last year, we’ve begun to examine the confluence of these two interest areas: corporate influence on the regulatory process, and, in particular, the comments individuals and corporations can file with federal agencies about proposed federal regulations. The first glimpses of the results of this effort went live on Influence Explorer last fall, with the addition of regulatory comment summaries to corporations’ profile pages.

Given this history, we’ve been excited to explore this week’s relaunch of regulations.gov, the federal government’s primary repository of regulatory comments, and the source of the data that powers our aforementioned Influence Explorer regulatory content. This new release brings with it a much-needed visual spruce-up, as well as improved navigation and documentation to help new users find and follow regulatory content, and a suite of social media offerings that have the potential to expose rule-making to new audiences. There have also been some improvements to document metadata, such as the addition of category information visitors can use to filter searches by industry, or browse rule-makings topically from the homepage.

Of more interest to us as web developers is the addition, for the first time, of official APIs to allow programmatic access to regulatory data. It’s clear that the regulations.gov team has taken note of current best practices with respect to open data APIs, and has produced clean, RESTful endpoints that allow straightforward access to what is, especially for a first release, a reasonably comprehensive subset of the data made available through the general end-user web interface. While we have been successful in performing significant regulatory analysis without these tools, our work required substantial effort in screen-scraping and reverse engineering, and we expect that other organizations hoping to engage in regulatory comment analysis will now be able to do so without the level of technical investment we’ve had to make.

Of course, there is still work to be done. Much of the work we’ve done so far on regulations, and that we hope still to do, revolves around analysis of the actual text of the comments posted to regulations.gov (which can take the form of PDFs and other not-easily-machine-readable documents), and depends on being able to aggregate results over the entirety of the data, or at least significant subsets of it. As a result, even with these new APIs, we’ll still need to make large numbers of requests to identify new documents, enumerate all of the downloadable attachments for each one, download these attachments one at a time, and maintain all of the machinery necessary to do our own extraction of text from them. While we’re fortunate to have the resources to do this ourselves, and have made headway in making the fruits of our labors available to the public, it would certainly behoove the regulations.gov team to move forward with bulk data offerings of their own. Sunlight has a long history of advocating the release of bulk data in addition to (and perhaps even before) APIs, and the regulatory field illustrates many of our typical arguments for that position: the kinds of questions that can be answered with all of the data are fundamentally different from those that can be answered with any individual piece. We recognize that offering all of the PDFs, Word documents, etc., to the public might be cost-prohibitive from a bandwidth point of view, but regulations.gov is doing text extraction of its own (it powers the full-text search capabilities that the site provides), and offering bulk access to the extracted text, as we have done, could provide a happy medium that would facilitate many applications and analyses without breaking the bandwidth bank.
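To make the cost of that workflow concrete, here is a minimal Python sketch of the loop described above. It is an illustration only and does not use the actual regulations.gov API: the four callables (`list_new_documents`, `list_attachments`, `download`, `extract_text`) are hypothetical stand-ins that a real harvester would have to implement against whatever endpoints and file formats it encounters. The sketch simply counts requests to show how they add up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Harvest:
    """Extracted text keyed by document ID, plus a tally of requests made."""
    texts: Dict[str, List[str]] = field(default_factory=dict)
    requests: int = 0


def harvest_comments(
    list_new_documents: Callable[[], Iterable[str]],
    list_attachments: Callable[[str], Iterable[str]],
    download: Callable[[str], bytes],
    extract_text: Callable[[bytes], str],
) -> Harvest:
    """Walk documents, then attachments, extracting text from each file.

    All four callables are hypothetical placeholders, not real API calls.
    """
    result = Harvest()
    result.requests += 1  # at least one request to discover new documents
    for doc_id in list_new_documents():
        result.requests += 1  # one request to enumerate this document's attachments
        extracted: List[str] = []
        for attachment in list_attachments(doc_id):
            result.requests += 1  # one request per attachment download
            extracted.append(extract_text(download(attachment)))
        result.texts[doc_id] = extracted
    return result
```

Under these assumptions, the request count grows with the number of documents plus the number of attachments, and the text extraction step has to be maintained by every consumer separately. A bulk download of already-extracted text would replace the entire loop with a single transfer.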

In general, we see plenty of reasons to applaud this release and the team at EPA that’s behind it. While many of its changes are cosmetic and additional improvements will be necessary for regulations.gov to reach its full potential, this update promises further progress that will benefit developers and members of the public alike. We share the enthusiasm of the regulations.gov team for increasing access to and awareness of these crucial artifacts of the democratic process, and look forward to engaging with them and the broader open government community as they continue to improve this public resource.