Harvard Opens Up Its Massive Caselaw Access Project

from the good-to-see dept

Almost exactly three years ago, we wrote about the launch of an ambitious project by Harvard Law School to scan all federal and state court cases and get them online (for free) in a machine-readable format (not just PDFs!), with open APIs for anyone to use. Earlier this week, case.law officially launched, with 6.4 million cases, some going back as far as 1658. There are still some limitations, some of them placed on the project by its funding partner, Ravel, which was acquired by LexisNexis last year (though the structure of the deal means some of these restrictions will likely decrease over time).

Also, the focus right now is really on providing this setup as a tool for others to build on, rather than as a straight-up interface for anyone to use. As it stands, you can access the data either via the site's API or through bulk downloads. Unfortunately, the bulk downloads are part of what's limited by the Ravel/LexisNexis restrictions. Bulk downloads are available for cases in Illinois and Arkansas, but that's only because both of those states already make cases available online. Still, even with the Ravel/LexisNexis limitation, individual users can download up to 500 cases per day.
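For anyone wondering what building on the API might look like, here's a minimal sketch of putting together a search request and working out how many pages of results fit under that 500-cases-per-day cap. To be clear, the base URL and the parameter names (`search`, `jurisdiction`, `page_size`) are assumptions for illustration, not something taken from the project's documentation, so check the docs on case.law before relying on any of it.

```python
from urllib.parse import urlencode

# Assumption: this base URL and the query parameter names are illustrative
# only; they are not confirmed by this post. Check the case.law API docs.
BASE_URL = "https://api.case.law/v1/cases/"

# The per-user daily download cap mentioned above.
DAILY_CASE_LIMIT = 500


def build_search_url(term, jurisdiction=None, page_size=100):
    """Build a search URL for a term, optionally limited to one jurisdiction."""
    params = {"search": term, "page_size": page_size}
    if jurisdiction:
        params["jurisdiction"] = jurisdiction
    return BASE_URL + "?" + urlencode(params)


def pages_within_quota(page_size, limit=DAILY_CASE_LIMIT):
    """Return how many full pages of `page_size` cases fit under the daily cap."""
    return limit // page_size


if __name__ == "__main__":
    print(build_search_url("witchcraft", jurisdiction="ill"))
    print(pages_within_quota(100))
```

Nothing here touches the network; it only builds the URL, so you can hand the result to whatever HTTP client you like.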

The real question is what will others build with the API. The site has launched with four sample applications that are all pretty cool.

H2O is a tool that law professors can use to easily create casebooks for students in various areas of law. Anything published on H2O gets a Creative Commons license and can then be shared widely. I wonder if professors like Eric Goldman, who offers an Internet Law Casebook, or James Grimmelmann, who has a different Internet Law Casebook, will eventually port them over to a platform like H2O.

A wordcloud app that currently shows the "most used words" in California cases in various years. Here, for example, are the word clouds in California cases from 1871... and 2012. See if you can tell which one's which.

Caselaw Limericks, which appears to randomly generate what it believes is a rhyming limerick from the case law. Here's what I got:

Her son Julius is a confirmed thief.

He did not turn over a new leaf.

The vessel, not.

the parking lot.

Respondent concedes this in its brief.

The quality overall is... a bit mixed. But it's fun.

And, finally, in time for Halloween, Witchcraft in Law, which totals up cases that cite "witchcraft" by state.
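The wordcloud and witchcraft apps both boil down to counting things across a pile of cases, which is the sort of thing anyone could build once they have the data. Here's a rough sketch of that idea. This is not how the sample apps are actually implemented; the record shape (a `state` and a `text` field) and the sample cases are made up for illustration.

```python
import re
from collections import Counter


def count_cases_by_state(cases, term):
    """Tally how many cases in each state mention `term` (case-insensitive)."""
    term = term.lower()
    tally = Counter()
    for case in cases:
        if term in case["text"].lower():
            tally[case["state"]] += 1
    return tally


def most_used_words(cases, n=3, min_length=4):
    """Return the `n` most frequent words of at least `min_length` letters."""
    words = Counter()
    for case in cases:
        for word in re.findall(r"[a-z]+", case["text"].lower()):
            if len(word) >= min_length:
                words[word] += 1
    return words.most_common(n)


# Hypothetical records, invented for this example (not real case text).
sample_cases = [
    {"state": "Massachusetts", "text": "The defendant was accused of witchcraft."},
    {"state": "Massachusetts", "text": "Witchcraft was alleged by the plaintiff."},
    {"state": "California", "text": "The parking lot dispute was settled."},
]

if __name__ == "__main__":
    print(count_cases_by_state(sample_cases, "witchcraft"))
    print(most_used_words(sample_cases, n=1))
```

The `min_length` filter is a crude stand-in for a proper stop-word list; a real word cloud would want something smarter to keep "the" and "was" from dominating.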

Hopefully this inspires a lot more on the development side as well.


Filed Under: caselaw, caselaw access project, legal data, public info, public records, transparency

Companies: harvard, lexisnexis, ravel