Know what I hate? When a governmental agency thinks you'll totally be fine with a shitty web interface to get what you're looking for. Case in point: viewing all of the General Laws of Massachusetts requires loading ~20,000 separate HTML pages.

So I did what any good hacker would do and scraped it to JSON. I used Python with Beautiful Soup for the HTML parsing. Two things I noticed:

Am I just in some weird async/JavaScript bubble, or is synchronous HTTP really how other languages do things?!? The scrape ended up taking hours because Python was sitting around twiddling its thumbs doing nothing while the pages downloaded. And Python's lack of dot notation for dictionaries means that Beautiful Soup ends up being an extremely obtuse version of jQuery. Dammit Guido, I want real anonymous functions and monads!
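Those hours of thumb-twiddling could have been cut way down with a thread pool, even without leaving synchronous Python. A minimal sketch of the idea (the `fetch` helper and worker count are mine, not from my actual scraper):

```python
# Parallelise the ~20,000 downloads instead of fetching one page at a time.
# `fetch` is a stand-in for whatever does the actual HTTP GET.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download one page; blocking, but many of these can run at once."""
    with urlopen(url) as resp:
        return url, resp.read()


def fetch_all(urls, fetch=fetch, workers=20):
    """Fetch every URL with up to `workers` concurrent requests,
    returning a {url: body} dict."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch, urls))
```

The threads spend almost all their time blocked on the network, so the GIL doesn't matter here; 20 workers means roughly a 20x wall-clock speedup over the naive loop.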
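For flavor, the per-page parsing boiled down to something like this. The tag names and `law-text` class here are hypothetical, to show the Beautiful Soup shape rather than the real malegislature.gov markup:

```python
# Pull the section heading and statute text out of one page's HTML.
# The h2 / div.law-text structure is illustrative, not the real site's.
from bs4 import BeautifulSoup


def parse_section(html):
    """Return a JSON-ready dict for one statute page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "section": soup.find("h2").get_text(strip=True),
        "text": soup.find("div", class_="law-text").get_text(strip=True),
    }
```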

After I scraped it all, I posted the raw JSON to GitHub and then loaded it into a CouchDB on Cloudant; you can view its Futon and replicate it if you want.
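Replicating is just one POST to your own CouchDB's `/_replicate` endpoint. A sketch, with a placeholder Cloudant URL rather than the real database address:

```python
# Pull a copy of the database down to a local CouchDB via replication.
import json
from urllib.request import Request, urlopen


def replication_doc(source, target):
    """Body understood by CouchDB's /_replicate endpoint."""
    return {"source": source, "target": target, "create_target": True}


def replicate(source, target, couch="http://localhost:5984"):
    """Kick off a pull replication on a local CouchDB (placeholder URLs)."""
    body = json.dumps(replication_doc(source, target)).encode()
    req = Request(couch + "/_replicate", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())
```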

I've now started building a demo app using Cloudant's awesome full-text search. It's a work in progress, and pull requests would definitely be appreciated.
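Cloudant's search is Lucene-backed and queried over HTTP against a design document's search index. Here's roughly how the demo app builds its query URLs; the account, database, design doc, and index names are placeholders:

```python
# Build a URL for Cloudant's Lucene search endpoint:
#   GET /<db>/_design/<ddoc>/_search/<index>?q=<lucene query>
# All the names below are placeholders, not the real demo app's.
from urllib.parse import quote


def search_url(account, db, ddoc, index, query):
    """URL that runs a Lucene query against a Cloudant search index."""
    return ("https://%s.cloudant.com/%s/_design/%s/_search/%s?q=%s"
            % (account, db, ddoc, index, quote(query)))
```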