This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It's a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.

How you can help. Searching for some interesting data to look at? We'd love your help visualizing these data sets.

Tools of the trade. What software do you turn to when you want to make sense of the data?

Tools of the trade. Tell us about the things you found to help you solve your problems.

Tips and tricks. You know how to make sense of this stuff -- share your techniques in our wiki.

Tips and tricks. Share tricks for extracting data from those who don't want to give it up.

Mailing list. Join it today! There you can swap tips, questions, and success stories with others who are trying to get big data sets.

The bigger picture:

Some of us have spent years scraping news sites. Others have spent them downloading government data. Others have spent them grabbing catalog records for books. And each time, in each community, we reinvent the same things over and over again: scripts for doing crawls and notifying us when things are wrong, parsers for converting the data to RDF and XML, visualizers for plotting it on graphs and charts.

It's time to start sharing our knowledge and our tools. But more than that, it's time for us to start building a bigger picture together. To write robust crawl harnesses that deal gracefully with errors and notify us when a regexp breaks. To start converting things into common formats and making links between data sets. To build visualizers that will plot numbers on graphs or points on maps, no matter what the source of the input.

We've all been helping to build a Web of data for years now. It's time we acknowledge that and start doing it together.