Girl Talk’s Feed the Animals is one of my favorite albums this year, a hyperactive mish-mash sampling hundreds of songs from the last 45 years of popular music. Gregg Gillis created a beautiful, illegal mess of copyright clearance hell, which you should download immediately. (It’s free, but I kicked in $20 for Gregg’s legal fund and a copy of the CD.)

Last month, Rex Sorgatz asked about collecting metadata on the album for data crunching. After spelunking through Billboard’s chart history, that sounded like my idea of a good time.

So I compiled all the data into spreadsheets, used Amazon’s Mechanical Turk to collect some additional information, and pulled out a few charts. As always, I’ve provided CSV downloads for all the data along with the original output from Mechanical Turk, for those interested in experimenting with the platform.

Update (October 30): Here’s the official sample list.



Results

Here’s the final spreadsheet with all the collected data. You can download the CSV or browse it in Google Spreadsheets. For details on how the data was collected using Wikipedia and Amazon’s Mechanical Turk, see the methodology in the next section.

There are 14 tracks on Feed the Animals, with a total of 264 sampled songs. “What It’s All About” and “Like This” tie for the most, with 26 sampled songs each, while “Don’t Stop” has the fewest at 11. Overall, the album averages about 18.9 songs sampled per track.
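If you want to reproduce the per-track counts from the CSV, here’s a minimal Python sketch. It assumes each row has already been reduced to a hypothetical `(track, artist, song)` tuple; the spreadsheet’s real column layout may differ.

```python
from collections import Counter


def sample_stats(rows):
    """Summarize samples per track.

    rows: iterable of (track_title, sampled_artist, sampled_song) tuples.
    The tuple layout is an assumption, not the spreadsheet's real schema.
    """
    counts = Counter(track for track, _artist, _song in rows)
    total = sum(counts.values())
    most, fewest = max(counts.values()), min(counts.values())
    return {
        "tracks": len(counts),
        "samples": total,
        # Ties are kept, e.g. two tracks sharing the maximum.
        "most": sorted(t for t, n in counts.items() if n == most),
        "fewest": sorted(t for t, n in counts.items() if n == fewest),
        "mean_per_track": round(total / len(counts), 1),
    }


if __name__ == "__main__":
    toy = [("A", "x", "1"), ("A", "y", "2"), ("B", "z", "3"),
           ("C", "w", "4"), ("C", "v", "5")]
    print(sample_stats(toy))
```

With the album’s totals, 264 samples over 14 tracks works out to 264 / 14 ≈ 18.86, or about 18.9 per track.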

The timeline below shows where each sample was triggered across the entire album, as a percentage of the song’s duration. (For example, a marker at the 50% mark on the 9th line means that a sample started halfway through track #9, “Hands In the Air.”) You can get a sense of the flow of the album, how Gregg spaces samples apart and occasionally switches moods entirely by introducing three samples in quick succession.
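Each marker’s horizontal position is just the sample’s start time divided by its track’s length. A minimal sketch, assuming start times and durations are available as `m:ss` strings (a hypothetical input format; the sample list may record them differently):

```python
def to_seconds(timestamp):
    """Convert an 'm:ss' string such as '3:07' to seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_position(start, duration):
    """Where a sample starts, as a percentage of the track's duration."""
    return 100.0 * to_seconds(start) / to_seconds(duration)


def track_markers(samples, durations):
    """Group marker positions by track, one timeline line per track.

    samples: iterable of (track_number, start_timestamp)
    durations: dict mapping track_number -> duration timestamp
    """
    markers = {}
    for track, start in samples:
        markers.setdefault(track, []).append(
            percent_position(start, durations[track]))
    return {track: sorted(ps) for track, ps in sorted(markers.items())}
```

A sample starting at 2:00 in a 4:00 track lands at 50.0, the halfway marker described above.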

Using the sample release dates collected from Mechanical Turk, the chart below shows the median sample age for each track. (The bars above and below each point mark the earliest and latest years for each track.) I was surprised to see a trend: the album uses relatively recent songs for the first three tracks, then takes us back to the late ’80s and early ’90s for the middle of the album, with the exception of “No Pause.” From track 9 to the end of the album, each track gets progressively more modern. For the whole album, 1995 was the median year.
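The per-track summary behind that chart can be sketched with Python’s `statistics` module. This assumes the Mechanical Turk results have been reduced to hypothetical `(track_number, release_year)` pairs:

```python
from collections import defaultdict
from statistics import median


def year_summary(rows):
    """Per-track (earliest, median, latest) release year.

    rows: iterable of (track_number, release_year) pairs.
    """
    by_track = defaultdict(list)
    for track, year in rows:
        by_track[track].append(year)
    return {
        track: (min(years), median(years), max(years))
        for track, years in sorted(by_track.items())
    }


def album_median(rows):
    """Median release year across every sample on the album."""
    return median(year for _track, year in rows)
```

One design note: with an even number of samples, `statistics.median` averages the two middle values, so a track’s median can come out as a half-year like 1992.5. `statistics.median_low` would always return an actual release year instead.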

The chart below shows the sample release years in more detail, and it tells another story. Here, we can see how heavily Gregg uses samples from the last three years, and how strongly he avoids samples from the period just before that, 2001 to 2004. (Too old to be cool, but not old enough to be retro?)
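That chart is a histogram of release years. Filling in zero-count years matters here: a dip like 2001 to 2004 only shows up if empty years are plotted too. A minimal sketch, assuming the years are already in a plain list of integers:

```python
from collections import Counter


def year_histogram(years):
    """Count samples per release year, including years with zero samples."""
    counts = Counter(years)  # missing years count as 0
    return [(year, counts[year]) for year in range(min(years), max(years) + 1)]


def share_in_range(years, first, last):
    """Fraction of samples released between first and last, inclusive."""
    return sum(first <= y <= last for y in years) / len(years)


def text_chart(years):
    """A quick text bar chart, one row per year."""
    return "\n".join(f"{year} {'#' * n}" for year, n in year_histogram(years))
```

Calling `share_in_range(years, 2001, 2004)` on the album’s data would put a number on how sparse that stretch is.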

I’m sure there’s more that can be explored here, so feel free to send on your own analysis.

Methodology

Getting the sample list was easy. I took a snapshot of the album’s Wikipedia entry and extracted all the samples using Excel’s Text to Columns feature.
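Excel’s Text to Columns splits each cell on a delimiter. Here’s the same step in Python, as a sketch that assumes each sample was copied as a hypothetical `Artist - "Song"` line (the actual Wikipedia formatting may differ):

```python
import csv
import io


def split_sample_line(line, delimiter=" - "):
    """Split one 'Artist - "Song"' line into (artist, song).

    Splits on the first delimiter only, so a song title containing
    ' - ' stays intact; an artist name containing it would not.
    """
    artist, sep, song = line.partition(delimiter)
    if not sep:
        raise ValueError(f"no {delimiter!r} in {line!r}")
    return artist.strip(), song.strip().strip('"')


def lines_to_csv(lines):
    """Turn pasted sample lines into CSV text with artist/song columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["artist", "song"])
    for line in lines:
        if line.strip():  # skip blank lines left over from the paste
            writer.writerow(split_sample_line(line))
    return out.getvalue()
```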

Now I had a spreadsheet of all 264 songs sampled across 14 tracks, with each sample’s original artist and song name. But to get each sample’s release year, I’d need to look elsewhere. The Last.fm and Yahoo! Music APIs both support album release dates, but during testing I found the dates unreliable. (Compilation albums and reissues produced incorrect dates, and some artist/song searches returned incorrect results.)

Instead, I decided to fill in the gaps with human labor using Amazon’s Mechanical Turk. I created a request using its new web-based tools, which generate HITs (“Human Intelligence Tasks”) from a simple spreadsheet.

I paid $0.02 for each request, with each song verified by two different workers. Each worker was asked to search for the song on Billboard.com, All Music Guide, Wikipedia, or Google, and fill in the original release year. Here’s an example of one of the requests.

Within an hour, all but 4 answers were submitted. The median time to finish a request was an impressive 26 seconds. (Amazingly, over 110 answers were completed in under 10 seconds without any errors.)

For 193 songs (about 73%), the two workers agreed on the year, so those answers were approved immediately. For the remaining 71 songs (27%), the workers gave different answers, so I checked them manually. (In hindsight, I should have required three workers per song so a majority could settle disagreements.)
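The agreement check is easy to script. A sketch, assuming answers have been pulled out of the results file as hypothetical `(song_id, worker_id, year)` tuples; `majority_vote` shows how a third worker per song could settle most disagreements without manual review.

```python
from collections import Counter, defaultdict


def reconcile(answers, required=2):
    """Split songs into auto-approved and contested.

    answers: iterable of (song_id, worker_id, year) tuples.
    A song is approved only if at least `required` workers answered
    and every answer matches; anything else goes to manual review.
    """
    by_song = defaultdict(list)
    for song, _worker, year in answers:
        by_song[song].append(year)
    approved, contested = {}, {}
    for song, years in by_song.items():
        if len(years) >= required and len(set(years)) == 1:
            approved[song] = years[0]
        else:
            contested[song] = years
    return approved, contested


def majority_vote(years):
    """The year given by a strict majority of workers, else None."""
    year, votes = Counter(years).most_common(1)[0]
    return year if votes * 2 > len(years) else None
```

Songs with only one answer so far land in the contested bucket rather than being approved on a single vote.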

Surprisingly, I couldn’t find a correlation between the amount of time spent on each task and the error rate. Workers who made mistakes took just as long as the accurate workers.

The spreadsheet below is the source data from Amazon’s Mechanical Turk. (View it on Google Docs or download it in Excel format.) The “raw” sheet is the default output from Amazon, while the rest of the sheets are my own edits, breaking out the final set of accepted answers, the responses that were immediately approved, and the ones that were contested.
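For anyone experimenting with the raw sheet, the standard `csv` module can turn it into per-answer records. This sketch assumes the file has `WorkerId` and `WorkTimeInSeconds` columns, which Mechanical Turk’s batch results files typically include, plus hypothetical `Input.artist`, `Input.song`, and `Answer.year` columns named after the request template’s fields:

```python
import csv
import io


def load_batch_results(fileobj,
                       song_fields=("Input.artist", "Input.song"),
                       answer_field="Answer.year"):
    """Read a batch results CSV into one dict per worker answer.

    Column names other than WorkerId and WorkTimeInSeconds are
    hypothetical; match them to the fields in your own HIT template.
    """
    rows = []
    for record in csv.DictReader(fileobj):
        raw_year = record[answer_field].strip()
        rows.append({
            "song": tuple(record[f] for f in song_fields),
            "worker": record["WorkerId"],
            "seconds": int(record["WorkTimeInSeconds"]),
            # Keep non-numeric answers as None so they surface for review.
            "year": int(raw_year) if raw_year.isdigit() else None,
        })
    return rows


if __name__ == "__main__":
    sample = io.StringIO(
        "WorkerId,WorkTimeInSeconds,Input.artist,Input.song,Answer.year\n"
        "W1,26,Artist,Song,1991\n"
        "W2,31,Artist,Song,1991\n"
    )
    for row in load_batch_results(sample):
        print(row)
```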

Overall, it cost me $13.20 for all 528 answers and took a little over two hours, an hourly rate of about $1.64. Simple to use, affordable, and I’ll almost certainly use it again — for something a little more interesting next time.
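The $13.20 total is consistent with 264 songs × 2 workers = 528 answers at $0.02 each ($10.56 to workers), plus Amazon’s commission. Assuming the fee at the time was 10% of the reward with a minimum of half a cent per assignment (an assumption about pricing then, not something stated above), the fee on a two-cent task is $0.005 and the numbers reconcile exactly:

```python
from decimal import Decimal


def batch_cost(songs, workers_per_song, reward,
               fee_rate=Decimal("0.10"), min_fee=Decimal("0.005")):
    """Return (assignments, paid to workers, fees, total) for a batch.

    fee_rate and min_fee are assumed values for Amazon's commission,
    not figures taken from the post. Decimal avoids float rounding.
    """
    assignments = songs * workers_per_song
    fee_per_assignment = max(reward * fee_rate, min_fee)
    to_workers = assignments * reward
    fees = assignments * fee_per_assignment
    return assignments, to_workers, fees, to_workers + fees


if __name__ == "__main__":
    print(batch_cost(264, 2, Decimal("0.02")))
```

Under the same assumed pricing, `batch_cost(264, 3, Decimal("0.02"))` puts the three-workers-per-song version at $19.80.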

If anyone out there wants to take a pass at getting the sample endings, sample genres, or any other additional metadata with Mechanical Turk or otherwise, send it along and I’ll add it to the spreadsheet. Thanks!

Update: If you’re in the San Francisco Bay Area, you might want to wrangle an invite to Yahoo!’s Open Hack Day in Sunnyvale tomorrow. Hint, hint.
