For the last ten years, obsessive record collectors in Usenet have been working on the Whitburn Project — a huge undertaking to preserve and share high-quality recordings of every popular song since the 1890s. To assist their efforts, they’ve created a spreadsheet of 37,000 songs and 112 columns of raw data, including each song’s duration, beats-per-minute, songwriters, label, and week-by-week chart position. It’s 25 megs of OCD, and it’s awesome.

As far as I know, this is the first time the project and its data have ever been discussed outside of Usenet. Despite its illegality, they’ve created a wonderful resource and you can do some fun things with the data. For the next three days, I’m going to publish some analysis and insights gleaned from their work. Update: I published an entry about one-hit wonders and pop longevity.



History of the Whitburn Project

Named after Joel Whitburn and his authoritative Billboard books, the Whitburn Project began in 1998, when a group of 15 collectors pooled their resources to create an MP3 collection of every single in the top 40. They experimented with trading the files on P2P networks, but eventually landed in Usenet instead.

The Excel spreadsheets were created to help them verify their collections were complete, with new versions updated and re-uploaded to the newsgroups weekly. Later, other collectors found the spreadsheet and built tools on top of it, including a utility to rename files properly and locate missing songs.

Originally, most of the Whitburn Project was simple data entry and fact-checking, but as the project grew, it forked away from the Whitburn books. “This spreadsheet does not reflect the Whitburn information found in his books,” wrote Bullfrog, one of the spreadsheet’s maintainers. “Whitburn has changed the way he numbers the annual songs at least twice since this [spreadsheet] was created. We feel that he went off the deep end a little, so will not be following his new numbering scheme.”

They’ve also added new fields culled from their own research. “Obviously with the addition of BPM, genre, and the like,” wrote Bullfrog, “it has become its own entity and will continue to be from now on.”

Over the last few months, I’ve tried multiple times to contact the maintainers of the spreadsheet and the excellent Whitburn newsgroup FAQ, but they haven’t responded.

The Data

Several Whitburn spreadsheets are uploaded sporadically to multiple Usenet newsgroups, but the most useful is the “Billboard Pop ME (1890-2008),” which is posted in alt.binaries.sounds.whitburn.pop.

Note: This data is almost certainly a violation of Billboard’s copyright, and probably infringes on Record Research’s books too. The analysis I’m publishing here should fall under fair use, but redistributing the spreadsheet would not. If you’re brave (or dumb) enough to locate and mirror a copy of the file, leave a comment. Update: An anonymous commenter posted the spreadsheet to Rapidshare/Megaupload.

Above is a sample of the top 10 songs from 2007, so you can see the format and fields of the collected data, along with the key explaining each column. (Scroll to the right to see all the fields.)

Song Lengths Over Time

I’ll be focusing more on analysis tomorrow, but here’s one of the first questions I asked when I stumbled on this spreadsheet: Are pop songs longer or shorter now than in previous decades? A quick query reveals this chart of average playtimes per year.

Pop songs got shorter in the early 1960s, bottoming out around the 2:30 mark, before rising yearly until peaking in 1992 at 4:16. Since then, pop songs have hovered around four minutes long.
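A per-year average like this could be sketched in Python as follows. It's only an illustration: the (year, playtime) row layout and the function names are assumptions rather than the spreadsheet's actual columns, and the sample rows are made up.

```python
from collections import defaultdict


def to_seconds(playtime):
    """Convert an "m:ss" string such as "4:16" to total seconds."""
    minutes, seconds = playtime.split(":")
    return int(minutes) * 60 + int(seconds)


def to_playtime(total_seconds):
    """Format a number of seconds as "m:ss", rounded to the nearest second."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"


def average_playtime_by_year(rows):
    """Map each year to its mean playtime, given (year, "m:ss") pairs."""
    by_year = defaultdict(list)
    for year, playtime in rows:
        by_year[year].append(to_seconds(playtime))
    return {
        year: to_playtime(sum(lengths) / len(lengths))
        for year, lengths in sorted(by_year.items())
    }


# Made-up rows for illustration only, not values from the spreadsheet.
sample_rows = [(1962, "2:20"), (1962, "2:40"), (1992, "4:10"), (1992, "4:22")]
averages = average_playtime_by_year(sample_rows)
```

Averaging in seconds and converting back to "m:ss" only at the end avoids the mistake of averaging minutes and seconds separately.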

The longest charting song of all time is Harry Chapin’s live version of “A Better Place to Be,” at an epic 9 minutes and 30 seconds. Runners-up include Guns N’ Roses’ “November Rain” (8:56), Don McLean’s “American Pie” (8:36), and a new entrant, Death Cab for Cutie’s “I Will Possess Your Heart” (8:35).

And the shortest? The Womenfolk’s 1964 cover of Malvina Reynolds’ “Little Boxes” is only 1 minute and 3 seconds long. The shortest modern song to chart is Zac Efron’s “What I’ve Been Looking For,” the third-shortest charting song of all time at a brief 1:19.

How about the length of the perfect pop song? For this, we can look at the mode to find the most common song lengths by decade. For example, in the 1940s, there were 42 songs that were exactly 3:01, making it the perfect song length for that decade.

1950s, 2:30 (95 songs)

1960s, 2:30 (250 songs)

1970s, 3:30 (153 songs)

1980s, 3:59 (142 songs)

1990s, 4:00 (132 songs)

2000s, 3:50 (58 songs)
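Finding the mode per decade, as in the list above, could look something like this in Python. Again, it's a rough sketch under assumptions: the (year, playtime) rows and the function name are hypothetical, and the sample data is invented.

```python
from collections import Counter, defaultdict


def most_common_length_by_decade(rows):
    """Return {decade: (playtime, count)} for each decade's most frequent playtime."""
    counts = defaultdict(Counter)
    for year, playtime in rows:
        decade = year // 10 * 10  # e.g. 1964 -> 1960
        counts[decade][playtime] += 1
    return {
        decade: counter.most_common(1)[0]
        for decade, counter in sorted(counts.items())
    }


# Invented rows for illustration only, not values from the spreadsheet.
sample_rows = [
    (1961, "2:30"), (1964, "2:30"), (1968, "3:05"),
    (1985, "3:59"), (1987, "3:59"), (1989, "4:02"),
]
modes = most_common_length_by_decade(sample_rows)
```

If two lengths tie within a decade, this version simply keeps whichever one appeared first in the rows, so a real analysis might want to report ties explicitly.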

I was surprised at how exact these numbers are. The capacity for 45 RPM records was about three minutes, setting the standard for pop singles well into the 1960s. By the late 1960s, those constraints were removed, and we start to see longer singles. But without artificial constraints, why did exactly four minutes become the de facto standard in the 1980s and 1990s? (Maybe Madonna knows.)

I’m tired. More analysis tomorrow, including a look at one-hit wonders and how quickly singles fall off the charts over time. Update: Here it is.