In the early 1990s, rap took over the radio: Songs by Snoop Dogg and Jay Z played everywhere. Was this a musical revolution or merely the result of a gradual change in tastes over time? Researchers say they’re now able to answer such questions, thanks to the largest data-driven study of pop music ever undertaken. Applying evolutionary theory to this data set, they say, could settle several debates that have raged over pop music for decades.

Among art forms, music seems particularly well suited to data-driven analysis. After all, the features that distinguish one piece of music from another—rhythm, harmony, melody—are intrinsically mathematical. Researchers have long wanted to study the evolution of music with the same rigorous tools that biologists use to study the evolution of species. But tangled copyright protections make it difficult to access musical data sets on a large scale, because even data mining of musical recordings may not be allowed without permission. And determining a representative sample of musical culture for a particular place and time is challenging. For example, even though we have millions of musical scores from the Baroque era, we have no idea how often any of them were actually performed.

To solve the sampling problem, a team led by Matthias Mauch, a computer scientist at Queen Mary University of London, turned to the U.S. Billboard Hot 100, the American music industry's weekly list of popular singles. The researchers scraped data from the Billboard website, collecting the titles and artists of some 17,000 songs that made the list between 1960 and 2010.

Getting the actual song recordings was the trickier problem. Luckily, Mauch used to work at the British online music recommendation service Last.fm, and he knew the company had a vast database of 30-second music samples that it used to preview its wares. That collection turned out to be large enough to power a massive comparative analysis.

Rather than relying on human judgment to compare songs, the team used a statistical technique that extracts features of the recordings, such as timbre and harmony, and then groups the songs into clusters. To make sure the clusters were meaningful, the researchers compared them with groups of songs created by Last.fm’s millions of users. For example, users put songs by Snoop Dogg, Ludacris, and Jay Z together in the “rap and hip hop” category. Using only the timbre and harmony features, the computer grouped those songs in nearly the same way. Encouraged by the agreement, the team then ran an evolutionary analysis on this massive data set, treating the statistical traits shared among songs like biological traits.
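To give a sense of what grouping songs into clusters means, here is a toy sketch in Python. It is not the team's pipeline: the song names and the two feature scores per song are invented for illustration, and k-means is a generic clustering method chosen here as an assumption, not necessarily the technique the researchers used.

```python
import math

# Hypothetical (timbre, harmony) scores for four made-up songs; real
# systems extract many more features from each audio clip.
songs = [
    ("song_a", (0.90, 0.10)),
    ("song_b", (0.85, 0.15)),
    ("song_c", (0.10, 0.90)),
    ("song_d", (0.15, 0.80)),
]


def kmeans(points, k, iterations=20):
    """Plain k-means: returns a cluster label for each point."""
    # Naive initialization: use the first k points as starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


labels = kmeans([features for _, features in songs], k=2)
for (name, _), label in zip(songs, labels):
    print(name, "-> cluster", label)
```

With these made-up numbers, song_a and song_b end up in one cluster and song_c and song_d in the other. Real audio features are far noisier, which is why checking the computer's clusters against categories made by human listeners matters.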

Far from becoming formulaic and homogeneous, as some critics have argued, pop music is as diverse now as it has ever been, the researchers found. And it didn’t evolve gradually. Instead, the analysis revealed several dramatic revolutions. The first was in 1964 during the rise of rock and soul music, when bands such as the Beatles drew huge crowds. The next started in 1983 with disco, new wave, and hard rock. And the most recent, and by far the most transformative, started in 1991 with the explosion in rap and hip hop. As Mauch and his team conclude today in Royal Society Open Science, rap is "the single most important event that has shaped the musical structure of the American charts over the past 50 years." Its powerful influence on the structure of pop music continues today; the occasional rap interlude now finds its way into many a rock song, for example.

"This is rigorous," says Jean-Baptiste Michel, a data scientist at Harvard University and at Palantir Technologies in Palo Alto, California. "More researchers need to take this approach." Michel, who was lead author of a 2010 Science paper that kicked off the study of culture through massive data sets, says one of the findings that stands out is that pop music shows a pattern from biological evolution known as punctuated equilibrium, in which periods of gradual change are separated by explosions of complexity. The most famous example in geological history is the Cambrian explosion, a sudden, massive increase in biodiversity in the fossil record 542 million years ago. "There are differences, of course," he says, "since biological evolution has the direct parent-offspring relationship, and we don't know the mechanisms even in biology. So we have to be careful."

(Credit for linked PDF: M. Mauch et al., Royal Society Open Science [2015])