[This article was first published on, and kindly contributed to R-bloggers.]

Radiohead is known for having some fairly maudlin songs, but of all their tracks, which is the most depressing? Data scientist and R enthusiast Charlie Thompson ranked all of their tracks according to a "gloom index" and created the following chart of gloominess for each of the band's nine studio albums. (Click for the interactive version, created with the highcharter package for R, which lets you explore individual tracks.)

If you're familiar with the albums, this looks pretty reasonable. Radiohead's debut, "Pablo Honey", was fairly poppy musically but contained some pretty dark lyrics (especially in the breakout hit, Creep). Their most recent album, "A Moon Shaped Pool", is a fantastic listen, but it isn't exactly going to get the party started.

The "Gloom Index" charted above combines three quantities and is then scaled from 1 (Radiohead's gloomiest song, True Love Waits) to 100 (the cheeriest, 15 Step).

The first quantity is Spotify's "valence score", which Spotify describes as a "quantity describing the musical positiveness of a track". Valence scores range from 0 (songs that sound depressed or angry) to 1 (songs that sound happy or euphoric). Charlie used the httr package for R to pull Radiohead's 101 tracks, and the valence score for each, from the Spotify developer API. Valence is useful in its own right, but several songs share the same valence score, so Charlie also looks at the lyrics to tell them apart.
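Charlie's R code, which uses httr, is in the post linked below. The following is only an illustration of the extraction step: a minimal Python sketch that assumes a response body shaped like the JSON from Spotify's audio-features endpoint. In that shape, a list under `audio_features` holds entries with `id`, `valence` and `duration_ms` fields. The sketch leaves out the HTTP request and token handling, and its track IDs and numbers are placeholders, not real Radiohead data.

```python
import json

# Hypothetical response body shaped like Spotify's audio-features JSON.
# The IDs and numbers are placeholders, not real measurements.
sample_response = """
{"audio_features": [
  {"id": "track-a", "valence": 0.12, "duration_ms": 240000},
  {"id": "track-b", "valence": 0.12, "duration_ms": 180000},
  {"id": "track-c", "valence": 0.55, "duration_ms": 200000}
]}
"""

def extract_features(body):
    """Map each track ID to its valence and its length in seconds."""
    features = {}
    for item in json.loads(body)["audio_features"]:
        features[item["id"]] = {
            "valence": item["valence"],
            "duration_s": item["duration_ms"] / 1000,  # milliseconds -> seconds
        }
    return features

tracks = extract_features(sample_response)
print(tracks["track-a"]["valence"] == tracks["track-b"]["valence"])  # True: a tie
```

The two placeholder tracks have identical valence. That shows why a second signal, such as the lyrics, is needed to separate songs.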

The second quantity is the percentage of words in the lyrics that are "sad". Charlie scraped the song lyrics from Genius using the rvest package, then used the tidytext package for three steps: break the lyrics into words, eliminate common "stop words" like 'the' and 'a', and count the remaining words that a sentiment lexicon flags as sad.
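Here is a minimal Python sketch of the sad-word share. It mirrors the steps described above (tokenize, drop stop words, count sad words) but is not Charlie's tidytext code. It makes three assumptions:

- The stop-word and sad-word sets are tiny hypothetical stand-ins for real lexicons.
- The result is a fraction between 0 and 1 rather than a percentage.
- The denominator is the number of words left after stop-word removal.

```python
import re

# Hypothetical, tiny stand-ins for a real stop-word list and the "sad"
# words of a sentiment lexicon; real lists are far larger.
STOP_WORDS = {"the", "a", "and", "i", "you", "to", "of", "is"}
SAD_WORDS = {"alone", "cry", "lost", "tears", "lonely"}

def tokenize(lyrics):
    """Lower-case the lyrics and split them into word tokens."""
    return re.findall(r"[a-z']+", lyrics.lower())

def pct_sad(lyrics):
    """Fraction of non-stop-words that appear in the sad-word set."""
    words = [w for w in tokenize(lyrics) if w not in STOP_WORDS]
    if not words:
        return 0.0  # e.g. an instrumental: no words, so nothing sad
    sad = sum(1 for w in words if w in SAD_WORDS)
    return sad / len(words)

print(pct_sad("I cry alone and the tears are lost"))  # 0.8
```

In the example, five words survive stop-word removal and four of them are in the sad set, giving 0.8.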

The third quantity is the "lyrical density" (following a method described by Myles Harrison): the number of words per second. It is easily computed from the track lengths in the Spotify data and the word counts from the previous step.
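Lyrical density is simply a word count divided by track length. Here is a minimal sketch. It assumes the length arrives in milliseconds, as Spotify's `duration_ms` field does, and that the word count is the one from the previous step.

```python
def lyrical_density(word_count, duration_ms):
    """Words per second for a track of `word_count` words lasting `duration_ms` ms."""
    seconds = duration_ms / 1000
    return word_count / seconds

# 300 words over a four-minute (240,000 ms) track
print(lyrical_density(300, 240_000))  # 1.25
```

An instrumental track has a word count of zero and therefore a density of zero.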

The three quantities are combined to create the "gloom index" as follows:

$$ \mathrm{gloomIndex} = \frac{1-\mathrm{valence}}{2} + \frac{\mathrm{pctSad}\cdot(1+\mathrm{lyricalDensity})}{2} $$

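Roughly, this is the average of two terms. The first is one minus the valence score, so sadder-sounding songs score higher. The second is (almost) the number of sad words per second: the share of sad words times the words per second gives sad words per second, and the extra 1 adds the sad-word share itself on top. (I'm guessing Charlie adds 1 to the lyrical density to bring the two terms to about the same scale, so that both carry roughly equal weight.)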
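Note the direction. In the formula, larger values mean gloomier songs, yet the published scale gives the gloomiest song 1 and the cheeriest 100. The rescaling therefore has to flip the scores. Here is a minimal Python sketch of both steps. The min-max rescaling is an assumption about how the 1-to-100 scale was produced (Charlie's actual R code is in the linked post), and the three input songs are made up.

```python
def raw_gloom(valence, pct_sad, lyrical_density):
    """The gloom formula: larger values mean gloomier songs."""
    return (1 - valence) / 2 + pct_sad * (1 + lyrical_density) / 2

def rescale_gloom(raw_scores, low=1, high=100):
    """Min-max rescale onto [low, high], flipped so the gloomiest song gets `low`.

    Assumes at least two distinct raw scores (otherwise the range is zero).
    """
    lo, hi = min(raw_scores), max(raw_scores)
    return [low + (hi - s) / (hi - lo) * (high - low) for s in raw_scores]

# Hypothetical songs as (valence, pct_sad, lyrical_density); not real data.
songs = [(0.05, 0.30, 1.0), (0.40, 0.10, 0.5), (0.90, 0.00, 2.0)]
raw = [raw_gloom(*s) for s in songs]
scaled = rescale_gloom(raw)
print(scaled)
```

The first, low-valence and lyrically sad song lands at 1. The third, high-valence song with no sad words lands at 100. Because the scale is anchored to the gloomiest and cheeriest songs in the set, scores are only comparable within one artist's catalogue.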

It would be interesting to compare the "Gloom Index" for Radiohead with that of other famously downbeat artists (say, Bon Iver or Low). You'd need to do away with the 1-to-100 scaling Charlie uses here, since it is relative to Radiohead's own catalogue, but the raw formula could easily be adapted into a universal score. If you'd like to give it a try, all of the R code is included in Charlie's blog post, linked below.

RCharlie: fitteR happieR