by Valentin, January 18 2016, in random

When I switched to Linux last summer, the only Linux-incompatible software that I couldn't do without was foobar2000. I run it using Wine. There are several reasons I consider this audio player and library manager superior to many others. To begin with, its GUI is highly customizable via the so-called "Layout Editing Mode". There are a fair number of useful widgets that you can choose from and place on the screen, using a system of vertical and horizontal splits that you can nest recursively.

Another good feature is the "Global Hotkeys", which make it possible to map keyboard shortcuts to various commands (pause, next song, change volume, switch the playlist order from random to default, etc.), and these shortcuts work even when the player doesn't have focus. It's particularly useful when playing video games while listening to music.

Finally, plugins can be installed, some of which are powerful. My favorite is "Playback Statistics", which keeps track of when a song was added to the library, the first and last time it was played, and its play count.

Using this plugin, I like to compute my own little charts and see statistics about the music I listen to. However, exporting the data to a programmatically exploitable format wasn't as easy as I thought.

The ID Mystery

Playback Statistics (PS) has a feature that allows one to "export data" (and, conversely, "import data"). The intent of this feature is to make it possible to share statistics across different foobar2000 installations. The exported file is an XML file containing one node per song in the library, with an attribute for each piece of data. Here's an example of a node corresponding to one song:

<Entry ID= "eef3112353c1512d" Count= "5" FirstPlayed= "130818970132330604" LastPlayed= "130913311506349590" Added= "130739450382613924" />

The last three attributes are timestamps in the Microsoft FILETIME format: the number of 100-nanosecond intervals since January 1, 1601 (UTC). Don't ask me.
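To make these numbers readable, here is a minimal Python sketch that parses the node above and converts its FILETIME values to dates. The function name filetime_to_datetime is my own, and the sketch assumes the plugin stores the values in UTC, as the FILETIME definition suggests.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME value to a timezone-aware datetime (assumed UTC)."""
    # Ten 100-ns ticks make one microsecond; the sub-microsecond
    # remainder is dropped by the integer division.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


# The example node from the export, copied as-is.
entry = ET.fromstring(
    '<Entry ID= "eef3112353c1512d" Count= "5" '
    'FirstPlayed= "130818970132330604" LastPlayed= "130913311506349590" '
    'Added= "130739450382613924" />'
)

for name in ("Added", "FirstPlayed", "LastPlayed"):
    print(name, filetime_to_datetime(int(entry.get(name))).isoformat())
```

A quick sanity check: the FILETIME value 116444736000000000 corresponds to the Unix epoch, January 1, 1970.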

The problem is that the song is identified by an ID and nobody knows how it's computed from the song. The PS documentation implies that the ID is computed from the following song metadata: artist, title, album, disc number, track number. However, from GitHub to Hydrogen Audio, nobody has succeeded in guessing how the ID is computed from a song.

I tried to break the mysterious hash myself, without success. I did two successive exports, listening to one specific song in between. In the two XML files, I spotted the only element whose Count attribute had increased and deduced that it was the song I'd listened to. Note that the order of the songs isn't always the same, so I couldn't perform a simple diff and had to actually parse the XML to spot the difference. Having linked one song to its ID, I then tried to brute-force the ID. I tried all the combinations of the metadata fields that the documentation says are used, with various separators between the fields, then encoded with various character encodings, then hashed with the most famous hash functions. At the heart of these four nested for loops, I finally tested whether the mysterious ID was a substring of the resulting hash. This yielded zero matches. I figured that the author could have used a more complicated hashing process. He could even have used a custom salt somewhere in the process. This was hopeless.
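For the curious, the two steps described above (spotting the entry whose count changed, then brute-forcing its ID) might be sketched as follows. This is a reconstruction, not the original script: the function names are hypothetical, and the separators, encodings, and hash functions listed are only illustrative guesses at what "various" could cover.

```python
import hashlib
import itertools
import xml.etree.ElementTree as ET


def play_counts(xml_text):
    """Map each entry ID to its Count, regardless of element order."""
    root = ET.fromstring(xml_text)
    return {e.get("ID"): int(e.get("Count")) for e in root.iter("Entry")}


def changed_ids(before_xml, after_xml):
    """Return the IDs whose Count increased between two exports."""
    before, after = play_counts(before_xml), play_counts(after_xml)
    return [i for i, n in after.items() if n > before.get(i, 0)]


def brute_force(target_id, fields,
                separators=("", " ", "-", "|", "\0"),
                encodings=("utf-8", "utf-16-le", "latin-1"),
                hash_names=("md5", "sha1", "sha256")):
    """Yield each (fields, separator, encoding, hash) whose digest contains target_id."""
    # Every ordered selection of one or more metadata fields.
    for size in range(1, len(fields) + 1):
        for combo in itertools.permutations(fields, size):
            for sep in separators:
                for enc in encodings:
                    try:
                        data = sep.join(combo).encode(enc)
                    except UnicodeEncodeError:
                        continue  # this text cannot be represented in enc
                    for name in hash_names:
                        digest = hashlib.new(name, data).hexdigest()
                        if target_id in digest:
                            yield combo, sep, enc, name
```

Calling list(brute_force(song_id, [artist, title, album, disc, track])) returns an empty list when nothing matches, which is exactly what happened to me.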

The next logical step was probably either to contact the plugin's author or to try to decompile the plugin's DLL. But this is when I stumbled upon this Hydrogen Audio thread, where a method to link each song of the library to its ID was described. This method relies on another plugin, "Text Tools". This plugin enables the user to export text information about the songs in the library, accessing the songs' metadata. Foobar2000 exposes metadata in the form of variables surrounded by the % symbol. For example, %title% gives the song's title. Sure enough, the PS plugin exposes its own variables (play count, first played, etc.) using this format. For example, %play_count% gives the song's play count. Wonderful! Using only Text Tools, I was then able to export all relevant information about the library!
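Once exported, the text is easy to load. The sketch below assumes a layout I am choosing for illustration, not one imposed by Text Tools: one song per line, with the artist, %title%, %play_count% and length in seconds separated by tabs. The FIELDS tuple and the parse_export function are hypothetical names.

```python
# Assumed column order of the exported text (an illustration only).
FIELDS = ("artist", "title", "play_count", "length_seconds")


def parse_export(text):
    """Parse tab-separated export lines into a list of song dictionaries."""
    songs = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            continue  # skip blank or malformed lines
        song = dict(zip(FIELDS, parts))
        try:
            song["play_count"] = int(song["play_count"])
            song["length_seconds"] = int(song["length_seconds"])
        except ValueError:
            continue  # skip songs whose numeric fields are missing
        songs.append(song)
    return songs
```

Songs whose numeric fields are not valid integers are simply dropped, which conveniently leaves only the tracked songs.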

The Music Diary

In Figure 1 (you can right-click on it and click "Show Image" to see the full-size version), each point represents a tracked song from my library. A tracked song is a song that I have listened to at least once since I installed the Playback Statistics plugin. The plugin increments the play count of a song when the player plays at least one continuous minute of it. This leads to misleading numbers. For example, one of the tracks I've listened to the most is the fourth movement of Beethoven's Fifth Symphony, with a Play Count (PC) of 102. However, this track is about 10 minutes long and I know that I've mostly listened only to the last 3 minutes of it. (What an epic finale!)

Figure 1

The X-axis represents the first time I listened to a song. The Y-axis represents the Play Count of the song as of the writing of this article. It's no surprise that the left of the graph, starting a little before June 2013, is flooded with songs. Those are all the songs I was already casually listening to, which suddenly started being tracked by the plugin. However, you can observe the same phenomenon one year later in June 2014, which contains a high concentration of new songs too. That's because I like to explore new music during the summer holidays. The effect loses strength in the summer of 2015, which I assume is because I was doing an internship and didn't have as much time as I had during previous summers.

An interesting phenomenon takes place between September 2013 and February 2014. As you can see, I didn't discover many songs in this period. However, the few I did discover, I listened to a lot! Well, this period was when I moved out of my parents' home to study in a new city, and I didn't have an Internet connection in my new apartment. I remember exploring a bit of music on my school's computers and listening to it a lot in my apartment, as I didn't have much else to do.

I have a few ultra-popular songs that I listen to massively. Breaking Through by Audiomachine holds the record with a stratospheric 153 PC. Songs with the highest PC tend to be older, having been listened to for the first time around 2013. This makes sense: the longer a song has been in my library, the more time I have had to listen to it repeatedly and increment its PC. My favorite songs that I discovered recently are Take Me to Church by Hozier and Take a Look Around by Limp Bizkit. The former is recent and I discovered it when it became popular worldwide. The latter dates back to 2000 and I became fond of it when I discovered it last summer. I played it on loop while playing Urban Terror, which certainly contributed to its high PC.

You can see vertical ribs, for example at the beginning of last December or at the beginning of March 2015. That's when I discover an album. I listen to the whole album in one go. Then I listen to each track individually according to my preferences. Each track then evolves on the album's vertical line.

Finally, you can see how serious I am each year before my finals: during a short period around the end of April and the beginning of May, I don't listen to any new songs, except for Audiomachine's Millennium in May 2015. Indeed, I remember discovering it at the end of this movie montage during a procrastination session.

The Fluff Principle

The Fluff Principle is a term invented by Paul Graham in a February 2009 article. The Hacker News creator gives the following definition:

on a user-voted news site, the links that are easiest to judge will take over unless you take specific measures to prevent it.

He explains that one type of fluff is short content. It's faster to consume a joke or a picture than an in-depth article. Therefore, at equal quality, the former will gain more upvotes per unit of time than the latter.

Well, the Fluff Principle also holds for my music statistics. It's faster to listen to a short song than to a long one. Therefore, at equal quality, short songs will gain more PC per unit of time than long ones. For example, my number-one song, Audiomachine's Breaking Through (153 PC), is only 1:18 long. I know that I've listened to another song, Seven Angels by Avantasia, a crazy number of times, each time the whole thing. However, this song is 14:17 long, so it's not quite a surprise that its PC is only 37. Now if I want to know how much time I've spent listening to each song, that is, the total amount of time I was listening while the song was playing, I have to multiply the PC by the song's duration. Breaking Through hits 153×(1×60+18) = 11,934 seconds. Seven Angels hits 37×(14×60+17) = 31,709 seconds. Wow! I've listened to Seven Angels way more than to Breaking Through!
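The arithmetic above is easy to check with a few lines of Python. The helper name total_seconds is mine, and the sketch assumes, like the text, that every counted play covers the whole song.

```python
def total_seconds(play_count, minutes, seconds):
    """Total listening time, assuming each counted play is a full play."""
    return play_count * (minutes * 60 + seconds)


breaking_through = total_seconds(153, 1, 18)  # 1:18 long, 153 plays
seven_angels = total_seconds(37, 14, 17)      # 14:17 long, 37 plays

print(breaking_through)  # 11934
print(seven_angels)      # 31709
```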

Following this observation, we can define the Normalized Play Count (NPC) as follows: NPC(song) = PC(song)×Duration(song)/AD, where AD is the Average Duration of a tracked song. This way, songs whose duration is close to the average have an NPC approximately equal to their PC. The NPC matters most for particularly short songs, whose importance it diminishes, and for long songs, whose importance it increases.
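As a minimal sketch, the definition translates to Python as follows. The function name and the dictionary layout (title mapped to a pair of play count and duration in seconds) are assumptions made for the example. Note that AD is computed over whatever songs are passed in, so the values only match mine if the whole set of tracked songs is given.

```python
def normalized_play_counts(songs):
    """Return {title: NPC} for songs given as {title: (play_count, seconds)}."""
    # AD: the average duration over all the songs passed in.
    average_duration = sum(d for _, d in songs.values()) / len(songs)
    return {
        title: play_count * duration / average_duration
        for title, (play_count, duration) in songs.items()
    }


def ranked_by_npc(songs):
    """Titles sorted by NPC, highest first."""
    npc = normalized_play_counts(songs)
    return sorted(npc, key=npc.get, reverse=True)
```

With only the two songs from the previous example, ranked_by_npc puts Seven Angels ahead of Breaking Through, even though its PC is four times smaller.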

The table in Figure 2 shows the top 20 songs in my library sorted by NPC, in descending order. This table contains several anomalies, which are due to the fact that the PS plugin increments the PC of a song as soon as 1 minute of it has been played. The anomalies are very long tracks whose duration gives them a high NPC when in reality I listened to only part of them. They are:

Ranked 1: It's 20 minutes long, but I know I almost never listen beyond the 10-minute mark.

Ranked 2: It's 10 minutes long, but I almost always listen only to its finale, that is, the last 3-4 minutes.

Ranked 6: Final song of the album. I usually only listen to the first 3-4 minutes.

Ranked 9: I never listen to the Quasimodo Suite, which is quite lengthy.