Breaking Grooveshark's encryption

Written on 2015-11-30

As you may have heard, the streaming music service Grooveshark was taken offline last week after quite literally being sued to hell. I therefore believe it's now okay to publish the details of their encryption method.

Last year I already tried reading the files in an attempt to back up my Grooveshark library -- in hindsight a very wise undertaking. Looking in the files, a few things immediately stand out:

The files are not random.

There are many identical consecutive bytes.

Each encrypted file starts identically.

The first point means that the files are breakable. Of course they are breakable because I have an application that can decrypt them for me, but non-randomness means I can break them (at least in theory) without having to reverse engineer the application. The second point indicates a simple byte shift or mapping, no password involved. The third point indicates that the shift or mapping is the same for all files.

So far it looks good!

My first attempt was a simple rot13 and rot-{1..255} translation but this was unsuccessful. I then looked at the data being transferred to my phone as I added a song to my offline songs. (Technical details: Linux Deploy; ssh -X $phone wireshark.)
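A rough sketch of what such a rot-N attempt looks like, in Python. The function names are made up for illustration, and the test for success (the decoded data starting with the ASCII bytes "ID3", as an mp3 with an ID3v2 tag does) is an assumption, not necessarily the check used at the time.

```python
def rotate(data: bytes, shift: int) -> bytes:
    """Add `shift` to every byte, wrapping around at 256."""
    return bytes((b + shift) % 256 for b in data)


def find_rotation(encrypted: bytes, magic: bytes = b"ID3"):
    """Return the first shift that makes the data start with `magic`, else None."""
    for shift in range(1, 256):
        # Only the first few bytes need rotating to test a candidate shift.
        if rotate(encrypted[:len(magic)], shift) == magic:
            return shift
    return None
```

If find_rotation returns None for every file, the scheme is not a plain byte rotation.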

It turns out Grooveshark downloads (transfers) the plain mp3 files and only encrypts them locally. This was convenient because even if the encryption took too much time to break, I could just capture a bunch of transfers (downloads) and use those instead.

Comparing these plain mp3 files to the encrypted versions of them made it clear I had almost certainly hit the jackpot already:

Their lengths are identical.

No extra header or padding (though that could still indicate CTR mode or a stream cipher).

The bytes were a 1:1 mapping from encrypted to mp3, e.g. if the encrypted read abcabbba then the decrypted might read jeqjeeej. You probably see the similarity.

I wrote a simple PHP command line script that indexes the byte mapping for a file. This mapping was 00->37; 01->36; 02->39; ...; 12->41; etc. Using this mapping to decode another file worked! This effectively cracked their encryption already.
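The PHP script itself isn't shown here, but the idea can be sketched in Python as below. The function names are made up for illustration, and the consistency check (raising an error when one encrypted byte would map to two different plain bytes) is an addition of this sketch.

```python
def build_mapping(encrypted: bytes, plain: bytes) -> dict:
    """Index which encrypted byte value corresponds to which plain byte value."""
    if len(encrypted) != len(plain):
        raise ValueError("files differ in length, so this is not a per-byte mapping")
    mapping = {}
    for enc, dec in zip(encrypted, plain):
        # setdefault returns the existing entry if this byte was seen before.
        if mapping.setdefault(enc, dec) != dec:
            raise ValueError(f"byte {enc:#04x} maps to more than one value")
    return mapping


def apply_mapping(encrypted: bytes, mapping: dict) -> bytes:
    """Decode data with a mapping; raises KeyError for byte values never seen."""
    return bytes(mapping[b] for b in encrypted)
```

One known pair of files builds the mapping; apply_mapping then decodes any other file, as long as the pair contained every byte value that occurs in it.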

After cracking it I was excited to have succeeded and eager to share this with the world... but I never got around to that either, because of the possible legal implications, even with responsible disclosure. Thinking about how to go about this, I actually forgot about the original goal: backing up my music.

Grooveshark goes dark

Fast-forward to May 1st, I open the site and... they quit. They just quit. Following tips from reddit I copied my LocalStorage and immediately copied all Grooveshark files from my phone -- apk and everything.

At that moment I did not have access to my previous research on the topic and I could no longer stream fresh music to re-do it. I remembered the method, a simple mapping, but which bytes mapped to which again? I couldn't look that up because I didn't have my files.

The files are mp3 files, which means they're compressed data with an entropy of about 7.8 bits per byte, including the header. This means I cannot do frequency analysis (analyzing the frequency of byte values only works on non-random looking data) and, alternatively, brute forcing every possible mapping of the 256 byte values (256! permutations) takes longer than I will live.
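For reference, a figure like "bits per byte" can be computed along these lines. This is a minimal Python sketch with a made-up function name, not the tool used for the number above. A byte substitution only relabels values, so the encrypted file has the same entropy as the mp3 it came from.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


# Size of the brute-force search space: the number of decimal digits in 256!,
# the count of possible permutations of all byte values.
digits_in_256_factorial = len(str(math.factorial(256)))
```

The closer the result is to 8, the flatter the byte histogram and the less frequency analysis has to work with. digits_in_256_factorial comes out at over 500 digits.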

Each mp3 file starts with the same sequence so the first few bytes are known, and I could have found another few by figuring out where the title is in the id3 header (since the title of the song is known)... but that leaves like 220 byte mappings unknown, still too many to brute force.

In the end I did find a Python script that was supposed to do the trick, but it didn't. It seems to have been for iOS.

A week later I finally got access to my old files and from there on it was easy: grab the byte mapping and apply it to all files. But something still itched. There was a missing piece in the puzzle...

The real encryption algorithm

Looking at the mapping, the lower bytes have lower values, and they have this pattern... one down, two up... but that doesn't repeat... it's weird. Graphing it turned it from an itch into a clear puzzle to solve.

There is something going on here other than the mapping. The mapping itself isn't even randomly chosen. I could have solved this without brute forcing... but how? What is the algorithm they use?

Displaying the mapping in hexadecimal revealed no insights. Perhaps I should look at bit level:

11111001 11111010 11111011 11111100 11111101 11111110 11111111
11011100 11011111 11011110 11011001 11011000 11011011 11011010

Looking at these pairs (read from top to bottom), it seems only the third, sixth and eighth bit ever change. And they are always flipped, never the same. Hmm.

How about we do a bitwise not on just the bits set in 0b00100101? Wait, a "bitwise not" on selected bits is not its own operation. That is the famous... xor! They use xor encryption! And their key is 37, or percent in ASCII, just like the null bytes I saw in the encrypted file (they all turned into percent signs)! Dangit, that was really all too easy and I did not see the open door.
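The same conclusion can be checked mechanically: if every entry of the byte mapping xors to the same constant, the "mapping" is just xor with that constant. A small Python sketch (the function name is made up; the sample entries are the decimal ones quoted earlier):

```python
def xor_key_of(mapping: dict):
    """Return the single xor key behind a byte mapping, or None if there is none."""
    keys = {enc ^ dec for enc, dec in mapping.items()}
    return keys.pop() if len(keys) == 1 else None


# Mapping entries quoted earlier (decimal): 00->37, 01->36, 02->39, 12->41
sample = {0: 37, 1: 36, 2: 39, 12: 41}
key = xor_key_of(sample)
print(key, chr(key))  # 37 %
```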

In summary

They just use xor "encryption" (if it's worth that term), mixing the file with a single byte (0x25, or ASCII 37, or a percent sign). All that's missing is a shell oneliner to put the final nail in the coffin. Bash doesn't really seem able to do bitwise xor in files, so we turn to the good old camel:

s/./print$&^"%"/egs for<>

That's it. With that algorithm you can decrypt Grooveshark's encryption. Example usage:

# Decode 12345.dat:
perl -e 's/./print$&^"%"/egs for<>' < 12345.dat > 12345.mp3

# Do an entire folder in parallel (one background process per file):
for i in *; do perl -e 's/./print$&^"%"/egs for<>' < "$i" > "$i.mp3" & done
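For anyone who would rather not read Perl golf, roughly the same thing in Python. This is a sketch rather than a tested tool: the names KEY, TABLE and decrypt_file are made up here, and it reads each file into memory in one go, which should be fine for song-sized files.

```python
from pathlib import Path

KEY = 0x25  # '%', the key found above; other platforms may use a different one

# Translation table: entry i holds i xor KEY.
TABLE = bytes(i ^ KEY for i in range(256))


def xor_bytes(data: bytes) -> bytes:
    """Xor every byte with KEY; applying it twice returns the input."""
    return data.translate(TABLE)


def decrypt_file(src: Path, dst: Path) -> None:
    """Read src, xor every byte with KEY, and write the result to dst."""
    dst.write_bytes(xor_bytes(src.read_bytes()))
```

For example, decrypt_file(Path("12345.dat"), Path("12345.mp3")) does the same job as the first Perl command above.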

Or alternatively, this page decrypts the files for you: xor0x25.php

Update: Jeremy emailed me to let me know that on his Nokia, the symbol was not percent but a capital letter I (ASCII 73). iOS used yet another, but we don't know which one.
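Since the key apparently differs per platform, it can be guessed from the start of a file instead of hard-coded. The sketch below assumes the decrypted file starts with the ASCII bytes "ID3" (an ID3v2 tag) and that the scheme is still a single-byte xor. Neither has been verified for the Nokia or iOS files, and the function name is made up.

```python
def guess_xor_key(encrypted: bytes, magic: bytes = b"ID3"):
    """Return the single-byte key that turns the file's start into `magic`, or None."""
    if not magic or len(encrypted) < len(magic):
        return None
    key = encrypted[0] ^ magic[0]
    # The same key has to explain every byte of the known prefix.
    if all((e ^ key) == m for e, m in zip(encrypted, magic)):
        return key
    return None
```

A result of 37 would mean the percent-sign key, 73 the capital I. None means the file does not start with an xor-ed "ID3".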

The mp3 files have ID3 tagging so they can easily be renamed and sorted into folders by artist and album. There are a lot of tools out there that do this automatically for you, but I never really used any so can't recommend one to you. (Typically I'd just apt-cache search, that usually gets me good stuff.)

On my phone (Cyanogenmod) the files could be found in:

# Offline and cache files:
/media/0/Android/data/com.grooveshark.android.v1/

# The APK and random other files
/app/com.grooveshark.android.v1.apk
/data/com.grooveshark.android.v1/

Happy listening!

Happy listening?

I don't condone downloading music without compensating artists, nor do I believe in ad-supported services. As we speak I have already reviewed Google Music and Deezer, and finally decided to sign up for a paid Spotify subscription. The major downside of Spotify is that I cannot upload my own music, even just for myself, but it seems alright otherwise. The price is twice what I paid Grooveshark, but with that they compensate artists (or the record labels... but if artists sign with labels then I can't help that).