Since I often work at my local Hackerspace alone, I can play whatever I want on the space’s music system, so I’ve taken up the habit of keeping up with a bunch of radio shows which I download from SoundCloud.

The usual process of playing something on our music server is a bit cumbersome: I have to browse SoundCloud, download a track that seems interesting, upload it to the music server, let MPD index it, and then play it.

To make my life easier, I wanted to leverage Linux’s FUSE interface to expose the audio on SoundCloud as a bunch of files in a folder which MPD could then index and play from. I could have taken a couple of other approaches such as swapping MPD with Mopidy, an MPD-compatible music player with SoundCloud support, or a scraper that periodically downloads the latest audio from my feed. And while these alternatives were probably easier to implement, I went for the FUSE driver because I thought it was cool.

Setting up

Getting access to SoundCloud’s API was actually a bit inconvenient. Client apps need a token in order to connect, and to get a token, you need to register your app with… A Google form… which has been taken offline. But it’s alright, SoundCloud’s webpage uses their API, so I’ll use the token used for that instead.

I chose Rust as the programming language for this project because it is fast, pleasant to work with, fairly low-level and has third-party bindings for FUSE. Plus, I also maintain a Rust library, id3, for reading and writing MP3 metadata, which I thought could come in handy.

Serving the audio

It is possible to access the audio on SoundCloud via an undocumented endpoint which serves a 128 kb/s CBR MP3 file.

To be able to access the contents of a file, the FUSE interface requires the read function to be implemented. This function takes an offset and a length and should return a chunk of the file being read by copying the data to a buffer.
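To make the contract of read concrete, here is a minimal sketch that serves a read from an in-memory byte slice. The function name read_chunk is made up for illustration and is not part of any FUSE binding; a real handler would copy the returned slice into the reply buffer.

```rust
/// Returns the bytes a FUSE-style `read` would hand back for the given
/// offset and size, clamped so it never runs past the end of the file.
/// (`read_chunk` is a hypothetical helper, not a FUSE API.)
fn read_chunk(file: &[u8], offset: u64, size: u32) -> &[u8] {
    let len = file.len() as u64;
    // An offset past the end yields an empty slice, which signals EOF.
    let start = offset.min(len);
    let end = start.saturating_add(size as u64).min(len);
    &file[start as usize..end as usize]
}
```

Asking for 100 bytes at offset 6 of an 11-byte file simply returns the last 5 bytes; a short read like this is how the end of the file is signalled.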

There’s a mismatch here: FUSE allows a file’s contents to be accessed at arbitrary offsets, which is not possible with content that is simply streamed from start to end over HTTP. But HTTP offers a solution for seeking to arbitrary offsets.

Seeking with HTTP Ranges

HTTP supports seeking through remote files with Range requests. When the server supports them, a client can request the file starting at a given byte offset by setting the Range header, e.g. Range: bytes=1000- for everything from byte 1000 onward.
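As a small illustration, this is roughly how the value of such a header could be built. The helper name range_header_value is my own; it only formats the string and leaves sending the request to whatever HTTP client is used.

```rust
/// Formats the value of an HTTP `Range` header.
/// With `len == None` (or a length of zero, which HTTP cannot express as a
/// range) it asks for everything from `offset` to the end of the resource.
/// (`range_header_value` is a hypothetical helper name.)
fn range_header_value(offset: u64, len: Option<u64>) -> String {
    match len {
        // Byte ranges are inclusive on both ends, hence the `- 1`.
        Some(n) if n > 0 => format!("bytes={}-{}", offset, offset + n - 1),
        _ => format!("bytes={}-", offset),
    }
}
```

For example, range_header_value(0, Some(4096)) gives bytes=0-4095. A server that honors the header answers with status 206 Partial Content instead of 200.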

SoundCloud supports Range requests on their audio so web browsers don’t have to download the whole audio file when e.g. a user skips to the end of a long mix. So I could use this to implement quick seeking in the files I serve over FUSE.

With the file IO interface implemented, I could now serve audio over FUSE!

MPD Indexing

But this project is not over yet. To be able to add these files to my library in a way that allows easy searching, we need to add some metadata to the audio files.

ID3v2

MP3 files have a specific data format for storing metadata: ID3v2. This tag is prepended to audio and contains information like the artist, track title, album and release year.

If you are the kind of person that likes to keep a library of your music, you must know the pain of manually managing all this metadata. Luckily, I can just use the metadata from the REST API, generate a tag on the fly and prepend it to the file served.
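To show what ends up in front of the audio, here is a hand-rolled sketch of a tiny ID3v2.4 tag with just a title and an artist. The project itself can lean on the id3 library for this; the functions below (syncsafe, text_frame, build_tag) are illustrative names of my own and only cover the bare minimum of the format.

```rust
/// Encodes a size as a 4-byte "syncsafe" integer: 7 bits per byte, MSB first.
fn syncsafe(n: u32) -> [u8; 4] {
    [
        ((n >> 21) & 0x7f) as u8,
        ((n >> 14) & 0x7f) as u8,
        ((n >> 7) & 0x7f) as u8,
        (n & 0x7f) as u8,
    ]
}

/// Builds one ID3v2.4 text frame such as TIT2 (title) or TPE1 (artist).
fn text_frame(id: &[u8; 4], text: &str) -> Vec<u8> {
    let mut frame = Vec::with_capacity(10 + 1 + text.len());
    frame.extend_from_slice(id);
    // Frame size covers the encoding byte plus the text itself.
    frame.extend_from_slice(&syncsafe(1 + text.len() as u32));
    frame.extend_from_slice(&[0, 0]); // frame flags, none set
    frame.push(3); // text encoding 3 = UTF-8
    frame.extend_from_slice(text.as_bytes());
    frame
}

/// Builds a complete tag: 10-byte header followed by the two text frames.
fn build_tag(title: &str, artist: &str) -> Vec<u8> {
    let mut body = text_frame(b"TIT2", title);
    body.extend(text_frame(b"TPE1", artist));

    let mut tag = Vec::with_capacity(10 + body.len());
    tag.extend_from_slice(b"ID3");
    tag.extend_from_slice(&[4, 0]); // version 2.4.0
    tag.push(0); // tag flags, none set
    // The size field excludes the 10-byte header itself.
    tag.extend_from_slice(&syncsafe(body.len() as u32));
    tag.extend(body);
    tag
}
```

The bytes returned by build_tag can be served as the first part of the file, with the MP3 stream following directly after.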

Sweet, now MPD is able to index the file. But every time a file is opened, we perform an HTTP request to get the audio, which makes indexing the thousands of files that I expose quite slow.

Let’s make it faster!

Lazy Loading

The main thing that I have to fix is the HTTP request that is made every time a file is opened for reading. I decided to go for a lazy loading scheme by implementing a new file IO layer that pretends to be an opened file but only starts doing HTTP requests when the first read is performed. This was very easy to do with Rust’s io::Read and io::Seek interfaces.
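A rough sketch of such a layer could look like the following. The type name Lazy and the open callback are assumptions for illustration: open stands in for the HTTP request and receives the byte offset to start from, so seeking costs nothing until data is actually read.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Pretends to be an open file of `size` bytes, but only calls `open`
/// (a stand-in for the HTTP request) when data is actually read.
struct Lazy<R, F> {
    open: F,
    inner: Option<R>,
    pos: u64,
    size: u64,
}

impl<R: Read, F: FnMut(u64) -> io::Result<R>> Lazy<R, F> {
    fn new(size: u64, open: F) -> Self {
        Lazy { open, inner: None, pos: 0, size }
    }
}

impl<R: Read, F: FnMut(u64) -> io::Result<R>> Read for Lazy<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.inner.is_none() {
            // First read, or first read after a seek: open at the current position.
            self.inner = Some((self.open)(self.pos)?);
        }
        let n = self.inner.as_mut().unwrap().read(buf)?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<R: Read, F: FnMut(u64) -> io::Result<R>> Seek for Lazy<R, F> {
    fn seek(&mut self, to: SeekFrom) -> io::Result<u64> {
        let target = match to {
            SeekFrom::Start(n) => n as i128,
            SeekFrom::Current(d) => self.pos as i128 + d as i128,
            SeekFrom::End(d) => self.size as i128 + d as i128,
        };
        if target < 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "seek before start"));
        }
        let target = target as u64;
        if target != self.pos {
            // Drop the open reader; the next read reopens at the new offset.
            self.inner = None;
            self.pos = target;
        }
        Ok(self.pos)
    }
}
```

Seeking to the end and back, as a program probing the file size would do, never invokes open in this sketch. Only the first read does.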

We do have to know the size of the file so seeking to the end keeps working. Calculating this is possible as well. We know the track’s duration from SoundCloud’s REST API and also that all files use the same 128 kb/s constant bitrate MP3 encoding. This enables us to calculate the size of the MP3 file like this:

mp3_size_in_bytes = duration_in_seconds * 128000 / 8
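As a worked example, here is the same calculation in Rust. I am assuming the duration is given in milliseconds, which is the unit of the TLEN value shown in the ffprobe output further down; the function name is just for illustration.

```rust
/// Bitrate of the MP3 files, in bits per second.
const BITRATE_BPS: u64 = 128_000;

/// Size of the raw CBR MP3 stream for a track of `duration_ms` milliseconds.
/// (Hypothetical helper; multiplies before dividing to avoid rounding early.)
fn mp3_size_in_bytes(duration_ms: u64) -> u64 {
    duration_ms * BITRATE_BPS / 8 / 1000
}
```

For a track of 3385733 ms this gives 54171728 bytes, or roughly 54 MB.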

I wrote a small program to dump the ID3v2 data of a file so I could test this, but using this program still caused the lazy loading to trigger…

ID3v2 Padding

Reading files is done in chunks. So the thing that was triggering my lazy load was actually the first read call for the ID3v2 tag that used a part of the MP3 data to fill the read buffer. Refer to the diagram below and notice how the chunk read labeled Read 1 overlaps both the ID3v2 tag and a part of the audio stream:

MP3 files allow for padding to exist between the ID3v2 tag and the MP3 stream. The reason for this padding is to allow the tag to be modified and extended without requiring the MP3 stream to be moved. I can leverage this padding to let it fill up the read buffer so the audio is not accessed:
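One way to size the padding is to round the tag up to a whole number of read chunks, sketched below. This assumes reads arrive aligned to a fixed chunk size, which is a simplification; the chunk size itself is a made-up parameter here, not something dictated by FUSE or ID3v2.

```rust
/// Number of zero bytes to append to a tag of `tag_len` bytes so that the
/// tag plus padding ends exactly on a multiple of `chunk` bytes.
/// (`chunk` is a hypothetical read size; it must not be zero.)
fn padding_for(tag_len: usize, chunk: usize) -> usize {
    assert!(chunk > 0, "chunk size must be non-zero");
    (chunk - tag_len % chunk) % chunk
}
```

With a 43-byte tag and 4096-byte chunks this yields 4053 bytes of padding. The padding counts towards the size recorded in the ID3v2 header, so readers skip over it along with the rest of the tag.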

And it works! But only with my ID3v2 dumping program. MPD still triggered HTTP requests when indexing. Why?

MP3 Stream Hacking

Ah, of course! MPD wants to know the length of the audio. This meant that MPD would always skip past my padding into the MP3 stream. Fixing this would not be easy.

The MP3 Info frame

Let’s figure out how MPD calculates the stream length. Documentation about the MP3 format turned out to be hard to find, especially the metadata part that I was after. So instead, I started digging around in the source code of existing MP3 decoders. With what little information I could find online and the clues from other source code, I was able to piece together the information about the metadata format of the MP3 stream.

MP3 streams consist of frames, each encoding a number of samples of the audio signal. Frames can be of variable length to improve the compression ratio. Because of this, seeking to some point in time in the stream is not simply a matter of jumping to the byte at (size / duration) * offset.

To make seeking possible, a special Info or Xing frame precedes the real stream. This frame contains the number of frames in the file, the file size in bytes and a table of offsets which can be used for quick seeking. This frame is what players use to calculate the duration of a file.

Rewriting the stream

So to ensure that players would be able to probe the duration of a track, I would have to replicate the Info frame, just like I already added the ID3v2 tag.

The Info frame has a fixed size of 417 bytes. So cutting the original one out is just a matter of starting the HTTP Range request at this offset.

Some fields in the Info frame are optional. With some experimentation, I found out that I only needed to set the frame count and file size for players to calculate the duration. The seek table, or Table of Contents as it is properly called, was left blank as it is redundant anyway because SoundCloud serves files with a constant bitrate.
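To give an idea of what such a frame looks like, here is a minimal sketch that builds one with only the frame count and byte count set. It assumes MPEG-1 Layer III at 128 kb/s, 44.1 kHz, stereo and without CRC, matching the stream parameters in the ffprobe output below; the function names and the exact header bytes are my own reconstruction, not taken from the project.

```rust
const FRAME_LEN: usize = 417; // 144 * 128_000 / 44_100, rounded down
const SAMPLES_PER_FRAME: u64 = 1152; // MPEG-1 Layer III
const SAMPLE_RATE: u64 = 44_100; // assumed sample rate

/// Number of MP3 frames needed to hold `duration_ms` of audio, rounded up.
fn frame_count(duration_ms: u64) -> u32 {
    let samples = duration_ms * SAMPLE_RATE / 1000;
    ((samples + SAMPLES_PER_FRAME - 1) / SAMPLES_PER_FRAME) as u32
}

/// Builds an Info frame carrying only the frame count and byte count.
fn build_info_frame(frames: u32, bytes: u32) -> [u8; FRAME_LEN] {
    let mut frame = [0u8; FRAME_LEN];
    // Frame header: MPEG-1 Layer III, no CRC, 128 kb/s, 44.1 kHz, stereo.
    frame[..4].copy_from_slice(&[0xff, 0xfb, 0x90, 0x00]);
    // Bytes 4..36 hold the 32-byte side info, which stays zeroed.
    frame[36..40].copy_from_slice(b"Info");
    // Flags: 0x1 = frame count present, 0x2 = byte count present.
    // The TOC (0x4) and quality (0x8) fields are left out.
    frame[40..44].copy_from_slice(&3u32.to_be_bytes());
    frame[44..48].copy_from_slice(&frames.to_be_bytes());
    frame[48..52].copy_from_slice(&bytes.to_be_bytes());
    frame
}
```

A player can then derive the duration as frames * 1152 / 44100 seconds without reading any real audio frames.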

More padding hacking

To make lazy loading really work, we have to adjust the padding hack from earlier a bit. Instead of having padding between the ID3v2 tag and the stream start, the padding should now be between the Info frame and the first real audio frame. But because the padding has to be inserted in the MP3 stream itself, using just zero bytes would not be sufficient.

So I generate dummy frames containing no real audio instead as padding. This adds a few seconds of silence to every track served, but considering my use case of playing long mixes this is a fair compromise.
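A sketch of how such dummy frames could be generated and how much silence they add is shown below. It assumes the same 417-byte, 128 kb/s, 44.1 kHz frame layout as the rest of the stream and relies on decoders treating a frame with all-zero side info and data as silence, which is my understanding of the format rather than something I can point to in a spec. All names are illustrative.

```rust
const FRAME_LEN: usize = 417; // assumed frame size at 128 kb/s, 44.1 kHz

/// A frame with a valid header but an all-zero body, meant to decode as silence.
fn silent_frame() -> [u8; FRAME_LEN] {
    let mut frame = [0u8; FRAME_LEN];
    // Frame header: MPEG-1 Layer III, no CRC, 128 kb/s, 44.1 kHz, stereo.
    frame[..4].copy_from_slice(&[0xff, 0xfb, 0x90, 0x00]);
    frame
}

/// Number of silent frames needed to cover at least `min_bytes` of padding.
fn padding_frames(min_bytes: usize) -> usize {
    (min_bytes + FRAME_LEN - 1) / FRAME_LEN
}

/// Silence added by `frames` frames, in whole milliseconds.
fn silence_ms(frames: usize) -> u64 {
    frames as u64 * 1152 * 1000 / 44_100
}
```

Covering a hypothetical 128 KiB read buffer takes 315 frames, which adds a little over 8 seconds of silence.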

With all these hacks, the full file header now looks like this:

Another thing that should be noted is that players also often look for an ID3v1 tag which is a trailer of 128 bytes. So I also needed to generate some empty frames at the end of the file to ensure that players seeking to this tag would not trip the lazy loading.

Conclusion

And with that, I was able to make indexing with MPD work smoothly and quickly. This was certainly not the easiest way to achieve my goal, but I had a lot of fun and learned a lot.

As a triumph, here’s a screenshot of NCMPCPP listing some episodes of Above & Beyond that have been indexed by MPD:

For reference, here are some stats about the files exposed by FUSE for my account:

$ find -L -type f | wc -l
14328
$ du -hsL
304G .

And a benchmark of scanning the metadata of a single file:

$ time ffprobe -hide_banner sc-test/polyfloyd/favorites/rave_on_-_orkidea-live-fokused-scandinavian-festival-stockholm-29032003.mp3
Input #0, mp3, from 'sc-test/polyfloyd/favorites/rave_on_-_orkidea-live-fokused-scandinavian-festival-stockholm-29032003.mp3':
  Metadata:
    artist          : rave_on
    title           : Orkidea - Live @ Fokused Scandinavian Festival, Stockholm 29.03.2003
    TLEN            : 3385733
    copyright       : cc-by-nc-sa
    genre           : Classic Trance
    date            : 2016
  Duration: 00:56:47.10, start: 0.000000, bitrate: 127 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s

real 0m0.075s
user 0m0.054s
sys 0m0.009s

References