One of the coolest features of Pinecast Studio is its ability to tie into our online services. You don’t need to use Pinecast, but the option is there. If we put out the best audio recording and editing software, we’d hope you consider using our online offerings as well.

As part of that, we wanted to include features like online recording backups. As podcasters ourselves, we’ve dealt with lost audio. We even have a central Box account that we store our recordings on. Despite that, people forget to upload their files, audio gets corrupted, and guests have a hard time sending us their recordings. Even more problematically, raw audio files are enormous, so we end up encoding them as MP3s, degrading their quality before they get edited. Everyone involved with Pinecast Studio was enthusiastic about being able to back up recorded audio right in the tool itself, since it eliminates a whole class of problems.

It turns out, though, that storing audio in the cloud is quite tricky.

When offering a cloud storage service, you want to first minimize cost. Keeping the service healthy and usable while protecting the user’s files is critical. Minimizing cost here, however, involves some interesting tradeoffs. First, let’s look at what our costs are:

Inbound transfer costs (cost of uploading the files)

Outbound transfer costs (cost of downloading the files again later)

Storage costs (cost of keeping the files saved)

We also want to have the highest-possible audio quality. Preserving the fidelity of the original recording by minimizing generation loss is crucial: imagine if every time you uploaded an image to Dropbox it got a bit blurrier. We must take that into consideration as well.

WAV provides a perfect representation of the original raw audio with no generation loss, so it’s got the quality bullet point covered. Compared to our raw audio data, WAV files are roughly half the size (the audio data is converted from 32-bit floating point audio samples to 16-bit integer audio samples). That’s not a huge win, though: just twenty seconds of raw mono audio is roughly four megabytes (!), meaning those same twenty seconds of audio saved as a WAV file are almost two megabytes. An hour-long podcast would take 350 megabytes per track. If three people are recording, that’s over a gigabyte an hour. No bueno.
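The arithmetic behind those numbers can be sketched in a few lines of Python. The 48 kHz sample rate is an assumption (no sample rate is stated here), picked because it lines up with the figures above. `pcm_size_bytes` and `to_megabytes` are hypothetical helpers, and WAV header overhead is ignored.

```python
# Rough size of uncompressed audio. SAMPLE_RATE_HZ is an assumption, not a
# figure taken from Pinecast Studio itself.
SAMPLE_RATE_HZ = 48_000


def pcm_size_bytes(seconds, bytes_per_sample, channels=1, sample_rate=SAMPLE_RATE_HZ):
    """Bytes needed to store uncompressed PCM audio (header overhead ignored)."""
    return seconds * sample_rate * bytes_per_sample * channels


def to_megabytes(size_bytes):
    """Convert a byte count to (decimal) megabytes."""
    return size_bytes / 1_000_000


if __name__ == "__main__":
    print(to_megabytes(pcm_size_bytes(20, 4)))        # 20 s of 32-bit float samples
    print(to_megabytes(pcm_size_bytes(20, 2)))        # 20 s of 16-bit integer samples
    print(to_megabytes(pcm_size_bytes(3600, 2)))      # one hour, one track, 16-bit
    print(3 * to_megabytes(pcm_size_bytes(3600, 2)))  # one hour, three tracks
```

Under that assumption, twenty seconds comes to about 3.8 megabytes raw and 1.9 megabytes as 16-bit WAV, and an hour comes to about 346 megabytes per track.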

But we can do better than WAV! WAV files are, for all intents and purposes, just raw audio with a header and a straight conversion of the audio samples. We can use a codec to encode the audio into a smaller file. Figuring out which codec to use, though, takes some experimentation.

After some asking around on the internet, I found three reasonable codec choices:

MP3

FLAC

Opus

The last of MP3’s patents expire this year, making all three “free” and preventing problems with licensing. MP3 is a lossy codec, meaning that it discards information to save space. Most of this information is audio frequencies that human ears cannot perceive, but it does degrade: think of how a JPEG becomes blurrier as the size decreases when the quality is lowered. MP3 is almost “transparent” (i.e., the encoded audio is indistinguishable from the raw source audio) at a bitrate of around 192kbps. It’s a very common format, so it has almost no compatibility problems.

FLAC (Free Lossless Audio Codec) is a free, lossless codec. That means it does not discard data to compress the audio. Instead, it compresses audio by representing it in a more efficient format than raw samples; there is no bitrate, just “the amount of data needed to write the audio as FLAC.” In other words, you can’t use FLAC to turn down the quality to make the file smaller. Because it is lossless, though, FLAC tends to produce much larger files than lossy codecs. FLAC is not as well-supported as MP3 and requires special desktop software to play, though it is supported in Chrome and Firefox.

Opus is very new. It’s made by the same folks that designed FLAC and Ogg Vorbis. Unlike FLAC, Opus is a lossy codec. It is notable because it can encode speech very efficiently. Opus was designed with telephony applications in mind, and while Pinecast Studio isn’t dealing with telephony, it will deal almost exclusively with speech-only audio. The official Xiph wiki says Opus is essentially transparent at 128kbps for music storage, and the Hydrogenaudio wiki states that 32kbps will be transparent for speech (!).

We didn’t look at Ogg Vorbis for a few reasons. Notably, Opus supersedes it in terms of size and quality. Besides that, Vorbis is roughly comparable in size and quality to MP3. With better quality and sizes in Opus and better compatibility in MP3, there was little reason to look at Vorbis further.

In addition to these codec choices, it was suggested that I also test file sizes after applying a low-pass filter on the audio. A low-pass filter removes all audio frequencies above a certain threshold. In our case, we can fairly safely filter all frequencies that are far above the range of human speech. MP3 already filters away frequencies that you can’t hear anyway, but FLAC and Opus will both theoretically benefit from it. Technical details aside, less audio data to encode means smaller file sizes.
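To make the idea concrete, here is a minimal sketch of a single-pole low-pass filter in plain Python. It is only an illustration of the concept: ffmpeg’s own low-pass filter is a different design, and the function name and the 48 kHz sample rate are assumptions rather than anything from our test.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz=48_000):
    """Attenuate content above cutoff_hz with a simple RC-style smoothing filter."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1
    filtered = []
    previous = 0.0
    for sample in samples:
        # Move part of the way toward the new sample; fast wiggles get smoothed out.
        previous = previous + alpha * (sample - previous)
        filtered.append(previous)
    return filtered


# A constant signal (0 Hz) and a signal flipping sign every sample (24 kHz at 48 kHz).
steady = [1.0] * 200
buzz = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
```

Passing `steady` through `one_pole_lowpass(steady, 10000)` leaves it essentially untouched once the filter settles, while `buzz` comes out at well under half its original amplitude. That is the effect we are after: content far above the cutoff shrinks, and the rest passes through.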

Collecting Data

The process of actually measuring the difference was fairly simple. I wrote a small Python script to invoke ffmpeg with a series of different settings.

import subprocess

codecs = ['opus', 'flac', 'mp3']
bitrates = {
    'opus': ['32k', '64k', '96k', '128k'],
    'flac': ['1000k'],  # we need something here; it's a no-op for FLAC
    'mp3': ['192k', '245k', '320k'],
}

for codec in codecs:
    for bitrate in bitrates[codec]:
        for lowpass in (True, False):
            # Build one ffmpeg invocation per codec/bitrate/low-pass combination.
            command = 'ffmpeg -i test.wav -y -c:a {codec} -b:a {bitrate} {lowpass_filter} {output_name}'.format(
                codec=codec,
                bitrate=bitrate,
                lowpass_filter='-af lowpass=f=10000' if lowpass else '',
                output_name='out_{codec}_{bitrate}_{lowpass}.{ext}'.format(
                    codec=codec,
                    bitrate=bitrate,
                    lowpass='lp' if lowpass else 'nolp',
                    ext=codec))
            # subprocess.call returns the exit code; anything non-zero is a failure.
            if subprocess.call(command, shell=True) != 0:
                print('failed: %s' % command)

It’s not the most beautiful code, but it works well enough. I ran it against a 22-second WAV file containing some speech I collected on my microphone. The various bitrates are thresholds that I’ve found from Googling about transparency, but they’re arbitrary at best (not that they need to be anything special).

Running the script produces the following files:

The results, sorted by size ascending. I’ve used a screenshot instead of console output because it’s easier to read on Medium.

“lp” means “low-pass,” while “nolp” means “no low-pass.” The “test.wav” file is the source audio file. The number in the middle of the filename is the bitrate of the file in kbps. FLAC’s bitrate is listed at 1000kbps, but you can ignore that.
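If you would rather produce the same listing from a script than from a file browser, something like the following should do it. This is a sketch, not what I actually ran: `parse_output_name` and `outputs_by_size` are hypothetical helpers, and they assume the `out_{codec}_{bitrate}_{lowpass}.{ext}` naming scheme from the script above.

```python
from pathlib import Path


def parse_output_name(filename):
    """Split a name like 'out_opus_64k_lp.opus' into its parts."""
    stem, _, extension = filename.rpartition(".")
    _prefix, codec, bitrate, lowpass = stem.split("_")
    return {
        "codec": codec,
        "bitrate_kbps": int(bitrate.rstrip("k")),
        "lowpass": lowpass == "lp",
        "extension": extension,
    }


def outputs_by_size(directory="."):
    """Return (size_in_bytes, filename) pairs for out_* files, smallest first."""
    pairs = [
        (path.stat().st_size, path.name)
        for path in Path(directory).glob("out_*")
        if path.is_file()
    ]
    return sorted(pairs)
```

Calling `outputs_by_size()` in the directory where the script ran gives the files in ascending size order, and `parse_output_name` turns each filename back into the codec, bitrate, and low-pass setting that produced it.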

Lots to unpack here! Some obvious results:

Opus is the overwhelming winner in terms of file size. Every single Opus file is smaller than every MP3 and FLAC file.

The low-pass unsurprisingly does not affect MP3 (the file sizes are identical). For FLAC and Opus, though, it produces a notable effect. Note: the sizes in the screenshot are the size on-disk; the 32kbps Opus low-pass is almost 2 KB smaller than the non-low-pass version.

The low-pass actually increases file size on Opus for higher bitrates. This is curious, but codecs are dark magic and you don’t ask questions about dark magic.

I spent a lot of time listening to the audio in VLC. 32kbps Opus had a few barely-noticeable artifacts in quieter parts of the audio, just enough for me to disqualify it. As expected, FLAC sounded exactly the same as the input. Even with the low-pass filter, though, it was comparable in size to the 320kbps MP3 (which is complete overkill), so I disqualified FLAC as well. With MP3 and Opus deriving no meaningful benefit from the low-pass filter, I removed those options too.

I took the remaining files (64, 96, and 128kbps Opus and 192, 245, and 320kbps MP3) and dumped them into VLC with the source file. I set VLC to shuffle and repeat, and sat and tried to pick out artifacts, skipping ahead periodically (⌘+→ on macOS) and avoiding looking at my screen. This was frustrating: I simply could not hear any difference in audio quality.

This makes sense, though. All of these bitrates are above the reported levels that the codecs should be transparent at. Given the simple choice between 64kbps (or even 96kbps) Opus and 192kbps MP3, Opus is a no-brainer. At 64kbps, Opus is roughly a third of the size of 192kbps MP3 with essentially no difference in quality for this use case. Even better, libopus has been compiled with emscripten to JavaScript and published on npm.
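For a sense of scale, the nominal bitrate translates into file size with simple arithmetic. The sketch below uses a hypothetical `encoded_size_bytes` helper and assumes a perfectly constant bitrate with no container overhead. Real encoders vary the bitrate from moment to moment, so treat the results as ballpark figures only.

```python
def encoded_size_bytes(bitrate_kbps, seconds):
    """Approximate size of a stream at a constant bitrate (container overhead ignored)."""
    return bitrate_kbps * 1000 * seconds // 8  # kilobits per second -> bytes


ONE_HOUR = 60 * 60

opus_hour = encoded_size_bytes(64, ONE_HOUR)   # 64kbps Opus
mp3_hour = encoded_size_bytes(192, ONE_HOUR)   # 192kbps MP3

if __name__ == "__main__":
    print(opus_hour / 1_000_000, "MB per hour for 64kbps Opus")
    print(mp3_hour / 1_000_000, "MB per hour for 192kbps MP3")
```

By this estimate an hour-long track is about 29 megabytes as 64kbps Opus versus about 86 megabytes as 192kbps MP3, compared with hundreds of megabytes as WAV.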

I’d like to acknowledge a few shortcomings of my test:

The source audio file was very short. In reality, these audio files would be much longer, and the various codecs would potentially perform better or worse. That said, FLAC probably won’t give much better compression, and the disqualified bitrates produced unacceptable audio anyway. Longer audio files would have produced more opportunities for artifacts to present themselves.

Audio at different “volumes” will compress differently. Quiet audio can compress better, in some cases, than loud audio. I should have tested with samples of me sitting at different distances from the microphone. I also should have tested with a compressor (note: the name is unrelated to file size compression) applied to the audio output, to see whether it improved file size compression.

It would have been valuable to test with a wider array of audio types, like other voices (specifically female, which have a different range of frequencies). Testing with guitar or piano samples would also have provided interesting insights.

In a few months when the first public release of Pinecast Studio is available, your audio backups will be encoded and stored as Opus files. When the server-side APIs are up and running, I’ll write another post about what happens to your files after they’ve been shipped to the cloud.