If you've been following some of our recent Weeknotes posts you'll have read that we've been developing an audio waveform display in our Snippets Web application. This blog post goes into the detail of this work and describes how we prototyped a browser-based zoomable audio waveform view that allows users to interactively extract segments of the audio to download or share.

Snippets is an internal BBC Web application that allows staff to easily find archive TV and radio programmes, then extract and download segments of these programmes. Snippets currently has good support for browsing and navigating video content, and we’ve been working to provide similar capability for audio.

User research

To determine the specific goals for this feature, Rob, Joanne, and Andrew met with BBC radio editors to understand how they interact with audio using their existing software tools, and with current Snippets users to gather their requirements.

From this we learned that a waveform display would be useful for visual navigation of the audio, and to enable users to quickly locate the start and end positions to clip within radio programmes. The display should be similar to that seen in desktop audio editing software, which BBC radio content editors and producers would be familiar with. The interface should allow users to zoom and scroll the waveform and the display should update in real time, synchronised with the audio playback. In terms of clipping audio segments, the interface should allow zooming to a level that allows the user to select starts of words in speech radio, but we did not need the ability to zoom to the level of individual audio samples. Also, although all the audio content we’re using is stereo, only a single-channel (rather than a two-channel) waveform display would be needed.

Developing the Snippets user interface for audio also presented a number of technical questions and challenges, such as:

How to produce a visual rendering of an audio waveform from the original audio files

How to access and process the waveform data (data format, serving mechanism, processing calculations, etc)

What display technology to use (e.g., canvas, SVG, WebGL, or server-rendered images)

Fallback options

Loading times

Display portions (how much waveform to show, and where)

Another major goal we had was to build the software as a reusable component that could be used in other Web applications besides Snippets.

Generating waveform data

Chris N decided to look at how existing applications display audio waveforms. Audacity® is a popular audio editing program and, because it is open source software, we were able to look at its source code with a view to using a similar approach for our own waveform display.

When Audacity opens an audio file it creates what internally it calls “summary information”. This is a downsampled version of the audio that allows Audacity to efficiently render the waveform on the screen. The Audacity architecture is described in the book The Architecture of Open Source Applications, which explains how the summary information is used:

"If Audacity is asked to display a four hour long recording on screen it is not acceptable for it to process the entire audio each time it redraws the screen. Instead it uses summary information which gives the maximum and minimum audio amplitude over ranges of time. When zoomed in, Audacity is drawing using actual samples. When zoomed out, Audacity is drawing using summary information."

The summary information is computed by finding the minimum and maximum sample amplitude values over groups of 256 input audio samples over the entire length of the audio file. To support zooming out to view long time durations (several hours), Audacity also computes summary information over groups of 65,536 input samples.
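As a rough illustration of the calculation described above, here is a minimal TypeScript sketch that reduces raw samples to one minimum/maximum pair per group. It is not Audacity's code: the names `summarise` and `MinMax` are invented for this example, and the input is assumed to be already decoded samples held in a `Float32Array`.

```typescript
// One summary entry: the smallest and largest sample value in a group.
interface MinMax {
  min: number;
  max: number;
}

// Reduce raw samples to one min/max pair per group of `groupSize` samples.
// The final group is shorter if the input length is not an exact multiple.
function summarise(samples: Float32Array, groupSize: number = 256): MinMax[] {
  if (groupSize <= 0 || Math.floor(groupSize) !== groupSize) {
    throw new RangeError("groupSize must be a positive integer");
  }
  const summary: MinMax[] = [];
  for (let start = 0; start < samples.length; start += groupSize) {
    let min = Infinity;
    let max = -Infinity;
    // subarray() creates a view on the same memory, not a copy.
    samples.subarray(start, start + groupSize).forEach((value) => {
      if (value < min) min = value;
      if (value > max) max = value;
    });
    summary.push({ min, max });
  }
  return summary;
}
```

Calling `summarise(samples, 65536)` on the same input would give the coarser summary used for very long durations.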

If the audio file is in a compressed format, e.g., MP3, creating the summary information involves decoding the audio, which can take a long time. But, having done this once and saved the summary information to disk, Audacity is subsequently able to open and display the audio waveform very quickly.

The audiowaveform program

To see if this approach would work in our application, Chris N wrote a command-line program, named audiowaveform, that creates waveform data files in a binary format, using the algorithm described above, from either WAV or MP3 format audio.

Using these data files, the program can then render waveform images at arbitrary zoom levels using Audacity’s waveform rescaling algorithm. Use of these waveform data files allows us to perform the time-consuming processing of the original audio file only once, then render images at any zoom level very quickly.
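To show why precomputed data makes zooming out cheap, here is a simplified TypeScript sketch of rescaling summary data to a coarser zoom level. It is an assumption-laden illustration rather than Audacity's actual rescaling algorithm: the `WaveformSummary` shape and the `rescale` function are hypothetical, and each output point simply takes the minimum of the minima and the maximum of the maxima that it covers.

```typescript
interface WaveformSummary {
  samplesPerPoint: number; // input audio samples represented by each point
  min: number[];
  max: number[];
}

// Produce a coarser summary from an existing one, without touching the audio.
function rescale(source: WaveformSummary, samplesPerPoint: number): WaveformSummary {
  if (samplesPerPoint < source.samplesPerPoint) {
    throw new RangeError("summary data can only be rescaled to a coarser level");
  }
  const ratio = samplesPerPoint / source.samplesPerPoint;
  const sourceLength = source.min.length;
  const outputLength = Math.ceil(sourceLength / ratio);
  const min: number[] = [];
  const max: number[] = [];
  for (let i = 0; i < outputLength; i++) {
    // Range of source points covered by output point i (always at least one).
    const from = Math.floor(i * ratio);
    const to = Math.max(from + 1, Math.min(sourceLength, Math.floor((i + 1) * ratio)));
    min.push(Math.min(...source.min.slice(from, to)));
    max.push(Math.max(...source.max.slice(from, to)));
  }
  return { samplesPerPoint, min, max };
}
```

Because the work is proportional to the number of summary points rather than the number of audio samples, a new zoom level can be produced without decoding the audio again.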

Below is an example PNG image created using audiowaveform:

audiowaveform is written in C++, and uses libmad and madlld to read and decode MP3 files, libsndfile to read WAV files, and libgd to render PNG images.

Waveform data in the browser

As part of our exploratory research, we tested using JSON and the audiowaveform binary format for transferring data to the browser client from a Web server.

Whichever of these we used, the way of consuming the data should remain the same. Hence Thomas created an abstract JavaScript data layer, waveform-data.js, which provides JSON and binary data adapters, data accessor and helper methods, segment management, and client-side resampling (using the Audacity algorithm). This module also handles pagination, rescaling of the waveform for zooming, and any time-related calculations which would otherwise clutter the UI code.
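The time-related calculations mentioned above mostly come down to converting between seconds and waveform point indexes. The TypeScript sketch below shows the arithmetic; the helper names and the `WaveformScale` shape are hypothetical and are not the waveform-data.js API.

```typescript
interface WaveformScale {
  sampleRate: number;      // audio samples per second
  samplesPerPixel: number; // audio samples summarised by each waveform point
}

// Index of the waveform point that contains the given time.
function timeToPixel(seconds: number, scale: WaveformScale): number {
  return Math.floor((seconds * scale.sampleRate) / scale.samplesPerPixel);
}

// Start time, in seconds, of the given waveform point.
function pixelToTime(pixel: number, scale: WaveformScale): number {
  return (pixel * scale.samplesPerPixel) / scale.sampleRate;
}

// Number of points needed to display audio of the given duration.
function pixelsForDuration(seconds: number, scale: WaveformScale): number {
  return Math.ceil((seconds * scale.sampleRate) / scale.samplesPerPixel);
}
```

Keeping these conversions in the data layer means the view code only deals in pixel offsets, while the player only deals in seconds.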

Support for binary data is achieved using the Typed Array API. Its DataView interface allows us to iterate over portions of the data without creating new in-memory objects.
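The sketch below shows the general idea of reading values in place with `DataView`. The byte layout is a deliberately simple, hypothetical one chosen for this example (a 4-byte little-endian point count followed by signed 8-bit minimum/maximum pairs), not a description of the audiowaveform file format, and the function names are invented.

```typescript
const HEADER_BYTES = 4;    // hypothetical header: one little-endian uint32 point count
const BYTES_PER_POINT = 2; // one int8 minimum followed by one int8 maximum

function pointCount(view: DataView): number {
  return view.getUint32(0, true);
}

// Read a single min/max pair straight out of the underlying buffer.
function readPoint(view: DataView, index: number): { min: number; max: number } {
  if (index < 0 || index >= pointCount(view)) {
    throw new RangeError("point index out of range");
  }
  const offset = HEADER_BYTES + index * BYTES_PER_POINT;
  return { min: view.getInt8(offset), max: view.getInt8(offset + 1) };
}

// Scan points [from, to) for the largest absolute amplitude.
// No intermediate arrays are allocated: every value is read in place.
function largestAmplitude(view: DataView, from: number, to: number): number {
  const end = Math.min(to, pointCount(view));
  let peak = 0;
  for (let i = Math.max(0, from); i < end; i++) {
    const offset = HEADER_BYTES + i * BYTES_PER_POINT;
    peak = Math.max(peak, Math.abs(view.getInt8(offset)), Math.abs(view.getInt8(offset + 1)));
  }
  return peak;
}
```

In a browser the `DataView` would typically wrap the `ArrayBuffer` returned by an XMLHttpRequest whose `responseType` is set to `"arraybuffer"`.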

Comparing the performance of JSON against binary-encoded waveform data, we were interested to notice that the response time was dominated mainly by network transfer of the data, rather than client-side JSON parsing or handling of the binary data. The file size of the binary data is 2 or 3 times smaller than JSON, and using HTTP compression further reduces the amount of data transferred, by another factor of 2 or 3. So, using the binary format seemed a good way to go, for browsers that support the Typed Array API, and for other browsers we would fall back to JSON.
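The fallback decision can be expressed as a small feature test. This TypeScript sketch assumes, purely for illustration, that the two formats are served from URLs ending in `.dat` and `.json`; those extensions, and the function names, are not taken from the Snippets code.

```typescript
type WaveformFormat = "binary" | "json";

// The parts of the global object we need to inspect. In a browser, pass `window`.
interface TypedArrayScope {
  ArrayBuffer?: unknown;
  DataView?: unknown;
}

// Prefer binary data when the Typed Array API is available, else fall back to JSON.
function chooseFormat(scope: TypedArrayScope): WaveformFormat {
  const supported =
    typeof scope.ArrayBuffer === "function" && typeof scope.DataView === "function";
  return supported ? "binary" : "json";
}

// Build the URL to request. The ".dat" and ".json" extensions are assumptions.
function waveformUrl(baseUrl: string, scope: TypedArrayScope): string {
  return `${baseUrl}.${chooseFormat(scope) === "binary" ? "dat" : "json"}`;
}
```

Passing the scope in as a parameter, rather than reading globals directly, keeps the check easy to exercise for both kinds of browser.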

Waveforms web service

To deliver the audio waveform data to the Snippets Web application, Matt built a Sinatra-based Web service with a background worker process that uses audiowaveform to generate the data as new radio programmes appear in the Redux archive. The Web service uses Chris N's audio_waveform-ruby gem to convert the binary waveform data to JSON format.

Displaying the data

In building the user interface there were a number of questions to be answered that meant we had a very open scope for how to approach the problem. Here are a few of the considerations we had from an interface perspective:

Levels of zoom (in and out)

Zoom features (buttons, sliders, etc)

Zoom feedback (how to indicate current zoom level)

Time displays

Animations (scroll, playback)

Navigating the waveform

Display of playhead position

Chris F and Thomas dedicated a sprint to deciding whether we would use HTML5 canvas, SVG or WebGL to display the audio waveforms. Our constraint was to cover a wide spectrum of browsers. We ruled out WebGL for this reason, as it requires at least Internet Explorer 11.

SVG was close to a one-line solution with the help of D3.js and waveform-data.js. With our approach we did not notice any performance hit, although displaying a full-length programme's data at the closest zoom level would be too much to draw without simplifying the SVG paths.

Despite this, we favoured Canvas as we felt it provided the right set of features and would form a good basis for the project in the future. We also felt Canvas would be more efficient at dealing with user interactions and synchronising them between the several views, especially overlapping segments and draggable offsets. Its ability to be updated using the browser requestAnimationFrame API makes it a clear winner for our purpose.

User interface component

The user interface component, Peaks.js, was developed by Chris F, with input from Thomas and Chris N. This was designed and built using the AMD module style. It could then be packaged either as a single class that would append itself to the window object, or as a require.js module that could be included in a pre-existing require.js setup as needed by the end user.

Grunt tasks were used to automate building the project. For development purposes application module files were created as independent require.js modules to ensure separation of concerns. At build time the grunt build task would first lint all script files and compile templates ready for building, then inspect the require.js dependency chain for all of the modules using r.js and concatenate and minify all the files in the correct order for packaging as a single module. The task would then prepend and append code fragments along with almond.js that would allow the end result file to be included as outlined above.

Using this structure and build process allowed us to code the application in a modular fashion but also provide the end users with the simplest ways of using the code in their own projects.

Working with canvas

Initially we worked with vanilla HTML5 canvas to render our waveforms. However, because we needed several active layers and event detection, the vanilla implementation quickly became unwieldy. As a result we decided to use the KineticJS framework to abstract away a lot of the tricky parts of working with canvas, and to give us built-in staging and layering of multiple canvases along with normalised event detection on canvas elements. Using KineticJS removed some of the disadvantages of working with canvas over SVG and allowed us to concentrate on application logic rather than becoming mired in endless canvas context update loops.

Using KineticJS kept our code DRY by allowing us to define a base drawing function for plotting waveform coordinate values on the canvas context and then utilise that for the drawing of all different types of waveforms, segments and zoom levels included in Peaks.js.
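A shared drawing function of this kind might look like the TypeScript sketch below. It is an illustration only, not the Peaks.js source: it avoids KineticJS entirely and draws through a minimal `PathContext` interface, which a browser's `CanvasRenderingContext2D` satisfies. All names here are hypothetical, and amplitudes are assumed to be normalised to the range -1 to 1.

```typescript
interface OutlinePoint {
  x: number;
  y: number;
}

// The subset of the canvas 2D context that the drawing function needs.
interface PathContext {
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  closePath(): void;
}

// Map an amplitude in [-1, 1] to a y coordinate, with +1 at the top edge.
function amplitudeToY(amplitude: number, height: number): number {
  return ((1 - amplitude) * height) / 2;
}

// Closed outline: along the maxima left to right, then back along the minima.
function waveformOutline(min: number[], max: number[], height: number): OutlinePoint[] {
  if (min.length !== max.length) {
    throw new RangeError("min and max must have the same length");
  }
  const upper = max.map((value, x) => ({ x, y: amplitudeToY(value, height) }));
  const lower = min.map((value, x) => ({ x, y: amplitudeToY(value, height) })).reverse();
  return upper.concat(lower);
}

// Trace the outline on any context; the caller decides how to fill or stroke it.
function traceWaveform(ctx: PathContext, outline: OutlinePoint[]): void {
  if (outline.length === 0) return;
  ctx.beginPath();
  outline.forEach((point, i) => {
    if (i === 0) ctx.moveTo(point.x, point.y);
    else ctx.lineTo(point.x, point.y);
  });
  ctx.closePath();
}
```

Because the function only depends on arrays of values and a height, the same code can serve an overview, a zoomed view, or a highlighted segment by passing in a different slice of data.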

Event driven architecture

The Peaks.js code uses a central events pubsub stack for inter-module communication. This meant that, rather than making direct inter-module function calls, a module could publish an event to the event stack without worrying about who was listening. Any other module in the application could then subscribe to that event and be notified when it was fired. This allowed modules to only have to worry about themselves and reduced interdependency of the modules, giving good separation of concerns.
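The publish/subscribe pattern described above can be sketched in a few lines of TypeScript. This is a generic illustration rather than the Peaks.js implementation; the `EventBus` class and the event name used in the usage note are made up for the example.

```typescript
type BusHandler = (payload?: unknown) => void;

class EventBus {
  // Created without a prototype so event names cannot clash with Object members.
  private handlers: { [event: string]: BusHandler[] } = Object.create(null);

  // Register a handler and return a function that removes it again.
  subscribe(event: string, handler: BusHandler): () => void {
    const list = this.handlers[event] || (this.handlers[event] = []);
    list.push(handler);
    return () => {
      const index = list.indexOf(handler);
      if (index !== -1) list.splice(index, 1);
    };
  }

  // Notify every current subscriber. The publisher does not know who they are.
  publish(event: string, payload?: unknown): void {
    // Iterate over a copy so handlers can unsubscribe while being notified.
    (this.handlers[event] || []).slice().forEach((handler) => handler(payload));
  }
}
```

A player module could call `bus.publish("playhead_moved", seconds)` while each waveform view subscribes to `"playhead_moved"` independently, so neither side needs a reference to the other.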

Conclusion

Peaks.js provides the bare-bones functionality for rendering, displaying and interacting with audio waveforms in the browser. A lot of effort was put into ensuring that the application was concise and did not restrict the options available to the end user for customisation or use.

You can try using Peaks.js on the Peaks.js project homepage.

All the code described in this post is available as open-source software on our GitHub page:

audiowaveform — C++ program that generates waveform data files from MP3 or WAV format audio

audio_waveform-ruby — A Ruby gem that can read and write waveform data files

waveform-data.js — JavaScript library that provides access to precomputed waveform data files, or can generate waveform data using the Web Audio API

peaks.js — JavaScript UI component for displaying and interacting with waveforms

If you have questions, comments, or feedback on the project, please contact us.

Finally, we'd like to express our thanks to the Audacity team for their help, and for allowing us to publish the code under the LGPL license.