MusicNet is a collection of 330 freely licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note's position in the metrical structure of the composition. The labels are acquired from musical scores aligned to recordings by dynamic time warping, and are verified by trained musicians; we estimate a labeling error rate of 4%. We offer the MusicNet labels to the machine learning and music communities as a resource for training models and a common benchmark for comparing results.
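The score-to-audio alignment above relies on dynamic time warping. As a minimal sketch of the idea (not the alignment pipeline used to build MusicNet, which operates on audio features rather than raw 1-D sequences), the following computes the classic DTW cost between two toy sequences:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic time warping between two 1-D sequences.

    Returns the minimum cumulative alignment cost under absolute
    difference, allowing match, insertion, and deletion steps.
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # skip a frame of x
                                 cost[i, j - 1],      # skip a frame of y
                                 cost[i - 1, j - 1])  # match the frames
    return cost[n, m]

# A time-stretched copy of a sequence aligns with zero cost,
# which is why DTW suits score-to-performance alignment:
print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 3, 4]))  # 0.0
```

In the MusicNet setting, the two sequences are feature frames of a synthesized score and of the recording, and the warping path (rather than the cost) carries the note timings onto the audio.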

For detailed information, read our paper.

Why MusicNet

Music research has benefited recently from the effectiveness of machine learning methods on a wide range of problems, from music recommendation (van den Oord et al., 2013; McFee & Lanckriet, 2011) to music generation (Driedger et al., 2015); see also the recent demos of Google's Magenta project. In the related fields of computer vision and speech processing, learned feature representations using deep end-to-end architectures have led to tremendous progress in tasks such as image classification and speech recognition. These supervised architectures depend on large labeled datasets, for example ImageNet (Russakovsky et al., 2015). Inspired by the success of these methods, we have created MusicNet as the beginning of a project to explore these techniques in the realm of music.

Specifically, we propose the MusicNet labels as a tool to address the following tasks:

Identify the notes performed at specific times in a recording.

Classify the instruments that perform in a recording.

Classify the composer of a recording.

Identify precise onset times of the notes in a recording.

Predict the next note in a recording, conditioned on history.
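The first task reduces to an interval query over the labels. The sketch below uses a toy in-memory label list whose fields mirror the kind of information MusicNet provides (onset, offset, instrument, pitch); the field layout and names here are illustrative assumptions, not the dataset's on-disk schema:

```python
# Toy labels: (start_sec, end_sec, instrument, midi_note).
# The values below are invented for illustration.
labels = [
    (0.00, 1.50, "violin", 67),
    (0.50, 2.00, "cello", 48),
    (1.75, 3.00, "violin", 69),
]

def notes_at(labels, t):
    """Return the (instrument, midi_note) pairs sounding at time t seconds."""
    return [(inst, note) for start, end, inst, note in labels
            if start <= t < end]

print(notes_at(labels, 1.0))  # [('violin', 67), ('cello', 48)]
```

For the million-label scale of the full dataset, an interval tree over the note onsets and offsets makes the same query sublinear instead of a full scan.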

More broadly, we hope that MusicNet can be a resource for more creative tasks. Automatic music transcription, inferring a musical score from a recording, is a long-standing open problem in the music information retrieval community. Music streaming services traditionally make recommendations based on collaborative filtering and metadata (e.g. artist and genre tags). Recently, some services have begun to incorporate audio features into their recommendation engines. Features learned from the MusicNet labels might be useful for recommendation. We are also interested in generative models that can fabricate performances under various constraints. Can we learn to synthesize a performance given a score? Can we generate a fugue in the style of Bach using a melody by Brahms?

We encourage the use of MusicNet for other creative music processing applications.

Distribution of music datasets is often constrained by copyright restrictions. The MusicNet labels apply exclusively to Creative Commons and Public Domain recordings, and as such we can distribute and re-distribute the MusicNet labels together with their corresponding recordings. The music that underlies MusicNet is sourced from the Isabella Stewart Gardner Museum, the European Archive, and Musopen. We thank the performers and staff at these institutions for their generous contributions. Without them, our work would not have been possible.

This work was supported by the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery, and the program "Learning in Machines and Brains" (CIFAR).