Datomic MusicBrainz sample database

01 June 2013

MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public. We are pleased to release a sample project that uses the MusicBrainz dataset to help people get familiar with using Datomic.

The MusicBrainz dataset makes a great example database for learning, evaluating, or testing Datomic for several reasons:



It deals with a domain with which nearly everyone is familiar

It is of decent size: 60,438 labels; 664,226 artists; 1,035,592 album releases; and 13,233,625 recorded tracks

It comprises a good number of entities, attributes, and relationships

It is fun to play with, query, and explore

Schema

The mbrainz-sample schema is an adaptation of a subset of the full MusicBrainz schema. We didn't include some entities, and we made some simplifying assumptions and combined some entities. In particular:

We omit any notion of Work

We combine Track, Tracklist and Recording into simply "track"

We renamed Release group to "abstractRelease"

Abstract Release vs. Release vs. Medium

(Adapted from the MusicBrainz schema docs)

An "abstractRelease" is an abstract "album" entity (e.g. "The Wall" by Pink Floyd). A "release" is something you can buy in your music store (e.g. the 1984 US vinyl release of "The Wall" by Columbia, as opposed to the 2000 US CD release by Capitol Records).

Therefore, when you query for releases, e.g. by name, you may see duplicate releases. To find just the "work of art" level album entity, query for abstractRelease.

The media are the physical components comprising a release (disks, CDs, tapes, cartridges, piano rolls). One medium will have several tracks, and the total tracks across all media represent the track list of the release.

Relationship Diagram

[Figure: relationship diagram of the mbrainz-sample schema]

Entities

For information about the individual entities and their attributes, please see the schema page in the wiki, or the EDN schema itself.

Getting Started

Getting the Data

First get Datomic, and start up a transactor.

Next download the mbrainz backup and extract:

# 2.8 GB, md5 4e7d254c77600e68e9dc71b1a2785c53
wget http://s3.amazonaws.com/mbrainz/datomic-mbrainz-backup-20130611.tar

# this takes a while
tar -xvf datomic-mbrainz-backup-20130611.tar

Finally, restore the backup:

# takes a while, but prints progress -- ~150,000 segments in restore
bin/datomic restore-db file:datomic-mbrainz-backup-20130611 datomic:free://localhost:4334/mbrainz

Getting the Code

Clone the git repo somewhere convenient:

git clone git@github.com:Datomic/mbrainz-sample.git
cd mbrainz-sample

Running the examples

From Java

Fire up your favorite IDE, and configure it to use both the included pom.xml and the following Java options when running:

-Xmx2g -server

From Clojure

Start up a Clojure REPL:

# from the root of the mbrainz-sample repo
lein repl

Then connect to the database and run the queries.

Thanks

We would like to thank the MusicBrainz project for defining and compiling a great dataset, and for making it freely available.
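A note on the download step in Getting the Data: the comment next to the wget command lists an md5 checksum for the backup, so it can be worth checking the tarball before spending time on extraction and restore. The following is a minimal sketch, assuming GNU md5sum is on your PATH (macOS ships a differently named md5 tool instead). The function name verify_md5 is a hypothetical helper for illustration, not part of Datomic or the mbrainz-sample repo.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if the file's MD5 digest
# matches the expected hex string; fail otherwise.
verify_md5() {
  file="$1"
  expected="$2"
  # Assumption: GNU coreutils md5sum, which prints "<digest>  <filename>".
  # Keep only the first field (the digest).
  actual="$(md5sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}
```

For the backup named in this post, the call would look like: verify_md5 datomic-mbrainz-backup-20130611.tar 4e7d254c77600e68e9dc71b1a2785c53 && echo "checksum OK". Run it after the download finishes and before the tar step.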