The Deep Music Visualizer: Using sound to explore the latent space of BigGAN

A tool for AI artists, visual jockeys, synesthetes and psychonauts.

A deep musician

A deep music video

Want to make a deep music video? Wrap your mind around BigGAN. Developed at Google by Brock et al. (2018)¹, BigGAN is a recent chapter in the brief history of generative adversarial networks (GANs). A GAN is an AI model trained as a contest between two neural networks: a generator creates new images based on statistical patterns learned from a set of example images, and a discriminator tries to classify images as real or fake. By training the generator to fool the discriminator, GANs learn to create realistic images.

BigGAN is considered Big because it contains over 300 million parameters, trained on hundreds of Google TPUs at an estimated cost of $60,000. The result is an AI model that generates images from 1128 input parameters:

i) a 1000-unit class vector of weights between 0 and 1, one for each of the 1000 ImageNet classes, or object categories.

ii) a 128-unit noise vector of values between -2 and 2 that control the visual features of objects in the output image, like color, size, position and orientation.

A class vector of all zeros, except for a one at the vase class, outputs a vase:
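To make the two inputs concrete, here is a minimal NumPy sketch that builds them. It does not load or call BigGAN; it only constructs the vectors described above. The index 883 for "vase" is an assumption based on the common ImageNet-1k label ordering, so check it against the label map you use. Redrawing out-of-range values is one simple way to keep the noise between -2 and 2, not necessarily how any particular BigGAN release samples it.

```python
import numpy as np

NUM_CLASSES = 1000  # ImageNet classes, per the text
NOISE_DIM = 128     # noise units, per the text
VASE_INDEX = 883    # ASSUMPTION: "vase" in the usual ImageNet-1k ordering


def one_hot_class_vector(class_index, num_classes=NUM_CLASSES):
    """Return a class vector of zeros with a single one at class_index."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[class_index] = 1.0
    return vec


def sample_noise_vector(rng, dim=NOISE_DIM, limit=2.0):
    """Draw standard normal values, redrawing any that fall outside [-limit, limit]."""
    noise = rng.standard_normal(dim)
    out_of_range = np.abs(noise) > limit
    while out_of_range.any():
        noise[out_of_range] = rng.standard_normal(int(out_of_range.sum()))
        out_of_range = np.abs(noise) > limit
    return noise.astype(np.float32)


rng = np.random.default_rng(0)
class_vector = one_hot_class_vector(VASE_INDEX)
noise_vector = sample_noise_vector(rng)

# Together the two vectors account for all 1128 inputs: 1000 + 128.
model_input_size = class_vector.size + noise_vector.size
```

Feeding this pair to a pretrained generator is what should produce the vase; changing only `noise_vector` should change how the vase looks, not what it is.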

Interpolating between classes without changing the noise vector reveals shared features in the latent space, like faces:
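As a rough sketch of what "interpolating between classes" can mean in practice, the snippet below blends one one-hot class vector into another in even steps while the noise vector stays fixed. The class indices are arbitrary placeholders, the function name is hypothetical, and linear blending is an assumption; other interpolation schemes would work too.

```python
import numpy as np

NUM_CLASSES = 1000
NOISE_DIM = 128
CLASS_A = 10  # placeholder index, chosen only for illustration
CLASS_B = 20  # placeholder index, chosen only for illustration


def interpolate_class_vectors(start_index, end_index, num_frames,
                              num_classes=NUM_CLASSES):
    """Linearly blend two one-hot class vectors over num_frames steps."""
    start = np.zeros(num_classes, dtype=np.float32)
    end = np.zeros(num_classes, dtype=np.float32)
    start[start_index] = 1.0
    end[end_index] = 1.0
    # Column of blend weights from 0 (all start) to 1 (all end).
    weights = np.linspace(0.0, 1.0, num_frames, dtype=np.float32)[:, None]
    return (1.0 - weights) * start + weights * end


class_frames = interpolate_class_vectors(CLASS_A, CLASS_B, num_frames=5)

# The same noise vector is reused for every frame, so only the class changes.
fixed_noise = np.zeros(NOISE_DIM, dtype=np.float32)
noise_frames = np.tile(fixed_noise, (class_frames.shape[0], 1))
```

Each row of `class_frames` is one frame's class vector. The middle frame is half class A and half class B, which is where the in-between images come from.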

Interpolating between random vectors reveals deeper sorts of structure:
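The sketch below shows one way to walk through a handful of random vectors, here assumed to be noise vectors. It picks a few random keyframes and blends linearly from each one to the next. The function names are hypothetical, and clipping to the range -2 to 2 is a shortcut used for brevity rather than a true truncated normal sample.

```python
import numpy as np

NOISE_DIM = 128


def random_keyframes(rng, count, dim=NOISE_DIM, limit=2.0):
    """Sample `count` random noise vectors, clipped to [-limit, limit]."""
    return np.clip(rng.standard_normal((count, dim)), -limit, limit)


def interpolate_keyframes(keyframes, steps_per_segment):
    """Blend linearly from each keyframe to the next, ending on the last one."""
    frames = []
    for start, end in zip(keyframes[:-1], keyframes[1:]):
        # endpoint=False avoids repeating a keyframe at the seam of two segments.
        weights = np.linspace(0.0, 1.0, steps_per_segment, endpoint=False)[:, None]
        frames.append((1.0 - weights) * start + weights * end)
    frames.append(keyframes[-1][None, :])
    return np.concatenate(frames, axis=0)


rng = np.random.default_rng(0)
keyframes = random_keyframes(rng, count=3)
noise_frames = interpolate_keyframes(keyframes, steps_per_segment=4)
```

Because every frame is a weighted average of two keyframes, its values stay within the same -2 to 2 range as the keyframes themselves.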

If you’re intrigued, join the expedition of artists, computer scientists and cryptozoologists on this strange frontier. Apps like Artbreeder have provided simple interfaces for creating AI artwork, and autonomous artificial artists loom while some users occupy themselves searching for the Mona Lisa.

Others have set BigGAN to music.

These “deep music videos” have garnered mixed reactions, with viewers calling them anything from beautiful to trippy to horrifying. To be fair, one is wise to fear what lurks in latent space…

What other unlikely chimeras, mythical creatures, priceless artworks and familiar dreams reside within BigGAN? To find out, we need to cover more ground. That’s why I built the deep music visualizer, an open-source, easy-to-use tool for navigating the latent space with sound.

A latent spaceship, with bluetooth.

Take it for a spin and create some cool music videos along the way. Just make sure to share what you discover.