The photos in the dataset include pieces from 67 designers participating in LFW. The number of photos per designer varies: some designers have just one photo, whereas others have many.

Turning photos into numbers

You might be wondering what an autoencoder is…

An autoencoder is a special type of neural network that attempts to understand common characteristics of a dataset in order to represent it — or encode it — in an efficient manner.

Its purpose is to find a representation of a dataset in a reduced dimension. That’s done by training a symmetric neural network made of two parts: an encoder and a decoder.

We call it symmetric because the encoder and the decoder mirror each other: one is the reverse of the other.

The encoder reduces the information from each multidimensional input — in our case, the runway photos — to a limited set of dimensions.

The decoder expands from those limited dimensions to the data’s original size — it produces a version of the image which is reconstructed from the encoding.

The aim of the autoencoder is to perform this reduction and reconstruction whilst making sure that as much detail from the original image is preserved as possible.
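The article doesn't share the project's implementation, but the reduce-and-reconstruct idea can be illustrated with the simplest possible autoencoder: a linear one, whose optimal encoder and decoder are given by principal components. This toy NumPy sketch (random data standing in for runway photos) compresses 64 values down to 16 and reconstructs them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "photos": 100 samples of 64 pixel values each
# (the real model works on full runway images).
X = rng.normal(size=(100, 64))
X -= X.mean(axis=0)  # centre the data

# A linear autoencoder has a closed-form optimum: the top
# principal components. The encoder projects onto them;
# the decoder projects back.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:16]                # encoder weights: 64 -> 16

codes = X @ W.T            # encode: each sample becomes 16 numbers
X_hat = codes @ W          # decode: reconstruct the 64 values

print(codes.shape)         # (100, 16)
print(np.mean((X - X_hat) ** 2))  # reconstruction error
```

A real autoencoder replaces these linear projections with trained neural-network layers, but the goal is the same: make the reconstruction error as small as the reduced dimensions allow.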

Diagram of the fashion week autoencoder made up of an encoder and a decoder.

A regular autoencoder is a neural network that learns from the data to produce an encoding representing latent variables — in this case, it attempts to capture inherent characteristics of the original photos.

But these encoded representations are hard to understand because they do not follow any kind of structure. So we used a kind of autoencoder called a variational autoencoder, which constrains the encodings to follow a normal distribution during training.

This is helpful because normal distributions are well understood and they follow a continuum. This representation better enables us to understand the limits of the encoded space and more easily manipulate it.
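The article doesn't spell out how the normal-distribution constraint is applied, but in a standard variational autoencoder the encoder outputs a mean and a variance for each latent dimension, samples from them via the reparameterisation trick, and adds a KL-divergence penalty that pulls each dimension toward a standard normal. A NumPy sketch of those two ingredients, with made-up encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the encoder outputs, for one photo, a mean and a
# log-variance for each of 16 latent dimensions.
mu = rng.normal(scale=0.5, size=16)
log_var = rng.normal(scale=0.1, size=16)

# Reparameterisation trick: sample z = mu + sigma * eps, so the
# sampling step stays differentiable during training.
eps = rng.normal(size=16)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence between N(mu, sigma^2) and the standard normal
# N(0, 1). Added to the reconstruction loss, this term pushes
# every latent dimension toward a normal distribution.
kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

print(z.shape, kl)
```

The KL term is zero exactly when the encoder outputs a standard normal (mean 0, variance 1) and grows as the encoding drifts away from it, which is what gives the encoded space its well-behaved, continuous shape.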

But fashion, being so visual, means we need to go one step further.

Because our data is composed of images, we use a convolutional neural network, which picks up higher order patterns in the images by applying two-dimensional operations.

The network learns not only from individual pixel intensity values, but also considers larger areas of the image, and can therefore extract insights from properties such as geometry and patterns, as well as color.
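A toy 2D convolution makes this concrete: a small filter slides over the image and responds to local structure rather than to individual pixels. This minimal NumPy sketch (a 'valid' convolution, no padding) applies a hypothetical vertical-edge filter to a tiny two-tone image:

```python
import numpy as np

def conv2d(image, kernel):
    """Minimal 'valid' 2D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image: dark on the left, bright on the right.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A vertical-edge filter: it responds only where intensity
# changes left-to-right — i.e. to geometry, not lone pixels.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

response = conv2d(image, kernel)
print(response)  # non-zero only along the dark-to-bright edge
```

In a convolutional network these filters are not hand-written but learned, and stacking layers of them is what lets the model pick up progressively higher-order patterns.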

So, in short, a convolutional variational autoencoder is a kind of neural network that attempts to learn from higher order features of images and represent them in a set of normally distributed latent variables.

Entering ‘the encoded space’

Next, we trained an autoencoder to reduce the set of runway photos into a space made up of 16 dimensions. Using the encoder half of the autoencoder, each photo in the dataset can be represented in these dimensions; we called this representation the encoded space.

The ~3,000 runway photo dataset represented in the encoded space made up of 16 latent dimensions. Each column in this figure represents a single image; each row represents a single latent dimension, with its value represented in color.

Every single photo in the dataset is encoded into a set of 16 values — this representation of the encoded space describes the entire dataset in much less space.

Each of the 16 values represents a latent dimension, or an inherent characteristic of the photos. The autoencoder has learned from the variation in geometry, color, the model’s pose, and any other details from the photos in the dataset, and has condensed these features into 16 latent dimensions.

Because this is a variational autoencoder, we have forced these dimensions to be normally distributed — we can visualise those distributions via histograms.

Histograms of the latent dimensions in the encoded space. Each dimension roughly follows a normal distribution.
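Checking this is straightforward: stack the encodings into a matrix of ~3,000 rows by 16 columns and histogram each column. A sketch with stand-in data (in the real project the matrix comes from the trained encoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the encoded space: ~3,000 photos x 16 latent
# dimensions, drawn here from a normal distribution.
encoded = rng.normal(size=(3000, 16))

# Histogram of one latent dimension, as in the figure above.
counts, bin_edges = np.histogram(encoded[:, 0], bins=30)

print(counts.sum())  # 3000 — every photo falls into some bin
```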

Our hope is that the features captured by these reduced dimensions correspond to visual aspects of the photos that we can recognise.

Additionally, by having each photo represented by these dimensions, we can do some interesting calculations, like finding the ‘average look’ of a designer’s collection, or a measure of how similar, or different, one design is from another.
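Both calculations are simple vector arithmetic once every photo is a 16-dimensional point. The article doesn't say which distance measure was used; this sketch uses Euclidean distance as one plausible choice, with random stand-in encodings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in encodings: one 16-dimensional vector per photo
# (in the project these come from the trained encoder).
collection_a = rng.normal(size=(40, 16))  # one designer's photos
collection_b = rng.normal(size=(25, 16))  # another designer's

# The 'average look' of a collection: the mean of its encodings.
average_a = collection_a.mean(axis=0)
average_b = collection_b.mean(axis=0)

# One way to measure how similar two designs (or two average
# looks) are: Euclidean distance in the encoded space.
def distance(u, v):
    return float(np.linalg.norm(u - v))

print(distance(average_a, average_b))
```

Because the average look is itself a 16-dimensional point, it can also be fed to the decoder to produce a composite image of a collection.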

Putting the photos back together again

The decoder half of the autoencoder, on the other hand, takes the 16 dimensions representing a single runway photo and reconstructs a version of the original photo.
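In the simplest terms, the decoder is a function from 16 numbers to a full grid of pixels. This sketch uses a hypothetical linear decode matrix purely to show the shapes involved; the real decoder is a stack of learned (typically transposed-convolution) layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear decoder: maps a 16-dim code back to a
# 64x64 grayscale image. The sizes here are illustrative.
decode_weights = rng.normal(scale=0.1, size=(16, 64 * 64))

code = rng.normal(size=16)  # one photo's 16-value encoding
reconstruction = (code @ decode_weights).reshape(64, 64)

print(reconstruction.shape)  # (64, 64)
```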

For example, we can use the decoder to reconstruct a few images from the dataset: