For lovers of psychedelic art, the technique of Deep Dreaming is certainly fascinating. When applied to images, it gives the impression of an Instagram filter that’s currently attending Woodstock. Since its inception, a plethora of breathtaking artworks has been generated with this method. While this may already be old hat for some people, we are still far from scraping the bottom of this barrel, considering the endless possibilities of what can be dreamt into an innocuous input.

Well, technically they may not be endless. Taking image classification architectures trained on the ImageNet dataset as an example, the content available for dreaming comprises the 1000 object classes, which is nevertheless pretty impressive. In principle, for each of these classes, features can appear during the dreaming process that mirror the internal representations the network has learned about these objects. Not all of them are meaningful to humans, but we have all seen the creepily realistic eyes and dog faces generated with this method. As a Deep Dream practitioner, you can choose the layer(s) at which you perform gradient ascent on the input in order to maximize activations. Generally, with early layers you will see low-level features such as edges and basic shapes; going deeper yields more high-level features and recognizable objects.

Unfortunately, apart from the choice of layer, this approach gives us little control over what the network will recognize and amplify in the input. However, we can specify precisely what we wish to see if we move to the very last layer, where the activations correspond directly to the object classes and we can select which of these we want to maximize. If you like, you can think of this approach as an artistic adversarial attack.

In this post, we will use this idea to explore some of the ImageNet classes in more depth using the pre-trained VGG-19 architecture. To facilitate single-class dreaming, we perform a full forward pass through the whole network using some input image. The output is then a vector containing the scores (logits) corresponding to each of the classes. To pick out and optimize only one or a few of the classes, we generate a target vector with ones for the labels we wish to see and zeros otherwise. If you want to get fancy, you can also mix classes with different weights, putting more emphasis on certain objects than on others. A comprehensive list of all ImageNet classes can be found here. Using the target vector, we can calculate the binary cross-entropy loss between the network output and the target and perform a gradient step on the input. We only have to be careful about the sign here, since dreaming is done with gradient ascent rather than descent.
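The weighted mixing of classes can be sketched as follows, assuming `output` is the logit vector from a forward pass (batch size 1, 1000 ImageNet classes). The class indices and weights here are illustrative choices:

```python
import torch

output = torch.randn(1, 1000)       # stand-in for model(input_image)

# Weighted target: emphasize one class more strongly than another
target = torch.zeros(output.size())
target[0, 9] = 100.0                # ostrich, strong emphasis
target[0, 130] = 50.0               # flamingo, weaker emphasis

criterion = torch.nn.BCEWithLogitsLoss()
loss = -criterion(output, target)   # negative sign: ascent instead of descent
```

The relative sizes of the target entries control how strongly each class is pulled into the dream.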

To give an explicit example, suppose we want VGG-19 to dream about ostriches (the corresponding class label is 9):

```python
import torch

# Binary cross entropy on the raw logits lets us target individual classes
criterion = torch.nn.BCEWithLogitsLoss()

label = 9  # ImageNet class index for "ostrich"

output = model(input_image)        # forward pass, shape (1, 1000)
target = torch.zeros(output.size())
target[0, label] = 100             # large target value amplifies the gradient

loss = -criterion(output, target)  # negative sign: ascent instead of descent
```

Note that we use 100 instead of 1 for the target class. This amplifies the gradient, which we found to give better results than adjusting the learning rate. After the backward pass, we simply add the gradient of each pixel to itself in the input image. Rinse, repeat, and we get some really fascinating yet specific dreams! Let’s look at some of our most interesting results. The input we use for all examples is the following image, which was taken from here: