The content loss function

The content loss is a function that takes as input the feature maps at a given layer of the network and returns the weighted content distance between the input image and the content image. This function is implemented as a torch module with a constructor that takes the weight and the target content as parameters.

The mean square error between the two sets of feature maps can be computed using the nn.MSELoss criterion. Content losses are added as transparent modules at each desired layer of the network. This way, each time the network is fed an input image, all the content losses are computed at the desired layers, and autograd handles the computation of all gradients. For this to work, the forward method of the module simply returns its input.

The module thus becomes a transparent layer of the neural network, and the computed loss is stored as an attribute of the module. We then define a fake backward method that calls the backward method of nn.MSELoss in order to propagate the gradient. This method returns the computed loss, which is used when running gradient descent to display the evolution of the style and content losses.
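A minimal sketch of such a content loss module follows. It relies on autograd as described above, so no explicit backward method is shown; the weight parameter and its default value are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    def __init__(self, target, weight=1.0):
        super().__init__()
        # Detach the target feature maps so they are treated as a constant,
        # not as a branch of the computation graph.
        self.target = target.detach()
        self.weight = weight

    def forward(self, input):
        # Store the weighted content distance as an attribute of the module...
        self.loss = self.weight * F.mse_loss(input, self.target)
        # ...and return the input unchanged, making the layer transparent.
        return input
```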

The style loss function

For the style loss, we define a module that computes the Gram matrix, given the feature maps of the neural network. We then normalize the values of the Gram matrix by dividing by the number of elements in each feature map.
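A sketch of the Gram matrix computation; here the matrix is normalized by the total number of elements, one common convention:

```python
def gram_matrix(input):
    # b: batch size, c: number of feature maps, (h, w): spatial dimensions
    b, c, h, w = input.size()
    # Flatten each feature map into a row vector.
    features = input.view(b * c, h * w)
    # The Gram matrix holds the inner products between pairs of feature maps.
    G = features @ features.t()
    # Normalize so the values do not scale with the feature map size.
    return G.div(b * c * h * w)
```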

The style loss module is implemented in exactly the same way as the content loss module; however, it compares the Gram matrices of the target and the input.
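Concretely, the style loss module can mirror ContentLoss while reusing the gram_matrix helper above (again a sketch, with an illustrative weight parameter):

```python
class StyleLoss(nn.Module):
    def __init__(self, target_feature, weight=1.0):
        super().__init__()
        # Precompute and freeze the Gram matrix of the style target.
        self.target = gram_matrix(target_feature).detach()
        self.weight = weight

    def forward(self, input):
        # Compare Gram matrices rather than raw feature maps.
        G = gram_matrix(input)
        self.loss = self.weight * F.mse_loss(G, self.target)
        return input
```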

Loading the neural network

Similar to what is described in the paper, we use a pre-trained VGG network with 19 layers (VGG19). PyTorch's implementation of this network is divided into two child Sequential modules: features, which contains the convolution and pooling layers, and classifier, which contains the fully connected layers.
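Loading the part of the network we need might look like the following; note that recent torchvision releases replace the pretrained flag with a weights argument:

```python
import torchvision.models as models

# We only need the `features` child module (convolution and pooling layers),
# and we put it in evaluation mode since we are not training the network.
cnn = models.vgg19(pretrained=True).features.eval()
```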

VGG networks are trained on images with each channel normalized by mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. We use these values to normalize the image before sending it into the network.
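This normalization can itself be a small module placed at the front of the network; the sketch below assumes images arrive as [B, C, H, W] tensors with values in [0, 1]:

```python
class Normalization(nn.Module):
    def __init__(self, mean, std):
        super().__init__()
        # Reshape to [C, 1, 1] so the statistics broadcast across
        # image tensors of shape [B, C, H, W].
        self.mean = torch.tensor(mean).view(-1, 1, 1)
        self.std = torch.tensor(std).view(-1, 1, 1)

    def forward(self, img):
        return (img - self.mean) / self.std
```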

We would like to add our style and content loss modules as transparent layers at the desired depths of our network. To achieve this, we construct a new Sequential module to which we add the modules from vgg19 and our loss modules in order.
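A sketch of this construction, assuming content_img and style_img are already loaded as tensors, and using the layer choices from the paper as an illustration (conv_4 for content, conv_1 through conv_5 for style):

```python
content_layers = ['conv_4']
style_layers = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

normalization = Normalization([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
model = nn.Sequential(normalization)
content_losses, style_losses = [], []

i = 0
for layer in cnn.children():
    # Give each layer a readable name so we can match it against our lists.
    if isinstance(layer, nn.Conv2d):
        i += 1
        name = f'conv_{i}'
    elif isinstance(layer, nn.ReLU):
        name = f'relu_{i}'
        # Out-of-place ReLU interacts more safely with the inserted modules.
        layer = nn.ReLU(inplace=False)
    elif isinstance(layer, nn.MaxPool2d):
        name = f'pool_{i}'
    else:
        name = f'layer_{i}'
    model.add_module(name, layer)

    if name in content_layers:
        # The model built so far maps the content image to its target
        # feature maps at this depth.
        target = model(content_img).detach()
        content_loss = ContentLoss(target)
        model.add_module(f'content_loss_{i}', content_loss)
        content_losses.append(content_loss)

    if name in style_layers:
        target_feature = model(style_img).detach()
        style_loss = StyleLoss(target_feature)
        model.add_module(f'style_loss_{i}', style_loss)
        style_losses.append(style_loss)
```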

For this tutorial, we’ll use our content image as our input image. You can use a different image, but it has to have the same dimensions as the content and style images.

The author suggests that we use the L-BFGS algorithm to run our gradient descent. Unlike ordinary training, we optimize the input image itself in order to minimize the content and style losses. We create a PyTorch L-BFGS optimizer, optim.LBFGS, and pass the image as the tensor to optimize, calling .requires_grad_() to ensure that the image requires gradient.
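Creating the optimizer might look like this, with content_img assumed to be the loaded content image tensor:

```python
import torch.optim as optim

# Start from a copy of the content image and optimize its pixels directly.
input_img = content_img.clone()
input_img.requires_grad_(True)
optimizer = optim.LBFGS([input_img])
```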

We must feed the network the updated input at each step in order to compute the new losses, and run the backward methods of each loss to dynamically compute their gradients and perform gradient descent. The optimizer requires a closure as an argument: a function that re-evaluates the model and returns the loss.

A small challenge arises here: the optimized image may take values between −∞ and +∞ rather than staying between 0 and 1 as required. We must therefore perform the optimization under constraints to keep the input image valid, which we achieve by clamping its values to the [0, 1] interval at each step.
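Putting the last two paragraphs together, here is a sketch of the optimization loop; the closure re-evaluates the model, and the image is clamped to [0, 1] at each step (num_steps is an illustrative choice):

```python
num_steps = 300

run = [0]
while run[0] <= num_steps:

    def closure():
        # Correct the image so its values stay in the valid [0, 1] range.
        with torch.no_grad():
            input_img.clamp_(0, 1)

        optimizer.zero_grad()
        # The forward pass populates the .loss attribute of each loss module.
        model(input_img)

        style_score = sum(sl.loss for sl in style_losses)
        content_score = sum(cl.loss for cl in content_losses)
        loss = style_score + content_score
        loss.backward()

        run[0] += 1
        return loss

    optimizer.step(closure)

# One last correction after the final step.
with torch.no_grad():
    input_img.clamp_(0, 1)
```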

Conclusion

Now let’s proceed to see our newly-generated image that has the artistic style of the style image.

You can use this very same code with different images to try out new artistic designs. However, keep in mind that the neural-style algorithm requires all the images to have the same dimensions.
