In the outputs from the earlier layers (conv1_2 and conv2_2), the patterns in the style image that need only a small receptive field (like the coffee-colored background) are prominent. In the later layers (conv3_3 and conv4_1), larger patterns are more prominent. The output of conv5_1 looks mostly like garbage. I think this is because very few activations are active at that layer, so it contributes little to the loss. The initial losses below show the same thing.
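One way to check the guess about sparse activations would be to measure, for each layer, the fraction of post-ReLU activations that are nonzero. Here is a minimal sketch of that check. The helper name `active_fraction` is hypothetical, and the random array is only a stand-in for a real conv5_1 feature map.

```python
import numpy as np

def active_fraction(features, eps=0.0):
    """Fraction of entries in a post-ReLU feature map that are greater than eps."""
    features = np.asarray(features)
    return float(np.mean(features > eps))

# Toy stand-in for a (channels, height, width) feature map; NOT real VGG output.
rng = np.random.default_rng(0)
toy = np.maximum(rng.normal(loc=-1.0, size=(8, 4, 4)), 0.0)  # ReLU of mostly negative values
print(active_fraction(toy))
```

A value close to 0 for a layer would be consistent with that layer contributing little to the loss.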

For generating the stylized images, I used conv3_3 for calculating the content loss.
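For reference, a content loss at a single layer can be sketched as the mean squared difference between the feature maps of the generated image and the content image at that layer. The normalization (a plain mean) and the name `content_loss` are assumptions for illustration; the exact scaling I used may differ.

```python
import numpy as np

def content_loss(generated_features, content_features):
    """Mean squared difference between two feature maps of identical shape."""
    g = np.asarray(generated_features, dtype=np.float64)
    c = np.asarray(content_features, dtype=np.float64)
    if g.shape != c.shape:
        raise ValueError("feature maps must have the same shape")
    return float(np.mean((g - c) ** 2))
```

Here the two arguments would be the conv3_3 activations of the generated image and of the content image.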

When I set the weights of all 5 layers to 1, these were the layer-wise content and style loss values with the image initialized to ‘content’:

layer-wise style loss values for initial image ‘content’

When I again kept the weights of all 5 layers equal, these were the layer-wise content and style loss values with the image initialized to ‘noise’:

layer-wise style loss values for initial image ‘noise’

As is evident, the later layers contribute very little to the style loss in both cases. Yet most published work on style transfer gives all layers equal weight, which doesn’t make sense: with equal weights, the later layers barely influence the result.
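To make the comparison concrete, here is a rough sketch of how per-layer style losses and each layer's share of the weighted total could be computed from Gram matrices. The function names, the Gram normalization, and the dictionary layout are assumptions for illustration, not necessarily the exact code behind the numbers above.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) feature map, divided by its size."""
    f = np.asarray(features, dtype=np.float64)
    c, h, w = f.shape
    flat = f.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)

def layer_style_losses(generated, style):
    """Per-layer mean squared difference between Gram matrices.

    `generated` and `style` map layer names to feature maps of matching shape.
    """
    return {
        name: float(np.mean((gram_matrix(generated[name]) - gram_matrix(style[name])) ** 2))
        for name in style
    }

def weighted_shares(losses, weights):
    """Fraction of the total weighted style loss contributed by each layer."""
    weighted = {name: weights[name] * loss for name, loss in losses.items()}
    total = sum(weighted.values())
    if total == 0:
        return {name: 0.0 for name in weighted}
    return {name: value / total for name, value in weighted.items()}
```

With all weights set to 1, `weighted_shares` shows directly how much of the style loss each layer accounts for, which is the quantity being compared in the two figures.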

Next, I wanted to see what kind of stylized images would be generated if I trained using only one of these layers.