Special thanks to Eric Xie for fixing the MXnet cuDNN problem. MXnet can fully utilize cuDNN for speeding up neural art.

Have you clicked “star” and “fork” on the MXnet GitHub repo? If not yet, do it now! https://github.com/dmlc/mxnet

Update: want almost real-time Neural Art? Try the MXNet-GAN model, a pretrained Generative Adversarial Network. Please refer to http://dmlc.ml/mxnet/2016/06/20/end-to-end-neural-style.html for details.

Neural art: paint your cat like Van Gogh

Neural art is a deep learning algorithm which can learn the style of a famous artwork and apply it to a new image. For example, given a cat picture and a Van Gogh artwork, we can paint the cat in Van Gogh style, like this (Van Gogh Self-portrait in 1889 wikipedia):

Neural art comes from the paper “A Neural Algorithm of Artistic Style” by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge http://arxiv.org/abs/1508.06576. The basic idea is to leverage the power of the Convolutional Neural Network (CNN), which can learn high-level abstract features from the artwork, and to simulate the art style when generating a new image. Such generative networks are a popular research topic in deep learning. For example, you might know Google’s Inception, which generates a goat-shaped cloud image given a goat image and a cloud image. Facebook also has its similar Deep Generative Image Models, inspired by the paper Generative Adversarial Networks, where one of the authors, Bing Xu, is also the main author of the great MXnet.

The neural art algorithm has many implementations and applications, for example these two Lua/Torch7 ones LINK1 LINK2. The paper’s gitxiv also includes many interesting applications, for example, generating a neural art gif animation. All of these implementations use the VGG image classification model, as mentioned in the paper. By popular demand on GitHub, MXnet has just published its fast and memory-efficient implementation. Let’s have fun with MXnet!


MXnet’s Neural art example

The neural art example is under mxnet/example/neural-style/ . Since this example needs a lot of computation, a GPU is highly recommended; please refer to my previous blog “Deep learning for hackers with MXnet (1) GPU installation and MNIST” for a detailed installation guide. One can certainly use a CPU as well, since mxnet supports seamless CPU/GPU switching, but be aware that generating an image on a CPU may take about 40-50 minutes.

Optional: MXnet can speed up with cuDNN! cuDNN v3 and v4 both work with mxnet, and v4 is 2-3 seconds faster than v3 on my GTX 960 4GB. Please go to https://developer.nvidia.com/cudnn and apply for the NVIDIA Developer program. Once approved, installing cuDNN alongside CUDA is as simple as this (Reference: Install Caffe on EC2 from scratch (Ubuntu, CUDA 7, cuDNN)):

tar -zxf cudnn-7.0-linux-x64-v3.0-prod.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/

Then set USE_CUDNN = 1 in make/config.mk and re-compile MXnet for cuDNN support, if it was not previously compiled with it. Please also update the python installation if necessary.

For readers who don’t have a GPU-ready MXnet, there are free and paid services and apps for trying Neural Art. Since neural art needs a lot of computation, all of these services require uploading the images to their servers and then waiting a long time for processing to finish, usually from hours (if lucky) to weeks:

Deepart https://deepart.io/ Free submission, and the average waiting time is a week 😦 If one wants faster processing, one can consider a donation to speed it up to…. a 24-hour wait.

Pikazo App: http://www.pikazoapp.com/ It is a paid app ($2.99) and, as with deepart.io, you still need to wait in line for a long time.

AI Painter: https://www.instapainting.com/ai-painter It is a service from instapainting, free of charge, with a long waiting time too.

DeepForger: https://twitter.com/DeepForger (Thanks to alexjc from reddit) A twitter account to which people can submit images and get Neural Art results within hours. “It’s a new algorithm based on StyleNet that does context-sensitive style, and the implementation scales to HD.”

For my dear readers who are lucky enough to have a GPU-ready MXnet, let’s do it with MXnet! I am going to use an image of my sister’s cat “pogo” and show every single detail of generating an art image, from end to end.

Steps and parameters

MXnet needs a VGG model. Before the first run, we need to download it using download.sh . The MXnet version of this VGG model takes only several MB, whereas the Lua version of the same model takes about 1 GB. Once the model is ready, let’s put the style image and the content image into the input folder. For example, I give mxnet the cat image as content, and Van Gogh’s painting as style:

python run.py --content-image input/pogo.jpg --style-image input/vangogh.jpg

After 1-2 minutes, we can see the output in the output folder like this:

Let’s try painting the cat “Pogo” in a modern art style. By replacing Van Gogh with ‘Blue Horse’ Modern Equine Art Contemporary Horse Daily Oil Painting by Texas Artist Laurie Pace (https://www.pinterest.com/pin/407223991276827181/) , pogo is painted like this:

python run.py --content-image input/pogo.jpg --style-image input/blue_horse.jpg

Isn’t it cool?

In the Python script run.py , there are some tuning parameters for better results; each of them is explained below:

--model: The model name. In this example, we only have the VGG model from image classification, so please leave it as it is. In the future, MXnet may provide other models, like Google Inception, since they share the same framework.

--content-image: Path to the content image, a.k.a. the cat image.

--style-image: Path to the style image, a.k.a. the Van Gogh image.

--stop-eps: The model uses eps for evaluating the difference between the content and the style. One can see the eps value converging during the training. The smaller the eps, the more similar the style. stop-eps is the threshold for stopping the training. Usually a smaller stop-eps gives a stronger style, but it needs a longer training time. The default value of 0.005 is good, and one can change it to 0.004 for better results.

--content-weight --style-weight: The weights of content and style. By default, the ratio is 10:1. If one thinks the style is too strong, for example, the painting feels strange and harsh, please weaken the style by changing the ratio to 20:1 or 30:1.

--max-num-epochs: The maximum number of epochs; by default it is 1000. Usually MXnet converges to a good eps value in around 200 epochs, and we can leave this parameter alone.

--max-long-edge: The maximum length of the longer edge. MXnet resizes the content image to this size and keeps the aspect ratio. The runtime is almost proportional to the number of pixels (the area) of the image, because the convolution network input size is defined by the number of pixels, and each convolution is applied to each image block. In short, a 700px image may double the memory cost and runtime of a 500px image. In the following benchmark, one can see that a 512px image needs about 1.4GB memory, which is good for a 2014 Macbook Pro or other 2GB CUDA devices; an 850-900px image is good for a 4GB CUDA card; if one wants a 1080p HD image, one may need a 12GB Titan X. Meanwhile, the computing time is related to the number of CUDA cores: the more cores, the faster. I think my dear readers now understand why these free Neural Art services/apps need hours to weeks of waiting time.

--lr: The learning rate eta for SGD. Mxnet uses SGD to find an image which has similar content to the cat and similar style to Van Gogh. As in other machine learning projects, a larger eta converges faster, but it jumps around the minimum. The default value is 0.1, and 0.2 or 0.3 work too.

--gpu: GPU ID. By default it is 0, for using the first GPU. For people who have multiple GPUs, please specify which one should be used. --gpu -1 means using CPU-only mxnet, which takes 40-50 minutes per image.

--output: Filename and path for the output.

--save-epochs: Whether to save the temporary results. By default, it saves the output every 50 epochs.

--remove-noise: Gaussian radius for reducing image noise. Neural art starts from white noise images and converges to the neural art from content + style, so it artificially introduces some unnecessary noise. Mxnet can simply smooth this noise. The default value is 0.2, and one can change it to 0.15 for less blur.
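To make the --max-long-edge arithmetic concrete, here is a minimal sketch in plain Python. It assumes, for illustration only, that the image is shrunk so its longer edge equals the limit while the aspect ratio is kept; the function names resized_shape and pixel_ratio are hypothetical and are not part of run.py.

```python
def resized_shape(width, height, max_long_edge):
    """Return (width, height) after fitting the longer edge to max_long_edge.

    Assumption: images that already fit are left untouched.
    """
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    factor = max_long_edge / long_edge
    return round(width * factor), round(height * factor)


def pixel_ratio(long_edge_a, long_edge_b):
    """Ratio of pixel counts for two images with the same aspect ratio."""
    return (long_edge_a * long_edge_a) / (long_edge_b * long_edge_b)


# A 1000x500 photo limited to a 500px long edge keeps its 2:1 aspect ratio.
print(resized_shape(1000, 500, 500))
# Going from 500px to 700px nearly doubles the pixel count.
print(pixel_ratio(700, 500))
```

Because the pixel count grows with the square of the edge length, a modest increase in --max-long-edge has a large effect on memory and runtime.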

Troubleshooting

Out of memory

The runtime memory cost is proportional to the size of the image, so if --max-long-edge is set too large, MXnet may give this out of memory error:

terminate called after throwing an instance of 'dmlc::Error'
what(): [18:23:33] src/engine/./threaded_engine.h:295: [18:23:33] src/storage/./gpu_device_storage.h:39: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: out of memory

To solve it, one needs a smaller --max-long-edge : for a 512px image, MXnet needs 1.4GB memory; for an 850px image, MXnet needs 3.7GB. Please note these two items:

GTX 970 memory issue: the GTX 970 can only support up to 3.5GB memory, otherwise it goes crazy. It is a known problem from NVIDIA; please refer to this link for more details.

The system GUI costs some memory too. In Ubuntu, one can press ctrl+alt+f1 to shut down the system graphical interface and save about 40MB memory.

Out of workspace

If the image size is larger than 600 to 700 pixels, the default workspace parameter in model_vgg19.py may not be enough, and MXnet may give this error:

terminate called after throwing an instance of 'dmlc::Error'
what(): [18:22:39] src/engine/./threaded_engine.h:295: [18:22:39] src/operator/./convolution-inl.h:256: Check failed: (param_.workspace) >= (required_size) Minimum workspace size: 1386112000 Bytes Given: 1073741824 Bytes

The reason is that MXnet needs a buffer space for each CNN layer, which is defined in model_vgg19.py as workspace . Please replace all workspace=1024 in model_vgg19.py with workspace=2048 .
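The byte counts in the error message can be turned into a workspace value with a little arithmetic. The sketch below assumes the workspace parameter is expressed in MB (1024 * 1024 bytes), which matches the log, since workspace=1024 corresponds to the reported 1073741824 bytes. The helper name min_workspace_mb is hypothetical.

```python
import math

BYTES_PER_MB = 1024 * 1024


def min_workspace_mb(required_bytes):
    """Smallest whole-MB workspace that covers the required byte count."""
    return math.ceil(required_bytes / BYTES_PER_MB)


required = 1386112000  # "Minimum workspace size" from the error message
given = 1073741824     # "Given" from the error message

print(given // BYTES_PER_MB)       # 1024, the current workspace setting
print(min_workspace_mb(required))  # 1322, so workspace=2048 leaves headroom
```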

Benchmark

In this benchmark, we choose this Lua (Torch 7) implementation https://github.com/jcjohnson/neural-style and compare it with MXnet on learning the Van Gogh style and painting pogo the cat. The hardware includes a single GTX 960 4GB, a 4-core AMD CPU and 16GB memory.

512px

                       Memory    Runtime
MXnet (w/o cuDNN)      1440MB    117s
MXnet (w/ cuDNN)       1209MB    89s
Lua Torch 7            2809MB    225s

MXnet uses memory efficiently: it needs only about half the memory of the Lua/Torch7 version.
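As a quick check of the "half the memory" claim, the ratios can be computed directly from the 512px numbers above. This is only arithmetic on the reported figures; the dictionary layout and the function name relative_to are illustrative choices.

```python
# (memory in MB, runtime in seconds) for the 512px benchmark
benchmark_512 = {
    "MXnet (w/o cuDNN)": (1440, 117),
    "MXnet (w/ cuDNN)": (1209, 89),
    "Lua Torch 7": (2809, 225),
}


def relative_to(baseline, results):
    """Return {name: (memory fraction of baseline, speedup over baseline)}."""
    base_mem, base_time = results[baseline]
    return {
        name: (round(mem / base_mem, 2), round(base_time / secs, 2))
        for name, (mem, secs) in results.items()
        if name != baseline
    }


for name, (mem_fraction, speedup) in relative_to("Lua Torch 7", benchmark_512).items():
    print(f"{name}: {mem_fraction:.2f}x memory, {speedup:.2f}x faster")
```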


850px

Lua/Torch 7 is not able to run with an 850px image because it runs out of memory, while MXnet uses 3.7GB memory and finishes in 350 seconds.

                       Memory           Runtime
MXnet (w/o cuDNN)      3670MB           350s
MXnet (w/ cuDNN)       2986MB           320s
Lua Torch 7            Out of memory    Out of memory

MXnet magic to squeeze memory (12.21.2015 update)

After some invaluable discussion on reddit, and with special thanks to alexjc (the author of DeepForger) and jcjohnss (the author of Lua Neural-artstyle), I have this updated benchmark with MXnet’s new magic MXNET_BACKWARD_DO_MIRROR to squeeze memory (github issue). Please update to the latest MXnet from GitHub and re-compile. To add this magic, one can simply do:

MXNET_BACKWARD_DO_MIRROR=1 python run.py --content-image input/pogo.jpg --style-image input/vangogh.jpg
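For readers who launch the example from another Python script rather than from the shell, the same environment variable can be passed along programmatically. This is a minimal sketch; the helper build_invocation is hypothetical, and only the variable name and the run.py flags come from the command above.

```python
import os


def build_invocation(content_image, style_image, mirror=True):
    """Build the command line and environment for running run.py."""
    env = dict(os.environ)  # copy, so the current process is not modified
    if mirror:
        env["MXNET_BACKWARD_DO_MIRROR"] = "1"
    cmd = [
        "python", "run.py",
        "--content-image", content_image,
        "--style-image", style_image,
    ]
    return cmd, env


cmd, env = build_invocation("input/pogo.jpg", "input/vangogh.jpg")
print(" ".join(cmd))
# To actually launch it from the neural-style folder:
#   import subprocess; subprocess.run(cmd, env=env, check=True)
```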

512px

                              Memory    Runtime
MXnet (w/o cuDNN)             1440MB    117s
MXnet (w/ cuDNN)              1209MB    89s
MXnet (w/o cuDNN + Mirror)    1116MB    92s

850px

                              Memory    Runtime
MXnet (w/o cuDNN)             3670MB    350s
MXnet (w/ cuDNN)              2986MB    320s
MXnet (w/ cuDNN+Mirror)       2727MB    332s

The mirror magic trades a small slowdown for memory savings. With this Mirror magic, a 4GB GPU can process up to a 1024px image using 3855MB memory!

Some comments about improving the memory efficiency: currently, the Lasagne version (with Theano) is the most memory-efficient Neural Art generator available (github link, thanks to alexjc), and it can process 1440px images with a 4GB GPU. antinucleon, the author of MXnet, has mentioned that the gram matrix uses imperative mode, while symbolic mode should save more memory by reusing it. I will update the benchmark when the symbolic version is available.

In short, MXnet uses less memory than the Lua version and gets some speed-up from cuDNN. Considering the price difference between a Titan X ($1000) and a GTX 960 4GB ($220), MXnet is also eco-friendly.

A note about the speed comparison: the Lua version uses L-BFGS for the optimal parameter search while MXnet uses SGD , which is faster but needs a little tuning for best results. To be honest, the comparison above doesn’t mean MXnet is always 2x faster.


For readers who want to know MXnet’s secret of efficient memory usage, please refer to MXnet’s design document, where all the dark magic happens. The link is http://mxnt.ml/en/latest/#open-source-design-notes

Till now, my dear readers can play with Neural art in MXnet. Please share your creative artwork on twitter or instagram with #mxnet and I will check out your great art!

How does the machine learn the artwork style?

Quantify the “style”

“Style” itself doesn’t have a clear definition; it might be “pattern” or “texture” or “method of painting” or something else. People believe it can be described by some higher-order statistical variables. However, different art styles have different representations, and for a general approach to “learning the style”, it becomes very difficult to extract these higher-order variables and apply them to new images.

Fortunately, the Convolutional Neural Network (CNN) has proved its power of extracting high-level abstract features in image classification; for example, computers can tell if a cat is in the image by using a CNN. For more details, please refer to Yann Lecun’s deep learning tutorial. The power of “extracting high-level abstract features” is used in Neural Art: after a couple of layers of convolution operations, the image has lost its pixel-level features and only keeps its high-level style. In the following figure from the paper, the authors define a 5-layer CNN, where The Starry Night by Van Gogh keeps some content details in the 1st, 2nd and 3rd layers, and becomes “something that looks like The Starry Night” in the 4th and 5th layers:




And the authors reached the “Aha!” moment: if we feed a Van Gogh image and one other image to the same CNN, some clever adjustment may bring the second image closer to Van Gogh while keeping some content in the first 3 layers. That is the way to simulate Van Gogh’s painting style! Moreover, a VGG model for image classification is already available for this!

Learn style and generate a new image

Now the problem becomes an optimization problem: I want the generated picture to look like my cat (the content features should be kept for the first 3 layers), and I want the Van Gogh style (the style features for the 4th and 5th layers). The solution is thus an intermediate result which has a content representation similar to the cat and a style representation similar to Van Gogh. In the paper, the authors use a white noise image to generate a new image closer to the content using SGD, and another white noise image to get closer to the style. The authors define a magical Gram matrix for describing the texture and use this matrix to define the loss function, which is a weighted mixture of these two white noise images. Mxnet uses SGD to converge to an image which meets both the content and the style requirements.

For example, in these 200+ steps of painting pogo the cat, the generated image changes like this:

Here we can see that, in the first 50 epochs, the generated image looks like a simple texture overlap between the content and the style; with more epochs, the program gradually learns the color, the pattern, etc., becomes stable around the 150th epoch, and finally paints pogo the cat in Van Gogh style.
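The Gram matrix and the weighted loss can be sketched in a few lines of plain Python. This is a simplified illustration rather than MXnet's implementation: feature maps are assumed to be given as a list of channels, each flattened to a list of activations, and the normalisation constants used in the paper are omitted. The function names are hypothetical; the 10:1 default weights follow the --content-weight and --style-weight defaults mentioned earlier.

```python
def gram_matrix(features):
    """Inner products between every pair of channels.

    features: list of C channels, each a flat list of N activations.
    Returns a C x C nested list.
    """
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) for ch_j in features]
        for ch_i in features
    ]


def squared_difference(a, b):
    """Sum of squared element-wise differences of two nested lists."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def total_loss(generated, content, style, content_weight=10.0, style_weight=1.0):
    """Weighted sum of the content loss and the Gram-matrix style loss."""
    content_loss = squared_difference(generated, content)
    style_loss = squared_difference(gram_matrix(generated), gram_matrix(style))
    return content_weight * content_loss + style_weight * style_loss


# Two channels with two activations each, purely illustrative numbers.
print(gram_matrix([[1, 2], [3, 4]]))
```

The Gram matrix discards where in the image a feature fires and keeps only which features fire together, which is why it works as a description of texture rather than content.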
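The interplay of --lr , --stop-eps and --max-num-epochs can be illustrated with a toy gradient descent on a one-dimensional quadratic. This is a sketch under stated assumptions, not the code in run.py: the "image" is a single number, and eps is assumed, for illustration, to be the relative change between successive iterates.

```python
def descend(lr, stop_eps=0.005, max_num_epochs=1000, start=0.0, target=3.0):
    """Minimise (x - target)^2 and stop once the relative update is below stop_eps.

    Returns the final x and the number of epochs used.
    """
    x = start
    epoch = 0
    for epoch in range(1, max_num_epochs + 1):
        grad = 2.0 * (x - target)
        new_x = x - lr * grad
        # Assumed stopping measure: relative change between iterates.
        eps = abs(new_x - x) / max(abs(new_x), 1e-12)
        x = new_x
        if eps < stop_eps:
            break
    return x, epoch


x_small, epochs_small = descend(lr=0.1)
x_large, epochs_large = descend(lr=0.3)
print(epochs_small, epochs_large)  # the larger learning rate stops sooner
```

On this toy function, a learning rate above 0.5 makes every step overshoot the target and land on the other side, which is the "jumping around the minimum" behaviour described for --lr .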

Further reading: other methods of simulating art style

Neural art is not the only method of simulating artwork style and generating new images. There are many other computer vision and graphics research papers, for example:

“A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients” http://www.cns.nyu.edu/pub/lcv/portilla99-reprint.pdf which uses wavelet transforms for the higher-order texture features.

“Style Transfer for Headshot Portraits” https://people.csail.mit.edu/yichangshih/portrait_web/ This work is specific to headshot portraits, which is a constrained problem, and the method is much faster than Neural art.

Summary

Neural art is a nice demo for convolution network, and people can generate artwork from their own images. Let’s have fun with MXnet neural art. Please share your creative artwork on twitter or instagram and add hashtag #mxnet .

A reminder about the style: if the content image is a portrait, please find a portrait artwork for learning the style instead of a landscape one. The same goes for landscape images: always landscape to landscape. Landscape artwork uses different painting techniques, and it doesn’t look good on portrait images.

In the next blog, I will give a detailed introduction to the convolution network for image classification, a.k.a. the dog vs the cat.