Originally posted in the FB Deep Dream // Tutorials group: https://www.facebook.com/groups/733099406836193/permalink/748784328601034/ It’s a little bit messy. Let me know if something is unclear or bullshit.

OK, so you’re bored, have some spare time, have Caffe working on a GPU and want to try training a network to get rid of the dogs in deep dream images… Here is a tutorial for you, in points.

Forget about training from scratch; only fine-tune GoogLeNet. Building a net from scratch takes time, a lot of time, hundreds of hours… Read this first: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

The hardest part: download 200-1000 images you want to use for training. I found that sticking to one type of image works well: faces, porn, letters, animals, guns, etc.

Resize all images to 256×256. Save them as truecolor JPEGs (not grayscale, even if the pictures are grayscale).

OPTIONAL: Calculate the average color of all your images. You need to know the average red, green and blue values of your set. I use the command line tools convert and identify from ImageMagick: convert *.jpg -average res.png; identify -verbose res.png and read the ‘mean’ of every channel.

Create the folder <caffe_path>/models/MYNET <- this will be your working folder. All folders and files you create will go into MYNET.

Create a folder named ‘images’ in your working folder (MYNET).

For every image, create a separate folder in ‘images’. I use numbers starting from 0, for example ‘images/0/firstimage.jpg’, ‘images/1/secondimage.jpg’, etc… Every folder is a category, so you end up with a number of folders, each with a single image inside.

Create a text file called train.txt in the working folder. Every line of this file should be the relative path of an image followed by the number of its category. It looks like this:

images/0/firstimage.jpg 0

images/1/secondimage.jpg 1

…

Copy train.txt to val.txt.

Copy deploy.prototxt, train_val.prototxt and solver.prototxt into the working folder from this link: https://gist.github.com/tsulej/ff2b3e37aa76e8fbb244

Next you need to edit all three files. Let’s start with train_val.prototxt:

lines 13-15 (and 34-36) define the mean values for your image set, for the blue, green and red channels respectively (mind the reversed order). If you don’t know them, just set all of them to 129.

line 19 defines the number of images processed at once (the batch size). 40 works well on a 4 GB GPU. You probably need to change it to 20 for a 2 GB GPU. If the number is too high you’ll get an out-of-memory error; then just lower it. You can set it to 1 as well.

lines 917, 1680 and 2393: num_output should be set to the number of your categories (the number of folders in your ‘images’ folder).

deploy.prototxt: line 2141, num_output as above.

solver.prototxt, the important settings:

display: 20 – print statistics every 20 iterations

base_lr: 0.0005 – the learning rate, which is subject to change. You’ll watch the loss value and adapt base_lr according to the results (see the strategy below).

max_iter: 200000 – how many iterations to train for; you can even put a million here.

snapshot: 5000 – how often to save a snapshot and network file (here, every 5000 iterations). It’s important if you want to interrupt training in the middle.

Almost ready to train… but you still need GoogLeNet. Go to caffe/models/bvlc_googlenet and download this file (save it there): http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel

Really ready to train. Go to the working folder and run this command:

../../build/tools/caffe train -solver ./solver.prototxt -weights ../bvlc_googlenet/bvlc_googlenet.caffemodel

It should work. Every 5000 iterations you’ll get a snapshot. You can then stop training and run deepdream on your net. The first results should be visible on the inception_5b/output layer.
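Going back to the preparation steps: the ‘images’ folder layout and the train.txt/val.txt files can be generated with a small script instead of by hand. This is only a sketch, assuming your resized JPEGs sit in one flat source folder and have no spaces in their names. The function name build_file_lists and its arguments are made up for this example, they are not part of Caffe.

```python
import shutil
from pathlib import Path


def build_file_lists(source_dir, work_dir):
    """Copy each JPEG into images/<n>/ and write train.txt and val.txt.

    One image per category, numbered from 0, as described in the steps above.
    Returns the number of categories (the value to use for num_output).
    """
    source_dir = Path(source_dir)
    work_dir = Path(work_dir)
    # Sort so the category numbers stay the same between runs.
    jpgs = sorted(
        p for p in source_dir.iterdir()
        if p.suffix.lower() in (".jpg", ".jpeg")
    )
    lines = []
    for category, src in enumerate(jpgs):
        dest_dir = work_dir / "images" / str(category)
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, dest_dir / src.name)
        # Path relative to the working folder, then the category number.
        lines.append(f"images/{category}/{src.name} {category}")
    text = "\n".join(lines) + "\n"
    (work_dir / "train.txt").write_text(text)
    (work_dir / "val.txt").write_text(text)  # val.txt is a plain copy of train.txt
    return len(lines)
```

The returned count is the number of categories, so it is the number you would put into the num_output fields.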

To resume training from a snapshot, run this command:

../../build/tools/caffe train -solver ./solver.prototxt -snapshot ./MYNET_iter_5000.solverstate
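If you stop and resume often, picking the newest snapshot by hand gets tedious. Here is a rough sketch that finds the .solverstate file with the highest iteration number and builds the same resume command as above. It assumes the snapshot files follow the <prefix>_iter_<N>.solverstate naming seen in the command above. The helper names latest_solverstate and resume_command are hypothetical.

```python
import re
from pathlib import Path

# Matches e.g. MYNET_iter_5000.solverstate and captures the iteration number.
SNAPSHOT_RE = re.compile(r"_iter_(\d+)\.solverstate$")


def latest_solverstate(work_dir):
    """Return the .solverstate path with the highest iteration, or None."""
    best, best_iter = None, -1
    for path in Path(work_dir).iterdir():
        match = SNAPSHOT_RE.search(path.name)
        # Compare as integers: 10000 must beat 5000 (string order would not).
        if match and int(match.group(1)) > best_iter:
            best_iter = int(match.group(1))
            best = path
    return best


def resume_command(work_dir, caffe_bin="../../build/tools/caffe"):
    """Build the resume command as an argument list, or None if no snapshot."""
    state = latest_solverstate(work_dir)
    if state is None:
        return None
    return [caffe_bin, "train", "-solver", "./solver.prototxt",
            "-snapshot", f"./{state.name}"]
```

The returned list can be handed to subprocess.run with cwd set to the working folder, since all paths in it are relative to MYNET.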

Strategy for base_lr in the solver for the first 1000 iterations: watch the loss value. During training it should, on average, get lower and lower and head towards 0.0. But:

if the ‘loss’ value keeps getting higher during training – stop and set base_lr to 1/5 of the current value

if ‘loss’ is stuck near some value and falls only very slowly – stop and set base_lr to 5 times the current value

if neither of the above helps – you probably have a problem and the training has failed. Change the image set and start over.
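The rules above can be written down as a small function, if you prefer to feed it the loss values you copied from the log. This is just a sketch of the rule of thumb: the window size and the "very slowly" threshold (slow_drop) are my own guesses, not tested values, and the function name suggest_base_lr is made up.

```python
def suggest_base_lr(losses, base_lr, window=5, slow_drop=0.02):
    """Suggest a new base_lr from a list of loss values, oldest first.

    window and slow_drop are assumptions for this sketch:
    - window: how many values to average at the start and at the end
    - slow_drop: relative drop below which the loss counts as 'stuck'
    """
    if len(losses) < 2 * window:
        return base_lr  # not enough data to judge
    early = sum(losses[:window]) / window
    late = sum(losses[-window:]) / window
    if late > early:
        # Loss is getting higher and higher: 5 times less.
        return base_lr / 5
    if early > 0 and (early - late) / early < slow_drop:
        # Loss is stuck or falling very slowly: 5 times more.
        return base_lr * 5
    return base_lr  # loss is going down fine, leave it alone
```

For example, with the default base_lr of 0.0005, a rising loss would suggest 0.0001 and a stuck loss would suggest 0.0025.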

Some insights:

My GPU is an NVIDIA GTX 960 with 4 GB of RAM. Caffe is compiled with the cuDNN library. Every 5k iterations take about 1 hour. On the CPU, calculations were 40 times slower.

I usually stop after 40k iterations, but I have also done 90k. The DeepDickDream guy ran 750k iterations to get dicks on Hulk Hogan (http://deepdickdreams.tumblr.com/).

You have little chance of seeing your image set in deep dream images before 100k iterations. But you have a high probability of seeing something new, and no dogs.

Fine-tuning means that you copy almost all information from the existing net into your net, except for the layers called “classification” (three of them). To achieve this, just rename those layers. You may also try not copying other layers (e.g. all of inception_5b) to clean out more information; again, just change the names of those layers. I have had both good and bad results with this method.

You can put more images into one category. You can use 20 categories with 200 images in each. You can use random images, the same images, similar images, whatever you want. I couldn’t find a rule for getting the best results. I suppose it depends mostly on the content.
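Renaming many layers by hand (e.g. all of inception_5b) is error-prone, so here is a sketch that does it on the prototxt text. Caffe copies pretrained weights by matching layer names, so a renamed layer starts fresh. The sketch only touches name: lines and leaves top: and bottom: alone, so the net stays wired the same way. The function name rename_layers and the "_new" suffix are assumptions for this example. Remember to apply the same renaming to both train_val.prototxt and deploy.prototxt.

```python
import re

# A 'name: "..."' line at the start of a line (after optional indentation).
NAME_RE = re.compile(r'^([ \t]*name:[ \t]*")([^"]+)(")', re.MULTILINE)


def rename_layers(prototxt_text, prefixes, suffix="_new"):
    """Append suffix to every layer name that starts with one of prefixes.

    Only 'name:' lines are changed; 'top:' and 'bottom:' blob names stay
    untouched. Names that already end with suffix are left as they are.
    """
    prefixes = tuple(prefixes)

    def repl(match):
        name = match.group(2)
        if name.startswith(prefixes) and not name.endswith(suffix):
            return match.group(1) + name + suffix + match.group(3)
        return match.group(0)

    return NAME_RE.sub(repl, prototxt_text)
```

For example, rename_layers(text, ["inception_5b/"]) would rename every inception_5b layer and nothing else.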

Examples:

Net trained on British Library images (https://www.flickr.com/photos/britishlibrary/albums). 11 categories, 100k images in total (every image had 10 variants). Inception layers 3, 4 and 5 were cleaned up. Result after 25k iterations. Faces from the portrait album are clearly visible. This was my first and best try (out of 20). Image from the 5b/output layer.

Only the letters album from the British Library images. 750 categories with one image per category. 40k iterations. Only the classification layers were cleaned up. Butterflies are visible, but why? Layer 5b.



94 categories, 100 images each, porn image set. 90k iterations. Default net setup (only the classification layers cleaned up). Layer 5b.

As above, but with a differently prepared image set (flip/flop, rotations, normalization/equalization, blur/unsharp, etc.). 40k iterations. Layer 5b.

CaffeNet, built from scratch on 4 categories, 1000 glitch images each. 65k iterations. Pool5 layer.

Same image set as above. 80k iterations. GoogLeNet with layers 3, 4 and 5 cleaned up. Layer 5b.

Flickr collection 1

Flickr collection 2

Any questions? Leave a comment or write to generateme.blog@gmail.com