https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

Deep Style isn’t just a Photoshop filter. It’s a lot more intelligent than that, but like Photoshop it should be looked at as a tool for artists. The development of art is intrinsically linked to the development of technology. Impressionism came out of new scientific discoveries in optics. The invention of amplified electric instruments led to rock’n’roll. Technology creates and defines the canvas, but we have to decide what to put on it.

I will admit that having been trained in a traditional fine-arts oil painting tradition has given me a certain amount of leeriness toward “cheating.” As I’ve gotten older, though, I’ve learned about the long history of “mechanical reproduction” in art. From the camera obscura all the way to modern “photobashing,” people have been finding ways to “cheat” and produce work faster and more accurately. Ever seen those super impressive videos on YouTube where people draw ultra-realistic portraits just by sweeping their hand across the board? That’s actually a mechanical reproduction technique not unlike tracing. For the longest time I couldn’t figure out how people could move their hand linearly across the board, filling in details like a printer, but that’s actually part of the trickery of the time lapse.

So Deep Style can be used as a tool, like any other artistic tool. I wanted more flexibility than a preset-based app would give me, so I partitioned my computer’s hard drive and installed Linux to run the neural network. You can download an implementation of the neural network from GitHub here:

https://github.com/jcjohnson/neural-style
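If you want to try it yourself, the setup looks roughly like this (assuming you’ve already installed Torch, which the repo depends on; check the README in case the script names have changed):

```shell
# Grab the neural-style code
git clone https://github.com/jcjohnson/neural-style.git
cd neural-style

# Download the pretrained VGG model the network runs on
# (script name per the repo's README)
sh models/download_models.sh
```

The model download is a one-time step; after that, everything happens through the `neural_style.lua` script.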

I’m not a computer scientist. I’m just an artist who finds tech interesting. Right now, neural networks are so niche that they’re mostly used and understood only by Silicon Valley engineers and computer science academics. I think the basic concept of a neural network is actually quite intuitive (it’s modeled after our own brain structure, after all), but it still has a bit of a barrier to entry for non-technically inclined people. I hope that by sharing my experiences and knowledge I can help other artists understand neural networks and find cool ways to work them into their own workflows.

One of the portraits I made for a friend

First a couple of terms:

Content image: This is the photo you want to transform. If I were trying to make a dog look like it had been painted by Monet, the dog would be the content image.

Style image: This is the image you are deriving the style from. So in the case of the previous example, Monet would be the style image.
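With those two terms in mind, a single run of neural-style maps them directly onto command-line flags. Here’s a sketch of a typical invocation (the filenames are placeholders, and the flag defaults may differ between versions of the repo):

```shell
# Content image: the photo being transformed.
# Style image: the painting whose look we're borrowing.
th neural_style.lua \
  -content_image dog.jpg \
  -style_image monet.jpg \
  -output_image dog_monet.png \
  -image_size 600 \
  -gpu 0
```

`-image_size` sets the maximum dimension of the output, which in practice is limited by how much VRAM your card has.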

The possibilities for experimenting with neural networks, even with Deep Style alone, are massive, but I’ll focus on one particular set of techniques I’ve been exploring. I call these paintings “neural fusion paintings.”

Basics of the technique: I run an image through the neural network under a bunch of different parameters and with different source style images, sometimes from the same artist, sometimes from multiple artists. Then I layer those styled images over the original photo and use a layer mask in Photoshop to selectively reveal and hide different parts of the style images. Finally, I have a top layer where I touch up details and blend different parts of the image together.
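Generating all those variants is easy to script. A loop like the following (filenames are hypothetical; `-style_weight` is one of neural-style’s tuning knobs, and the values here are just examples to taste) produces a stack of styled images from one content photo, ready to be layered and masked in Photoshop:

```shell
# One styled variant per (style image, style weight) pair,
# all from the same content photo, for later layering.
for style in monet.jpg vangogh.jpg; do
  for weight in 100 500 1000; do
    th neural_style.lua \
      -content_image portrait.jpg \
      -style_image "$style" \
      -style_weight "$weight" \
      -output_image "out_${style%.jpg}_${weight}.png"
  done
done
```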

One of the portraits I made for a friend (Check out his game BattleSloths! Phil is awesome!)

This neural fusion technique is useful for a couple of reasons. First, I have a pretty hefty computer (I also do VR art and development), but even my computer has trouble processing these images. Neural networks perform a massive number of simultaneous computations, which means they need a really good graphics card. I have a GTX 970 with 3.5 GB of usable VRAM, which means that, with optimizations, I can get output images about 600–800 pixels wide.
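For reference, the “optimizations” I mean are the memory-saving options in the repo. As I understand them (check the README for your version), the cuDNN backend and the Adam optimizer both reduce VRAM usage compared to the defaults, at some cost in speed or quality:

```shell
# Lower-memory settings that let a mid-range card reach larger
# output sizes. The image size here is about what my GTX 970 manages.
th neural_style.lua \
  -content_image portrait.jpg \
  -style_image monet.jpg \
  -image_size 800 \
  -backend cudnn -cudnn_autotune \
  -optimizer adam
```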

People have come up with workarounds, like slicing an image into smaller pieces, applying the neural network to each piece, and stitching them back together, but this technique results in a “flat” look that loses a lot of the object recognition that makes Deep Style so convincingly painting-like.
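For the curious, the slicing itself is simple enough, e.g. with ImageMagick; the problem is the seams and the flattened look, not the mechanics. A rough sketch of the tiling approach:

```shell
# Slice the photo into a 2x2 grid of tiles (+repage strips the
# page offsets so each tile is a standalone image), style each
# tile separately, then stitch the results back together.
convert portrait.jpg -crop 2x2@ +repage tile_%d.png
for i in 0 1 2 3; do
  th neural_style.lua -content_image "tile_${i}.png" \
    -style_image monet.jpg -output_image "styled_${i}.png"
done
montage styled_0.png styled_1.png styled_2.png styled_3.png \
  -tile 2x2 -geometry +0+0 stitched.png
```

Because each tile is styled without knowledge of the others, the network loses the whole-image context, which is exactly why the result looks flat.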

Another portrait for a friend

With the neural fusion technique, though, some of the lack of resolution can be disguised by your manual blending layer in Photoshop. Another advantage is that the hand-painted detail layer can make up for some of the information the neural network loses on a smaller graphics card. The bigger the graphics card, the higher-resolution the image it can hold in VRAM, and therefore the more texture detail it can extract. If you resize a picture of a person down to 100 by 100 pixels, the photo is so small that you can’t meaningfully extract any useful information about the face or texture; it’s just a bunch of splotchy color squares. The apps you can get in the App Store run on servers that are probably more powerful than your home computer, which means they can do some pretty rad stuff, even if you don’t get the same control as with a home installation.

A friend of mine, Fredrick Nolting, has done some awesome stuff using the raw output from the apps, though:

http://neeslow.tumblr.com/image/155254678031

There are lots of possibilities, and as GPU power goes up, the possibilities for artists will go up with it. This is a rich vein to explore. It won’t just be an Instagram filter, an automatic conversion of a photograph; it will be a tool that transforms visual material for the artist to sculpt and bend to their own ends.

There are two aspects of “painterly” graphical representation that need to be taken into account: noise and specificity.

Real life is naturally noisy. Light and color have very complex interactions with the surface of a material and with the way they bounce around a room; there are lots of tiny variations from speck to speck. When we say that a piece of CGI looks “fake,” what we’re usually picking up on is an insufficiently complex simulation of light on the surface, resulting in a simplistic, homogeneous, “plastic” reflection of light. (Remember, polymers are basically just repeating chains of molecules, so the way light interacts with them is going to be fairly consistent compared to the way it interacts with a complex, wabi-sabi organic surface like skin.)

A less noisy rendering style might be preferable, though! As Scott McCloud has argued, the emotional strength of art is often tied to its ability to simplify and abstract the world.

The process of running images through Deep Style naturally lowers their surface noise. Even though the resulting images can sometimes be harder to make out, or might look more chaotic, on a pixel-by-pixel level they’re generally more consistent. You can do this to some degree in Photoshop using its “Artistic” or “Facet” filters, but the result will end up looking like a filter, because those filters lack the second element.

Specificity is in some ways at odds with decreasing noise. As you decrease the overall noise, you often decrease distinction: the sharpness dividing a foreground object from its background. Denoising smooths out details and makes everything a little “smudgy.” This makes a picture look less like a real painting, because it looks programmatic; it is essentially created without reference to the object depicted. A Photoshop filter doesn’t know that it’s applying an Artistic filter to a face, or a dog, or a landscape; it just sees a grid of colors. Neural networks are able to bring greater specificity to the way they denoise and apply texture, which creates a more convincing illusion of being painted by hand. The result seems to carry intention, rather than being the output of an anonymous algorithm.

The places the neural network has the most difficulty with, and that consequently suffer the most at lower resolutions, are facial features and fine edges: areas of high specificity. This is another advantage of the neural fusion technique. Details that might otherwise get smudged out, like individual hairs, eyelashes, or pupils, can have their specificity brought back through a hand-painted detail layer in Photoshop. Common issues are smudges between surfaces (a face might “bleed” into the background behind it) and a rippling effect where there should be a consistent flat or curved surface. The neural network doesn’t know that it’s a surface; while your eyes know what a hand is, and the curve of a hand, the network is just working with a bunch of pixels and trying to find an average. You can selectively reveal some of the original photo to help here, but the texture and color will often not match properly and will look a little weird, because photographs are generally noisier than the intended painted look. Using some of the Artistic filters in Photoshop along with hand painting generally helps with this issue if you do choose to use the photograph directly in the painting. Figure-drawing experience is a big help here, even if, in many ways, the bulk of the work is being taken care of for you by the neural network.

There’s more I could dive into here, like the technical details of setting up the neural network for non-technical people, but I think I’ll bring this article to a close for now. I would love feedback! If you’re an artist working with neural networks, let me know! And if there’s something unclear, or something you’d like to know more about, ask me!

Here’s a bonus gif to show how the layers are built up:

Below are some more of the portraits I’ve made: