MangaGAN

Teaching computers to draw new and original manga and anime faces with DCGANs

Manga and anime faces generated with a model trained for 100 epochs

Manga and anime are appreciated around the world for their intricate art styles and compelling stories. The fan base is so massive that there are thousands of artists drawing original manga and anime characters, and thousands more who would like to. However, drawing takes tremendous time, effort, skill, and creativity. Generating manga and anime characters automatically can help bridge that skill gap and open up opportunities to create custom characters. I eventually want to implement my own GAN that draws industry-standard manga and anime characters, but for this project I wanted to learn more about GANs and challenge myself to draw the best-quality images I could, since humans are realistically exposed to a wide range of eclectic styles.

Real samples, some containing false positives before data cleaning

The video game industry is the first area of entertainment to start seriously experimenting with using AI to generate raw content. Aside from the current overlap between computer gaming and machine learning, there’s definitely a huge cost incentive to invest in video game development automation given the $300+ million budgets of modern AAA video games.

Data Prep

The model was trained on a dataset of approximately 143,000 images. It is well understood that a high-quality image dataset is essential, if not the most important factor, for generating images of industry standard. The images were crawled from Danbooru, an image board with more than enough images for training image-generation models. These boards allow uploads that vary widely in style, domain, and quality, and I believe this variance is responsible for a non-trivial portion of the quality gap between generating real human faces and generating anime character faces. The images are so diverse that some are outright abstract, so I hope to continue this project by producing cleaner datasets with unique styles.

After scraping the images I used python-animeface to crop them. Any cascade classifier built specifically for detecting anime and manga faces would work just as well. The process ran across a pool of 12 workers, but here's essentially what happened to each image:

import animeface
from PIL import Image

# Open an image and detect any anime/manga faces in it
img = Image.open('data/anime_image_usagi_tsukino.png')
faces = animeface.detect(img)

# Crop to the bounding box of the first detected face
fp = faces[0].face.pos
img = img.crop((fp.x, fp.y, fp.x + fp.width, fp.y + fp.height))
img.show()
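The 12-worker pool mentioned above can be sketched with Python's thread-based pool. This is a hedged sketch of the pattern only: `process_image` stands in for the animeface detect-and-crop routine, and the paths are hypothetical.

```python
from multiprocessing.dummy import Pool  # thread-based pool

# Stand-in for the real per-image routine, which would open `path`,
# run animeface.detect, crop to the face box, and save the result.
def process_image(path):
    return path.replace('data/raw', 'data/cropped')

paths = [f'data/raw/img_{i}.png' for i in range(4)]

# Farm the work out to 12 workers; map preserves input order
with Pool(12) as pool:
    cropped_paths = pool.map(process_image, paths)
```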

Since some of the crops were not actually faces, I manually checked for false positives and ended up removing about 3% of the images.

The Goal of Using a Deep Convolutional Generative Adversarial Network (DCGAN)

Why exactly are AI researchers building complex systems to generate slightly wonky-looking pictures of cartoon faces?

The cool thing about this is that it takes an understanding of these pictures to generate them — just as artists need to understand what it takes to draw a face before they draw it.

Look at this picture:

An anime face. Specifically, Misty from Pokémon, who's one of my favorite anime characters.

For those of you who watch anime, you instantly know this is a picture of an anime face: a cartoon face drawn in an art style that originates from Japan. To a computer, it is just a grid of numbers representing pixel colors. It doesn't understand that this is supposed to be an anime face, let alone represent any concept at all.

What would happen if we showed a computer thousands of these Japanese-style cartoon faces, and after seeing them the computer could draw original anime faces on its own, with different hairstyles, eye colors, genders, and perspectives? It could reach the point where an artist could ask it to draw a specific kind of face, for example "a manga or anime girl with short blue hair, glasses, and cat ears, looking up with a smile on her face."

If computers could draw these faces with proportional facial features, it would mean they know how to draw new characters without any explicit directions (at least at a child's level).

As a developer and artist, it's exciting to see researchers pursue this. These generative models have gotten computers to understand data in a way that can be translated into a never-before-seen concept, without understanding the meaning of that concept. We're still in the early days of machine-learning-based generative models and their practical uses are currently pretty narrow, but they are a lot of fun to play around with. It will be interesting to see if AI can get to the point where it can contribute to the arts or the entertainment industry.

“What I’m looking for out of cognitive systems is not just another form of computing but something that actually creates a presence in our life and through that presence is able to inspire us.” — Rob High, Vice President and CTO, IBM Watson

How does the model work?

A DCGAN is made up of two neural networks: a generator and a discriminator. The two are locked in a battle to outdo each other, and the contest makes both networks stronger.

Let's pretend that this first neural network is a lead animator who's reviewing a pitch for a new anime. To prevent a lawsuit, it's trained to spot drawings that have already appeared in previous shows or books. Its job is to look at a drawing and decide whether it contains a new face that's suitable for the show.

A Convolutional Neural Network works well in this instance, since all we need to do is take an image apart, process it layer by layer to recognize increasingly complex features, and output a value indicating whether there's a real anime or manga face in the image. This first network is the discriminator:

Discriminator Network
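The post doesn't include the network code, so here is a hedged sketch of a standard DCGAN discriminator. PyTorch, the 64x64 input size, and the layer widths are all my assumptions, not the author's implementation.

```python
import torch
import torch.nn as nn

# Sketch of a standard DCGAN discriminator: strided convolutions shrink
# a 3x64x64 image down to a single real/fake probability.
class Discriminator(nn.Module):
    def __init__(self, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ndf, 4, 2, 1, bias=False),            # 64x64 -> 32x32
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),      # 32x32 -> 16x16
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),  # 16x16 -> 8x8
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),  # 8x8 -> 4x4
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),        # 4x4 -> 1x1
            nn.Sigmoid(),                                      # probability of "real"
        )

    def forward(self, x):
        return self.net(x).view(-1, 1)

# One score per image in the batch
score = Discriminator()(torch.zeros(2, 3, 64, 64))
```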

Next, let's pretend that this second network is a brand-new animator who just learned how to draw fresh anime faces so that no one gets sued for copyright infringement, and who is about to pitch a show idea to the lead animator of a big animation company. In this second network the layers are reversed compared to a normal ConvNet: instead of taking a picture and outputting a value like the first network, it takes in a list of values and outputs a picture.

This second network is the generator:

Generator Network
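Mirroring the discriminator, the generator can be sketched with transposed convolutions that upsample a noise vector into an image. Again, PyTorch and the 100-dimensional latent size are assumptions on my part.

```python
import torch
import torch.nn as nn

# Sketch of a standard DCGAN generator: transposed convolutions grow a
# 100-dim noise vector z into a 3x64x64 image.
class Generator(nn.Module):
    def __init__(self, nz=100, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),      # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), # 4x4 -> 8x8
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), # 8x8 -> 16x16
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),     # 16x16 -> 32x32
            nn.BatchNorm2d(ngf),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),           # 32x32 -> 64x64
            nn.Tanh(),                                                 # pixels in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

# A batch of two noise vectors becomes a batch of two fake images
fake = Generator()(torch.randn(2, 100, 1, 1))
```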

Now we have a lead animator (the Discriminator) looking for reused faces, and a new animator (the Generator) that’s drawing new faces. It’s time to duel!

Round One

The Generator will draw an…interesting copy of an anime face that doesn't resemble a new, industry-quality face, because it doesn't yet know what an anime face looks like:

The Generator makes the first (…interesting) manga/anime face

But right now the Discriminator is equally terrible at its job of recognizing drawings, so it won't know the difference:

The Discriminator thinks it's a fresh, industry-quality face. Maybe for a YouTube show

Now we'll have to tell the Discriminator that this face is actually not drawn to our standards. Then we show it a real, complete anime face and ask how the fake looks different. The Discriminator looks for new details that help it separate the real one from the fake one.

For example, the Discriminator might notice that a complete anime face has certain proportions. Using this knowledge, the Discriminator learns how to tell the fake from the real one. It gets a tiny bit better at its job:

The Discriminator levels up! It now can spot very bad fake faces

Round Two

We tell the Generator that its anime and manga images are suddenly getting rejected as fake, so it needs to step up its game. We also tell it that the Discriminator is now looking for specific facial proportions, so the best way to fool the Discriminator is to draw those features on the faces:

The Generator produces a slightly better drawn manga/anime face

The fake faces are being accepted as valid again! Now the Discriminator has to look again at the real face and find a new way to tell it apart from the fake one.

This back-and-forth game between the Generator and the Discriminator continues thousands of times until both networks are experts. Eventually the Generator produces new, near-perfect faces and the Discriminator has turned into a master anime and manga critic, hunting for the slightest mistakes.
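The duel described above is, in code, an alternating optimization: one update for the Discriminator on real and fake batches, then one for the Generator. Below is a minimal, runnable sketch of that loop, assuming PyTorch; the networks are tiny fully connected stand-ins so the example stays short (the real ones are convolutional).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins: G maps 16-dim noise to a 64-dim "image",
# D maps that "image" to a real/fake probability.
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 64))
D = nn.Sequential(nn.Linear(64, 32), nn.LeakyReLU(0.2),
                  nn.Linear(32, 1), nn.Sigmoid())

criterion = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

real_batch = torch.randn(8, 64)      # stand-in for real face crops
real_labels = torch.ones(8, 1)
fake_labels = torch.zeros(8, 1)

for step in range(100):
    # Discriminator turn: learn to tell real faces from the Generator's fakes
    opt_d.zero_grad()
    fake_batch = G(torch.randn(8, 16)).detach()  # don't update G here
    loss_d = (criterion(D(real_batch), real_labels) +
              criterion(D(fake_batch), fake_labels))
    loss_d.backward()
    opt_d.step()

    # Generator turn: redraw so the Discriminator calls the fakes "real"
    opt_g.zero_grad()
    fake_batch = G(torch.randn(8, 16))
    loss_g = criterion(D(fake_batch), real_labels)
    loss_g.backward()
    opt_g.step()
```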

At the point when both networks are sufficiently trained so that humans are impressed by the fake images, we can use the fake images for any purpose.

Results

I trained my network for 100 epochs and managed to draw some new faces with unique styles. I'm continuing to work on my implementation and plan to experiment with DRAGAN and different datasets to improve the quality of the generation.

Looking at the first four epochs, the output from the Generator is close to pure noise. The new faces slowly start to take shape as the Generator learns to do a better job:

As training went on, the Generator drew more even proportions for each character. Below are some results generated in IPython with the trained model:

Image interpolation by changing the latent z vector
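The interpolation above comes from sliding the latent z vector between two samples and decoding each step with the trained Generator. A minimal sketch of that interpolation in NumPy (the 100-dimensional latent size is an assumption):

```python
import numpy as np

# Linearly interpolate between two latent vectors z0 and z1.
# Feeding each intermediate z to the trained Generator produces
# the smooth morph between two faces.
def interpolate_z(z0, z1, steps=8):
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z0 + a * z1 for a in alphas])

rng = np.random.default_rng(0)
z0 = rng.standard_normal(100)
z1 = rng.standard_normal(100)
zs = interpolate_z(z0, z1)  # one latent vector per interpolation step
```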

Since the images come with their own tags, it would be fun to run the network with specific attributes in mind. I ran the network again using different hair colors as parameters. This time the Discriminator uses hair color to tell faces apart, and the Generator takes this into consideration when redrawing the faces:
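One common way to wire a tag like hair color into a GAN, sketched here as an assumption about the mechanism rather than this project's exact code, is to append a one-hot label to the latent vector before it enters the Generator, so both networks see the attribute. The color vocabulary below is hypothetical.

```python
import numpy as np

# Hypothetical tag vocabulary; in practice this comes from the Danbooru tags
HAIR_COLORS = ['blue', 'blonde', 'pink', 'purple']

def conditioned_z(z, hair_color):
    # Append a one-hot hair-color label to the latent vector z
    onehot = np.zeros(len(HAIR_COLORS))
    onehot[HAIR_COLORS.index(hair_color)] = 1.0
    return np.concatenate([z, onehot])

z = np.random.default_rng(0).standard_normal(100)
z_blue = conditioned_z(z, 'blue')  # 100 latent dims + 4 label dims
```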