From Hayao Miyazaki’s Spirited Away to Satoshi Kon’s Paprika, Japanese anime has made it okay for adults everywhere to enjoy cartoons again. Now, a team of researchers from Tsinghua University and Cardiff University has introduced CartoonGAN, an AI-powered technology that renders snapshots of real-world scenery in the styles of Japanese anime maestri.

Anime has a distinct aesthetic, and traditional manual techniques for transforming real-world scenes require considerable expertise and expense, as artists must painstakingly draw lines and shade colors by hand to create high-quality scene reproductions.

A real-world train station scene (left) transformed to a cartoon-style picture (right).

Meanwhile, existing transformation methods based on non-photorealistic rendering (NPR) or convolutional neural networks (CNNs) are either time-consuming or impractical because they require paired images for model training. Moreover, these methods do not produce satisfactory cartoonization results, because (1) different cartoon styles have unique characteristics involving high-level simplification and abstraction, and (2) cartoon images tend to have clear edges, smooth color shading and relatively simple textures, which present challenges for the texture-descriptor-based loss functions used in existing methods.

CartoonGAN is a generative adversarial network (GAN) framework that enables style translation between two unpaired datasets. It is composed of two CNNs: a Generator that maps input images to the cartoon manifold, and a Discriminator that judges whether an image comes from the target manifold or is synthetic. Residual blocks are introduced to simplify the training process.
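The idea behind a residual block can be sketched in a few lines of plain Python. This is only an illustration of the skip connection, not the network's actual layers: the names `residual_block`, `zero_transform` and `scale_transform` are hypothetical, and the toy elementwise functions stand in for the convolutional layers a real Generator would use.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return x + transform(x), so the block only has to learn a correction to x."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the input length")
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_transform(x: Vector) -> Vector:
    # Hypothetical stand-in for untrained layers that output nothing useful yet.
    return [0.0 for _ in x]


def scale_transform(x: Vector) -> Vector:
    # Hypothetical stand-in for layers that have learned a small adjustment.
    return [0.5 * v for v in x]


features = [1.0, -2.0, 3.0]
identity_out = residual_block(features, zero_transform)   # input passes through unchanged
scaled_out = residual_block(features, scale_transform)    # input plus learned correction
```

Because the input is added back to the output, a block whose layers contribute nothing still passes its input through intact, which is one common explanation for why such blocks make deep networks easier to train.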

To avoid slow convergence and obtain high-quality stylization, three components are integrated into the cartoonization architecture: a dedicated semantic content loss, an edge-promoting adversarial loss, and an initialization phase. The content loss is defined as the ℓ1 sparse regularization (instead of the ℓ2 norm) of the difference between the VGG (Visual Geometry Group) feature maps of the input photo and those of the generated cartoon image.
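To make the ℓ1 versus ℓ2 distinction concrete, here is a minimal sketch that compares the two on toy numbers. It assumes the feature maps have already been extracted and flattened into lists of floats; the function names are hypothetical, the values are invented for illustration, and averaging over elements (rather than summing) is a choice made here, not something the article specifies.

```python
from typing import Sequence


def l1_content_loss(photo_feats: Sequence[float], cartoon_feats: Sequence[float]) -> float:
    """Mean absolute difference between two flattened feature maps."""
    if len(photo_feats) != len(cartoon_feats):
        raise ValueError("feature maps must have the same size")
    return sum(abs(p - c) for p, c in zip(photo_feats, cartoon_feats)) / len(photo_feats)


def l2_content_loss(photo_feats: Sequence[float], cartoon_feats: Sequence[float]) -> float:
    """Mean squared difference, shown only for comparison with the l1 version."""
    if len(photo_feats) != len(cartoon_feats):
        raise ValueError("feature maps must have the same size")
    return sum((p - c) ** 2 for p, c in zip(photo_feats, cartoon_feats)) / len(photo_feats)


# Toy features: the cartoon differs from the photo in one location only,
# as might happen where a textured region has been flattened.
photo = [0.0, 0.0, 0.0, 0.0]
cartoon = [0.0, 0.0, 0.0, 4.0]

l1_value = l1_content_loss(photo, cartoon)  # 4 / 4 = 1.0
l2_value = l2_content_loss(photo, cartoon)  # 16 / 4 = 4.0
```

In this toy case the squared penalty is four times larger than the absolute one for the same single, large, localized change. That is the intuition for preferring ℓ1 here: it tolerates a few big regional differences between photo and cartoon while still keeping the overall content close.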

An example of a Makoto Shinkai stylization shows the importance of each component in CartoonGAN: the initialization phase speeds up convergence toward the target manifold; sparse regularization copes with style differences between cartoon images and real-world photos while retaining the original content; and the adversarial loss function produces the clear edges.
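The article does not spell out the edge-promoting adversarial loss, so the following is a sketch under an assumption: that, as described in the CartoonGAN paper, the Discriminator is trained to reject not only generated images but also cartoon images whose edges have been smoothed away. The function name, the averaging and the clamping constant `eps` are hypothetical choices made for this illustration.

```python
import math
from typing import Sequence


def _mean_log(probs: Sequence[float], eps: float) -> float:
    """Average of log(p) after clamping p into [eps, 1 - eps] for numerical safety."""
    return sum(math.log(min(max(p, eps), 1.0 - eps)) for p in probs) / len(probs)


def edge_promoting_d_loss(
    d_cartoon: Sequence[float],        # D outputs on real cartoon images
    d_edge_smoothed: Sequence[float],  # D outputs on cartoons with blurred edges
    d_generated: Sequence[float],      # D outputs on Generator results
    eps: float = 1e-12,
) -> float:
    """Discriminator loss to minimize: real cartoons should score high,
    edge-smoothed cartoons and generated images should score low."""
    real_term = _mean_log(d_cartoon, eps)
    blur_term = _mean_log([1.0 - p for p in d_edge_smoothed], eps)
    fake_term = _mean_log([1.0 - p for p in d_generated], eps)
    return -(real_term + blur_term + fake_term)


# A Discriminator that correctly rejects blurred-edge cartoons...
loss_rejects_blur = edge_promoting_d_loss([0.9], [0.1], [0.1])
# ...versus one that is fooled by them and pays a higher loss.
loss_accepts_blur = edge_promoting_d_loss([0.9], [0.9], [0.1])
```

Under this assumed formulation, a Discriminator that accepts blurred-edge cartoons is penalized, so it learns to treat sharp edges as a mark of the target style, and the Generator in turn has to produce clear edges to fool it.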