Let’s say you’ve just sketched a portrait but don’t have the inclination to hand-colour your creation. Wouldn’t it be nice if you could simply tell an artificial intelligence assistant what colours you want and have it do the rest? Well, now you can!

A research group from Seoul National University recently introduced Tag2Pix, a generative adversarial network (GAN) approach for line art colourization. Given a text sentence that includes tag information such as “blue_hair” or “brown_eyes,” Tag2Pix can convert monotone line art into a colourful picture. The researchers’ associated paper has been accepted to ICCV 2019.

Example of tag-based colourization
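
To make the tag-based input concrete, here is a minimal Python sketch of how a tag sentence could be turned into a multi-hot vector for a network to consume. The vocabulary, the function name encode_tags, and the rule of splitting on spaces and commas are all assumptions made for illustration; they are not taken from the Tag2Pix code.

```python
# Hypothetical colour-tag vocabulary; a real tag list would be much larger.
COLOR_TAGS = ["blue_hair", "brown_eyes", "red_skirt", "white_shirt"]


def encode_tags(sentence, vocabulary=COLOR_TAGS):
    """Return a multi-hot list: 1.0 where a vocabulary tag appears in the sentence."""
    # Assumed convention: tags are separated by spaces and/or commas.
    tokens = set(sentence.lower().replace(",", " ").split())
    return [1.0 if tag in tokens else 0.0 for tag in vocabulary]


if __name__ == "__main__":
    print(encode_tags("blue_hair, brown_eyes"))  # [1.0, 1.0, 0.0, 0.0]
```

Tags outside the vocabulary are simply ignored in this sketch, so a misspelled tag would silently have no effect.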

Current approaches to automating line art colourization fall into two main groups. One is “user-guided colorization,” in which a user marks specific target areas in the image and the model fills in the rest naturally. A popular example of this is the AI-powered PaintsChainer project, which can either pick its own colours for processing or follow users’ prompts on how to proceed.

Another approach is the “style-transfer method,” which uses a different sample image as a hint for the generative network. The target image output is then generated by following the color distribution style of the sample image.

User-guided colorization, however, tends to require human experts to adjust the inputs, while style transfer requires sample images. Moreover, both methods are expensive.

The researchers’ tag-based colourization technique is a cheaper alternative that does away with human experts and sample images and requires only minimal and simple human input to deliver high-quality colourizations.

Major features and innovations in the research paper include:

A Tag2Pix dataset that contains color illustrations, monotone line art, color invariant tags (CITs), and color variant tags (CVTs).

A Tag2Pix network as a variation of an auxiliary classifier GAN (ACGAN) that contains a CIT feature extractor, image generator, CVT encoder, and guide decoder.

Squeeze and Excitation with Concatenation (SECat), a novel network structure that enhances multi-label segmentation and colorization. This method helps to color even small areas such as eyes.

Two-step training with changing loss, a novel loss combination and curriculum learning method for the Tag2Pix network. This method divides the learning focus between segmentation and colorization in order to train the network in a stable and speedy manner.
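
The article does not spell out how SECat works internally. Going by its name, one plausible reading is a squeeze-and-excitation block in which the squeezed (globally averaged) channel vector is concatenated with an encoded tag vector before the channel gates are computed. The pure-Python sketch below illustrates that reading only; the function names, layer sizes and hand-picked weights are assumptions, not the authors' implementation.

```python
import math


def global_average_pool(feature_map):
    """Squeeze step: reduce each channel (an H x W grid) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vector, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def secat_block(feature_map, tag_code, w1, b1, w2, b2):
    """Rescale channels using gates computed from image features plus tags."""
    squeezed = global_average_pool(feature_map)            # length C
    joined = squeezed + list(tag_code)                     # concatenation, length C + T
    hidden = [max(0.0, h) for h in dense(joined, w1, b1)]  # ReLU
    gates = [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w2, b2)]  # sigmoid
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


if __name__ == "__main__":
    # Toy input: C = 2 channels of size 2 x 2, T = 2 tag features.
    features = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    # Hand-picked weights keep the demo deterministic; a real network learns them.
    w1, b1 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]], [0.0, 0.0]
    w2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    for tags in ([1.0, 0.0], [0.0, 1.0]):
        _, gates = secat_block(features, tags, w1, b1, w2, b2)
        print(tags, [round(g, 3) for g in gates])
```

The point of the concatenation in this sketch is that the same line art features yield different channel gates when the tag vector changes, which is how tag information could steer colour choices at every layer where such a block is used.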

In the table below, the line art in each row was colourized using two CVTs shared across the row and one CVT that differs, demonstrating Tag2Pix’s capability for colorizing line art naturally with various color tag combinations.

Colourization results
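
One way to picture how such a row of inputs could be assembled is to keep two colour tags fixed and swap in a different third tag for each image. The tag names and the helper build_tag_sets below are hypothetical, chosen only for illustration.

```python
def build_tag_sets(common_tags, varying_tags):
    """Pair the shared tags with each varying tag, one tag set per output image."""
    return [list(common_tags) + [tag] for tag in varying_tags]


if __name__ == "__main__":
    # Hypothetical tags: two shared across the row, one that changes per image.
    row = build_tag_sets(["white_shirt", "brown_eyes"],
                         ["blue_hair", "red_hair", "green_hair"])
    for tags in row:
        print(" ".join(tags))
```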

The researchers also conducted user studies comparing Tag2Pix to other networks for sketch-based and text-based colorization. They asked 20 people to evaluate various outputs on a five-point Likert scale over four categories: Colour Segmentation, Colour Naturalness, Colour Hints Accuracy, and Overall Quality. Tag2Pix received the highest scores across all categories.

User survey of sketch-based colourization networks
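
For readers unfamiliar with how five-point survey ratings are usually summarised, the sketch below averages scores per category. It assumes a simple mean is used, which the article does not state, and the ratings in the demo are invented placeholders, not data from the study.

```python
from statistics import mean

CATEGORIES = ["Colour Segmentation", "Colour Naturalness",
              "Colour Hints Accuracy", "Overall Quality"]


def mean_scores(ratings):
    """Average 1-5 ratings per category.

    `ratings` maps a category name to the list of scores given by participants.
    """
    for category, scores in ratings.items():
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"{category}: scores must lie between 1 and 5")
    return {category: mean(scores) for category, scores in ratings.items()}


if __name__ == "__main__":
    # Invented placeholder ratings from three imaginary participants.
    placeholder = {category: [4, 5, 3] for category in CATEGORIES}
    print(mean_scores(placeholder))
```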

The paper Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss is on arXiv. The research group has open-sourced its code, pretrained network and other related resources on GitHub.