In image colorization, our goal is to produce a colored image given a grayscale input image. This problem is challenging because it is multimodal -- a single grayscale image may correspond to many plausible colored images. As a result, traditional models often relied on significant user input alongside a grayscale image.

Recently, deep neural networks have shown remarkable success in automatic image colorization -- going from grayscale to color with no additional human input. This success may in part be due to their ability to capture and use semantic information (i.e. what the image actually is) in colorization, although we are not yet sure what exactly makes these types of models perform so well.

Before explaining the model, we will first lay out our problem more precisely.

The Problem

We aim to infer a full-colored image, which has 3 values per pixel (lightness plus two color channels), from a grayscale image, which has only 1 value per pixel (lightness only). For simplicity, we will only work with images of size 256 x 256, so our inputs are of size 256 x 256 x 1 (the lightness channel) and our outputs are of size 256 x 256 x 2 (the other two channels).
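Concretely, the shapes involved can be sketched with numpy (the array names here are ours, purely for illustration):

```python
import numpy as np

H, W = 256, 256

gray_input = np.zeros((H, W, 1), dtype=np.float32)    # lightness channel only
predicted_ab = np.zeros((H, W, 2), dtype=np.float32)  # the two color channels

# Concatenating the input lightness with the predicted color channels
# recovers a full 3-channel image.
full_image = np.concatenate([gray_input, predicted_ab], axis=-1)
print(full_image.shape)  # (256, 256, 3)
```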

Rather than work with images in the RGB format, as people usually do, we will work with them in the LAB colorspace (Lightness, A, and B). This colorspace contains exactly the same information as RGB, but it will make it easier for us to separate out the lightness channel from the other two (which we call A and B). We'll make a helper function to do this conversion later on.
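In practice you'd reach for a library routine such as `skimage.color.rgb2lab` for this conversion; purely to show what it involves, here is a minimal numpy sketch of the standard sRGB → XYZ → LAB pipeline (D65 white point):

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert an sRGB image (floats in [0, 1], shape H x W x 3) to CIELAB."""
    # 1. Undo the sRGB gamma curve to get linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)

    # 2. Linear RGB -> XYZ (standard sRGB matrix, D65 illuminant).
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = linear @ m.T

    # 3. Normalize by the D65 reference white, then apply the LAB nonlinearity.
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4 / 29)

    fx, fy, fz = f[..., 0], f[..., 1], f[..., 2]
    L = 116 * fy - 16    # lightness, roughly in [0, 100]
    a = 500 * (fx - fy)  # green <-> red axis
    b = 200 * (fy - fz)  # blue <-> yellow axis
    return np.stack([L, a, b], axis=-1)
```

As a sanity check, a pure white pixel maps to L near 100 with A and B near 0, and a black pixel maps to L = 0.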

We'll try to predict the color values of the input image directly (that is, we do regression). There are fancier approaches that treat colorization as classification (see here), but we'll stick with regression for now as it's simple and works fairly well.
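Since we're doing regression on the two color channels, a natural training loss is the mean squared error between predicted and ground-truth A/B values. A minimal sketch (the function name is ours, not from any particular library):

```python
import numpy as np

def colorization_mse(predicted_ab, true_ab):
    """Mean squared error over the two predicted color channels."""
    return np.mean((predicted_ab - true_ab) ** 2)

# Toy example: every predicted value is off by 0.25, so the loss is 0.25^2.
pred = np.full((256, 256, 2), 0.5)
true = np.full((256, 256, 2), 0.25)
print(colorization_mse(pred, true))  # 0.0625
```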