If the great Charlie Chaplin were alive today, he might well marvel at the AI-enabled colour versions of his comedy masterpieces such as Modern Times or The Idle Class. A recent research paper from the Hong Kong University of Science and Technology presents a fully automatic method for colourizing black-and-white films without any human guidance or reference images.

Film/video colourization is not a new technology. A small number of early 20th century films, such as A Trip to the Moon (1902) and The Kingdom of the Fairies (1903), were painstakingly hand-coloured, frame by frame. Computerized colourization was invented in the 1970s and has been widely used and steadily improved ever since.

Typically, video colourization involves first applying image colourization techniques to individual video frames, then knitting the frames together. This approach presents two challenges: recovering the ground-truth colours of the film/video scene from a grayscale image; and ensuring colourization remains coherent and consistent across the approximately 150k frames in a 90-minute movie.
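The frame count quoted above follows from simple arithmetic. The short sketch below assumes a constant frame rate; the helper name frame_count and the three frame rates tried are illustrative choices, not figures from the paper.

```python
def frame_count(minutes: float, fps: float) -> int:
    """Number of frames in a clip of the given length at a constant frame rate."""
    return round(minutes * 60 * fps)


# Common frame rates (an assumption; the article does not say which one applies).
for fps in (24, 25, 30):
    print(f"{fps} fps: {frame_count(90, fps):,} frames")
```

At these rates a 90-minute film has between roughly 130k and 160k frames, which brackets the 150k figure.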

Typical image colourization methods require some form of reference, such as a reference image or user sketches, whose colours are then propagated to the whole grayscale image. Although the deep learning research community has enabled fully automatic image colourization, there has been no corresponding breakthrough in fully automatic video colourization. That motivated the paper authors to explore this research direction.

A key innovation of this paper is a novel framework with two components: a colourization network with self-regularization, which improves colourization quality by propagating colours between similar pixels within a video frame; and a refinement network, which makes the result more coherent by enforcing temporal consistency between frames. Below is a comparison between colourized videos with and without self-regularization.
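To make the idea of propagating colours between similar pixels more concrete, here is a minimal NumPy sketch of one way such a regularizer could be written. It is not the authors' implementation: the feature choice (intensity plus scaled position), the k-nearest-neighbour search, the L1 penalty, and the names knn_indices, self_regularization_loss and position_weight are all assumptions made for illustration. It covers only the within-frame term, not the temporal refinement network.

```python
import numpy as np


def knn_indices(features: np.ndarray, k: int) -> np.ndarray:
    """For each row of `features`, return the indices of its k nearest other rows."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)      # pairwise squared distances, shape (n, n)
    np.fill_diagonal(dist, np.inf)         # a pixel is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def self_regularization_loss(gray, pred_color, k=4, position_weight=0.1):
    """Mean absolute colour difference between each pixel and its k most similar pixels.

    gray:       (H, W) float array, the grayscale input frame
    pred_color: (H, W, C) float array, the predicted colour channels
    Similarity is measured on intensity and (down-weighted) position; this
    feature design is an assumption of the sketch.
    """
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    features = np.stack(
        [
            gray.ravel(),
            position_weight * ys.ravel() / max(h - 1, 1),
            position_weight * xs.ravel() / max(w - 1, 1),
        ],
        axis=1,
    )
    neighbours = knn_indices(features, k)            # (n, k)
    colors = pred_color.reshape(h * w, -1)           # (n, C)
    diffs = colors[:, None, :] - colors[neighbours]  # (n, k, C)
    return float(np.mean(np.abs(diffs)))


# Tiny example: a frame with a dark half and a bright half.
gray = np.zeros((4, 4))
gray[:, 2:] = 1.0
uniform = np.full((4, 4, 2), 0.5)
noisy = np.random.default_rng(0).random((4, 4, 2))
print(self_regularization_loss(gray, uniform))  # identical colours give zero penalty
print(self_regularization_loss(gray, noisy))    # inconsistent colours are penalized
```

The brute-force pairwise distance is quadratic in the number of pixels, so this form is only practical for small frames; a real system would need an approximate neighbour search.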

The paper authors also suggest that, because a grayscale video admits many plausible colourings, an ideal video colourization solution should be able to generate a diverse set of colourized versions. They therefore adopted the ranked diversity loss function proposed in a CVPR 2018 paper to distinguish between different solution modes. Below are four frames of three different videos colourized by this approach with diversity.
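As a rough illustration of how a ranked diversity loss can work, the sketch below scores several candidate colourizations against a ground-truth frame, sorts the errors from best to worst, and weights them with a decreasing sequence so the best candidate dominates the loss while the others remain free to differ. The geometric weights, the L1 error, and the names ranked_diversity_loss and decay are assumptions for illustration; the article does not give the exact formulation used in the paper.

```python
import numpy as np


def ranked_diversity_loss(candidates, target, decay=0.5):
    """Weighted sum of per-candidate errors, ranked from best to worst.

    candidates: (N, H, W, C) float array, N colourized versions of one frame
    target:     (H, W, C) float array, the ground-truth colour frame
    decay:      ratio of the geometric weights 1, decay, decay**2, ...
                (a hypothetical choice for this sketch)
    """
    errors = np.mean(np.abs(candidates - target[None]), axis=(1, 2, 3))  # (N,)
    ranked = np.sort(errors)                     # ascending: best candidate first
    weights = decay ** np.arange(len(ranked))    # largest weight on the best one
    return float(np.sum(weights * ranked))


# Two candidates: one exactly right, one off by 1 everywhere.
target = np.zeros((2, 2, 3))
candidates = np.stack([np.ones((2, 2, 3)), np.zeros((2, 2, 3))])
print(ranked_diversity_loss(candidates, target))
```

Because the errors are sorted before weighting, the loss does not depend on the order in which the candidates are produced.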

To test the efficacy of the method, the researchers conducted experiments on the DAVIS dataset and the Videvo dataset. They compared their model with two other state-of-the-art, fully automatic image colourization approaches, which were both applied to colourize videos frame-by-frame. The qualitative results showed the new approach was preferred in 80.0 percent of comparisons on the DAVIS dataset and in 88.8 percent of comparisons on the Videvo dataset.

Below are the quantitative comparison results between different models.

The researchers believe their work on self-regularization and diversity can inspire future research in fully automatic video colourization and other video processing tasks. Downstream applications for this technique are not limited to moving image colourization; it could also be used in computer vision applications such as visual understanding and object tracking.

For more information, read the paper Fully Automatic Video Colorization with Self-Regularization and Diversity on arXiv.