Before there was colour film, moviemakers crafted their visions in monochrome. The classic black-and-white films of yesteryear took us on incredible journeys and still resonate with audiences today. But although these masterpieces have stood the test of time, the media they were captured on has not.

Effective restoration, especially of culturally significant films, has become an area of interest for AI researchers.

Researchers from the University of Tsukuba and Waseda University in Japan recently introduced a single framework designed to tackle an entire range of remastering tasks for films that have been converted to digital data. In the paper DeepRemaster: Temporal Source-Reference Attention Networks for Comprehensive Video Enhancement, the researchers show how their novel approach can bring new life to old films.

The method is built on fully convolutional networks. Although many recent studies have used recurrent models for video processing, the researchers chose instead to use temporal convolutions, which can take information from multiple input video frames into account at once.
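To make the idea of a temporal convolution concrete, here is a minimal NumPy sketch that slides a small kernel along the time axis of a clip, so that each output frame mixes information from its neighbours. The helper name `temporal_conv`, the fixed kernel weights and the edge padding are illustrative assumptions, not details from the paper, whose networks learn their kernels during training.

```python
import numpy as np

def temporal_conv(frames, kernel):
    """Filter a clip of shape (T, H, W) along the time axis only.

    Hypothetical helper for illustration. As in most deep learning
    libraries, the kernel is applied without flipping. The first and
    last frames are repeated as padding so the output keeps T frames.
    """
    frames = np.asarray(frames, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    k = kernel.shape[0]
    assert k % 2 == 1, "use an odd kernel length so the window is centred"
    pad = k // 2
    padded = np.pad(frames, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    out = np.zeros_like(frames)
    # Each kernel tap contributes a time-shifted copy of the whole clip.
    for offset, weight in enumerate(kernel):
        out += weight * padded[offset:offset + frames.shape[0]]
    return out

# A 5-frame, 2x2 clip in which only the middle frame is bright (a flicker).
clip = np.zeros((5, 2, 2))
clip[2] = 1.0
smoothed = temporal_conv(clip, [0.25, 0.5, 0.25])
```

In this toy clip the one-frame flicker is spread across the neighbouring frames, which shows how every output frame is computed from several input frames at once, with no recurrent state carried from frame to frame.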

The researchers propose an attention mechanism called source-reference attention, which enables the model to handle user-provided colour reference images. The model can then choose which reference frames to use when colouring an output frame, and decide which regions of those reference frames to apply to each output region. The significant benefit of source-reference attention-based networks is that the model can use all the reference information when processing individual frames. This makes an almost completely automatic vintage film remastering process possible.
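The selection of reference regions can be pictured as a standard scaled dot-product attention, with queries taken from the source frames and keys and values taken from the reference images. The sketch below is a simplified assumption of how such a mechanism might look: the function name `source_reference_attention`, the feature shapes and the omission of learned projection layers are all illustrative choices, and the paper's exact formulation may differ.

```python
import numpy as np

def source_reference_attention(source_feats, ref_feats):
    """Attend from source locations to reference locations.

    source_feats: (Ns, C) features, one row per source-frame location.
    ref_feats:    (Nr, C) features, one row per location, pooled over
                  all reference images.
    Returns (attended, weights) with shapes (Ns, C) and (Ns, Nr).
    Hypothetical sketch: a real network would first apply learned
    projections to obtain queries, keys and values.
    """
    source_feats = np.asarray(source_feats, dtype=np.float64)
    ref_feats = np.asarray(ref_feats, dtype=np.float64)
    channels = source_feats.shape[1]
    # Similarity between every source location and every reference location.
    scores = source_feats @ ref_feats.T / np.sqrt(channels)
    # Numerically stable softmax over the reference locations.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each source location receives a weighted blend of reference features.
    attended = weights @ ref_feats
    return attended, weights

rng = np.random.default_rng(0)
source = rng.normal(size=(6, 8))      # 6 source locations, 8 channels
refs = rng.normal(size=(2 * 4, 8))    # 2 reference images x 4 locations each
attended, weights = source_reference_attention(source, refs)
```

Because the rows of `ref_feats` are pooled from every reference image, each row of `weights` spans all references at once, which mirrors the idea that the model picks both which reference to use and which region within it.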

The researchers employed data creation and augmentation to train their model not only to perform colourization and noise removal, but also to increase resolution and sharpness and to improve contrast, all while keeping the results temporally consistent.

The model contains two trainable components: a preprocessing network and a source-reference network. The preprocessing network is responsible for removing artifacts and noise from an input monochrome video, and the researchers built it from temporal convolution layers. Because temporal convolutions increase learning complexity and computational cost, they built the source-reference network from a mix of temporal and spatial convolutions.

The researchers found that jointly training the restoration and colourization models plays an important role in improving the quality of the final results, and that the approach outperforms existing pipeline-based approaches.

The researchers point out that the source-reference attention mechanism’s demands on system memory can limit processable resolution size. But since most vintage movies are already stored at low resolutions, it is still possible to use an attention-based mechanism to remaster them.
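A rough calculation shows why memory becomes the bottleneck. An attention matrix holds one value for every pair of source location and reference location, so its size grows with the product of the two. The figures below (32-bit floats, feature maps downsampled by a factor of 8, five source frames and three reference images) are assumptions chosen for illustration, not numbers from the paper.

```python
def attention_matrix_bytes(height, width, n_source_frames, n_ref_images,
                           downsample=8, bytes_per_value=4):
    """Estimate the size of one source-to-reference attention matrix.

    Hypothetical helper: assumes attention is computed on feature maps
    that are `downsample` times smaller than the frame on each side,
    and that each attention weight is stored as a 4-byte float.
    """
    h, w = height // downsample, width // downsample
    source_locations = n_source_frames * h * w
    ref_locations = n_ref_images * h * w
    return source_locations * ref_locations * bytes_per_value

low = attention_matrix_bytes(256, 256, n_source_frames=5, n_ref_images=3)
high = attention_matrix_bytes(1024, 1024, n_source_frames=5, n_ref_images=3)

print(f"256x256 frames:   {low / 2**20:.0f} MiB")
print(f"1024x1024 frames: {high / 2**30:.0f} GiB")
```

Under these assumptions, making the frames four times larger on each side multiplies the attention matrix by 256, which is why low-resolution source material suits this kind of mechanism.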

Another limitation of the approach is that it cannot fill in missing frames or repair extreme degradation, such as large image regions missing over many frames. Both challenges are quite common in vintage film restoration.

Huge archives of vintage films are held across the publishing and media industries. The researchers believe their approach can help remaster these libraries efficiently.

The paper DeepRemaster: Temporal Source-Reference Attention Networks for Comprehensive Video Enhancement is available on iizuka.