A major contributing factor to the poor colourisation of old Singaporean photos may be that these black and white images differ too much from the model's training data. The model used by Algorithmia (created by Zhang et al.) was trained on 1.3 million images from ImageNet, a widely used image database created by researchers at Stanford University and Princeton University.

As such, ImageNet is unlikely to contain many images relevant to Singapore, which means the model has probably never learnt what the colours of an old Singaporean schoolyard scene could plausibly be.

We hypothesise that a tool trained on Singapore-specific historical images will produce more believable colourised old Singaporean photos than existing tools.

How does one colourise a black and white image?

Before we jump into how colourisation can be done by a computer programme, let’s first consider how colourisation is done by a human colourist.

Colourisation is an extremely time- and skill-intensive endeavour. To create an appropriately colourised photo, an experienced human colourist has to do two tasks:

(1) do significant research on the historical, geographic, and cultural context of the photo in order to derive appropriate colours, and

(2) colour the black and white image using software tools like Photoshop.

(This is of course an oversimplification of the work colourisation artists do — for a more detailed and accurate explanation, check out this great video by Vox.)

Similarly, a computer programme needs to perform the same two tasks, albeit in a slightly different manner. A programme needs to:

(1) identify objects in a black and white photo, and figure out a plausible colour for the objects given images that it has seen in the past, and

(2) colour the black and white image.
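To make the second task concrete: a colour photo is a height × width × 3 array, while its black and white version collapses to a single brightness value per pixel. Colourisation is the inverse problem of recovering plausible colour from brightness alone. A minimal sketch in NumPy (the Rec. 601 luma weights and array sizes here are illustrative assumptions, not the project's actual pipeline):

```python
import numpy as np

# A colour photo is an H x W x 3 array of RGB values.
rgb = np.random.rand(64, 64, 3)  # stand-in for a colour training photo

# The black and white version keeps only brightness per pixel
# (Rec. 601 luma weights; real colourisers often work in other
# colour spaces, so this choice is purely illustrative).
grey = rgb @ np.array([0.299, 0.587, 0.114])

# Colourisation is the ill-posed inverse problem: predict a plausible
# H x W x 3 image given only the H x W greyscale input.
print(rgb.shape, grey.shape)  # (64, 64, 3) (64, 64)
```

Because many different colour images map to the same greyscale image, the model can only ever predict *plausible* colours, not the true historical ones.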

Colourisation using Generative Adversarial Networks (GANs) — a deep learning technique

To colourise black and white images, we employed a technique in deep learning known as Generative Adversarial Networks (GANs). This comprises:

A first neural network — a ‘generator’ — with many mathematical parameters (> 20 million) that tries to predict the colour values at different pixels in a black and white image, based on features in the image, and

A second neural network — the ‘discriminator’ — that tries to identify if the generated colours are photo-realistic compared to the original coloured image.

The model is trained until the generator can predict colours that the discriminator cannot effectively distinguish as fake. A simplified view of the architecture used for training is shown below:
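The generator/discriminator setup above can be sketched in a few lines of PyTorch. This is a deliberately tiny illustration of the adversarial structure, not the actual model: the real generator has more than 20 million parameters, and all layer sizes and the choice of predicting two colour channels from one brightness channel are assumptions for clarity.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Predicts colour channels for each pixel of a greyscale input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),   # extract features
            nn.Conv2d(16, 2, 3, padding=1), nn.Tanh(),   # predict 2 colour channels
        )

    def forward(self, grey):
        return self.net(grey)

class Discriminator(nn.Module):
    """Scores whether a (greyscale, colour) pair looks photo-realistic."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1),  # single realism score per image
        )

    def forward(self, grey, colour):
        return self.net(torch.cat([grey, colour], dim=1))

G, D = Generator(), Discriminator()
grey = torch.randn(4, 1, 64, 64)   # a batch of greyscale inputs
fake = G(grey)                     # generator's predicted colour channels
score = D(grey, fake)              # discriminator's realism score
print(fake.shape, score.shape)     # torch.Size([4, 2, 64, 64]) torch.Size([4, 1])
```

During training, the discriminator is rewarded for telling generated colours apart from real ones, while the generator is rewarded for fooling it; the two losses push against each other until the generated colours become hard to distinguish from real photos.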

Simplified architecture of GANs for colourisation

We used the popular fast.ai and PyTorch libraries to develop our model, with an architecture and training steps inspired by Jason Antic (https://github.com/jantic/DeOldify). We trained our model on a new set of more than 500,000 old, publicly available Singapore-based images that we compiled, using a local GPU cluster with NVIDIA V100 GPUs.

Other steps we took to improve our model included adding images from Google's Open Images V4, particularly of body parts that our model struggled to identify (e.g. hands, legs, and arms), and tuning learning rates and batch sizes for better results.

Deploying our deep learning model as a web application

At this point, our deep learning model lived in our office’s local GPU cluster — which meant that only our team had access to the colouriser model. In order for the colouriser to be useful to anyone outside our team, we had to deploy it on the internet.

We went with Google Cloud Platform as our cloud provider for the colouriser service. The architecture is fairly simple, with:

(1) a CDN offering DDoS protection and caching of static content,

(2) an NGINX frontend proxy and static content server,

(3) a load balancer that distributes traffic, and

(4) backend colouriser services with NVIDIA Tesla K80 GPUs that perform the actual colourisation.

The architecture diagram for Colourise.sg

The colourisation step is compute intensive and takes approximately 3 seconds to complete per image. As such, we decided to shield the backend colouriser services by using an NGINX server to queue requests to the backend. If the rate of incoming requests far exceeds the rate that our backend services can handle, the NGINX server immediately returns a status response to the client asking the user to try again later.
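The queue-and-shed behaviour described above maps naturally onto NGINX's request-limiting module. A minimal sketch of such a configuration follows; the zone name, rate, burst size, and upstream addresses are illustrative assumptions, not our production values.

```nginx
# Limit how fast requests reach the GPU-backed colouriser services
# (zone size, rate, and addresses are illustrative only).
limit_req_zone $binary_remote_addr zone=colourise:10m rate=5r/s;

upstream colourisers {
    server 10.0.0.2:8080;   # GPU-backed colouriser VM
    server 10.0.0.3:8080;
}

server {
    listen 80;

    location /colourise {
        # Queue a short burst of pending requests; beyond that,
        # immediately return 503 so the client can retry later.
        limit_req zone=colourise burst=20;
        limit_req_status 503;
        proxy_pass http://colourisers;
    }
}
```

Returning an immediate error status under overload keeps the GPU backends working on a bounded queue instead of accumulating requests they can never catch up on.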

The key highlight of this architecture is that the colouriser service virtual machines (VMs) are autoscaled in response to how much traffic each VM has to service. This saves on cost because additional VMs are only switched on when there is demand for them.

Results

Here are some of our favourite results using photos obtained with permission from the New York Public Library and the National Archives of Singapore. We would like to note that our sources only provided us with the black and white photos and are not in any way responsible for the colourised output created by us.

Good results

Our model performs well on high resolution images that prominently feature human subjects (i.e. where people occupy a large portion of the image) and natural scenery.

The following images look believable (at least to us) because they contain objects that appear in sufficient numbers in the training dataset, so the model is able to identify the objects in the image and colour them believably.