Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. It can also be applied when solving the same problem across different but related domains. Consider a source and a target domain that differ in their feature-space representations. We would like to unify these spaces using feature-based approaches to transfer learning.

Symmetric heterogeneous transfer learning aims to create a common latent feature space for both domains. This latent space can then be used to train a classifier. Labels may be available in either or both domains. The topic is closely related to representation/feature learning.

The aim of this research work is to experimentally apply state-of-the-art methods based on generative adversarial networks (GANs) and variational autoencoders (VAEs) to the symmetric heterogeneous transfer learning task, and to modify them so that the latent representation is as useful as possible for the downstream classification task.

“The coolest idea in deep learning in the last 20 years.” — Yann LeCun on GANs

Background

Many computer vision problems can be considered image-to-image translation problems: mapping an image in one domain to a corresponding image in another domain. Colorisation is one example, as it requires mapping a grey-scale image to a corresponding color image. These translation problems can be studied in supervised and unsupervised settings. The supervised setting provides pairs of corresponding images in the two domains; in the unsupervised setting, the lack of such pairing significantly increases the task's difficulty, but makes the method far more widely applicable. We are focusing on the latter.

The key to this approach is to infer the joint distribution of images in the two domains from samples drawn from the marginal distributions of the individual domains. According to coupling theory, infinitely many joint distributions can yield the same marginals, so the problem is ill-posed and additional assumptions are needed to constrain the solution.
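One way to make this concrete (a sketch in the UNIT paper's notation, where E_i and G_i denote the encoder and generator for domain i, introduced below): the shared-latent space assumption resolves this ambiguity by positing a single latent code from which both corresponding images can be generated.

```latex
% Shared-latent space assumption (UNIT notation): a pair of
% corresponding images (x_1, x_2) shares a latent code z, with
x_1 = G_1(z), \qquad x_2 = G_2(z), \qquad z = E_1(x_1) = E_2(x_2),
% so that translation from domain 1 to domain 2 is the composition
x_{1 \to 2} = G_2(E_1(x_1)).
```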

We focused our attention on the UNsupervised Image-to-image Translation (UNIT) framework, presented in the paper Unsupervised Image-to-Image Translation Networks, which is based on generative adversarial networks (GANs) and variational autoencoders (VAEs).

“We model each image domain using a VAE-GAN. The adversarial training objective interacts with a weight-sharing constraint, which enforces a shared-latent space, to generate corresponding images in two domains, while the variational autoencoders relate translated images with input images in the respective domains.” [arXiv:1703.00848]

Figure 1: Model for image transfer

The model consists of two domain image encoders E1 and E2 whose last layers are shared, two domain image generators G1 and G2 whose first layers are shared, and two domain adversarial discriminators D1 and D2.

The encoders map an input image to a code in the shared latent space, which is then passed to a generator that reconstructs (or translates) the image. The discriminators are trained to distinguish real images from generated ones, whereas the generators are trained to fool them.
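To illustrate, here is a heavily simplified TensorFlow 2 / Keras sketch of the weight-sharing idea. The layer types and sizes are placeholder choices of ours (the actual UNIT networks are convolutional and share only their high-level layers), so this is a sketch of the mechanism, not the paper's architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Shared blocks: the last encoder layer and the first generator layer
# are single layer instances reused by both domains, which ties the
# two domains to one latent space.
shared_enc = layers.Dense(256, activation="relu", name="shared_encoder")
shared_gen = layers.Dense(256, activation="relu", name="shared_generator")

def make_encoder(name):
    # Domain-specific layers followed by the shared latent layer.
    inp = layers.Input(shape=(28, 28, 1))
    x = layers.Flatten()(inp)
    x = layers.Dense(512, activation="relu")(x)
    z = shared_enc(x)  # both encoders end in the same layer instance
    return Model(inp, z, name=name)

def make_generator(name):
    # Shared first layer followed by domain-specific layers.
    z = layers.Input(shape=(256,))
    x = shared_gen(z)
    x = layers.Dense(512, activation="relu")(x)
    out = layers.Reshape((28, 28, 1))(
        layers.Dense(28 * 28, activation="tanh")(x))
    return Model(z, out, name=name)

E1, E2 = make_encoder("E1"), make_encoder("E2")
G1, G2 = make_generator("G1"), make_generator("G2")

x1 = tf.random.normal((4, 28, 28, 1))  # a batch of domain-1 images
x1_to_2 = G2(E1(x1))                   # translate domain 1 -> domain 2
```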

The complete description of the model and the exact objective functions can be found in the original paper cited above.

First results

The structure of our work is based on weekly meetings that define the next steps needed to achieve the desired results in the long run.

The main task in the initial phase was to reimplement the UNIT framework presented in the original paper Unsupervised Image-to-Image Translation Networks in TensorFlow 2.0 and reproduce the described translations. We tested the newly created model on various datasets, including orange2apple, summer2winter, and horse2zebra.
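For reference, these benchmark datasets are distributed, among other places, through TensorFlow Datasets; the sketch below assumes the TFDS cycle_gan configurations (the config name and split names follow the TFDS catalogue).

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load one of the unpaired translation benchmarks from TFDS.
ds, info = tfds.load("cycle_gan/apple2orange", with_info=True)
train_a, train_b = ds["trainA"], ds["trainB"]  # unpaired domain samples

def preprocess(example):
    # Scale pixel values to [-1, 1], the usual range for tanh generators.
    img = tf.cast(example["image"], tf.float32) / 127.5 - 1.0
    return tf.image.resize(img, (256, 256))

train_a = train_a.map(preprocess)
train_b = train_b.map(preprocess)
```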

Figure 2: Transfer from an orange to an apple

The size of the datasets proved to be an issue, since the model required a lot of computing resources and training progressed slowly. To accelerate the process, we trained the model on only parts of the datasets. The results (examples shown in the following figures) assured us that the model was working correctly, and we were able to match the transfers presented in the original work. All images presented in this article were produced by our networks.
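With a tf.data pipeline like the one above, restricting training to a subset is a one-liner; the subset size of 500 here is an arbitrary illustrative value, not the one we used.

```python
# Train on a fixed-size subset of each domain to cut the cost per epoch;
# 500 is an arbitrary illustrative value.
train_a_small = train_a.shuffle(1000).take(500).batch(1)
train_b_small = train_b.shuffle(1000).take(500).batch(1)
```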

Figure 3: Winter to summer transfer

Figure 4: Summer to winter transfer

The next steps followed shortly, as we redesigned the model according to the description in the original paper to create transfers between images of digits from the MNIST and USPS datasets. The performance of the model was measured by classifiers trained on the original images. Since the datasets consist of greyscale 28x28 pixel images, we were able to use them in their entirety and improve the accuracy of the model. We achieved even better classification accuracies than those reported in the original paper.
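The evaluation protocol can be sketched as follows: train a digit classifier on the target domain's original images, then measure its accuracy on images translated from the source domain. This is a simplified illustration; the classifier architecture and the `usps_images`/`usps_labels` names are placeholder choices of ours.

```python
import tensorflow as tf

# A small digit classifier trained on original MNIST images; the
# architecture is an illustrative choice, not the one we used.
clf = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
clf.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
clf.fit(x_train[..., None] / 255.0, y_train, epochs=3)

# Score the translation: map USPS digits into the MNIST domain with the
# trained UNIT model (G1, E2 as sketched earlier) and evaluate the
# classifier on the result; `usps_images`/`usps_labels` are placeholders.
# translated = G1(E2(usps_images))
# clf.evaluate(translated, usps_labels)
```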