I’ve made my code and pre-trained model weights available at https://github.com/mosessoh/iconcolor.

How I envision this being used: someone starts a side project and needs a logo or icon for branding. Let’s say her side project is about sharing beautiful royalty-free images (e.g. Unsplash). She grabs an icon she likes (e.g. this camera icon from the IconBros icon pack that went viral on Product Hunt), submits it to the model, and voila — a fully colored and stylized icon. Note how the model uses darker shades of orange at the sides to give the icon visual depth, and adds a splash of green to make it pop.

Why I think this might be useful

Beautiful icon outlines are much easier to find online than colored icons. This model can help someone generate an original, unique colored icon for whatever project they need. It might not be as good as Yoga Perdana’s or Ivan Bobrov’s work, but I think it’s much better than using a generic solid icon.

How it works

There are more details in the poster below (I made it for CS229), but at a high level, this is modelled as a supervised learning problem. I take in a 1 x 128 x 128 grayscale icon and produce a 3 x 128 x 128 RGB icon, which is then compared to the true RGB icon using some loss function during training. The model is a convolutional neural network called a U-Net, which I trained on an icon set from Smashicons (I’m a premium subscriber — there isn’t a more complete icon set that I know of). I taught the model to convert outlines to yellow-style icons, but it can learn to convert between arbitrary styles since nothing is hard-coded.
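To make that concrete, here’s a minimal sketch of the idea (my own simplification in PyTorch, not the code from the repo): a toy U-Net-style generator that maps a 1 x 128 x 128 outline to a 3 x 128 x 128 RGB icon, with a skip connection from encoder to decoder.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-Net: two downsampling blocks, two upsampling blocks, one skip connection."""
    def __init__(self):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU())   # 128 -> 64
        self.down2 = nn.Sequential(nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU())  # 64 -> 32
        self.up1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU())  # 32 -> 64
        # the decoder sees its own features concatenated with the matching encoder features
        self.up2 = nn.ConvTranspose2d(32 + 32, 3, 4, stride=2, padding=1)                  # 64 -> 128
        self.out = nn.Tanh()  # RGB output scaled to [-1, 1]

    def forward(self, x):
        d1 = self.down1(x)                          # (N, 32, 64, 64)
        d2 = self.down2(d1)                         # (N, 64, 32, 32)
        u1 = self.up1(d2)                           # (N, 32, 64, 64)
        u2 = self.up2(torch.cat([u1, d1], dim=1))   # skip connection, then (N, 3, 128, 128)
        return self.out(u2)

outline = torch.randn(8, 1, 128, 128)  # a batch of grayscale outlines
colored = TinyUNet()(outline)          # predicted RGB icons, shape (8, 3, 128, 128)
```

A real U-Net is deeper and has a skip connection at every resolution, but the input/output shapes and the skip-connection idea are the same.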

Poster for my CS229 final project

The top 3 things I learned from my experiments

Data augmentation helps a lot

I tried the per-pixel L1 loss first. It worked pretty well in-sample, but when I tried it on out-of-sample icons with dramatically different scales and outline styles, the model really struggled (e.g. some icon packs use much thinner lines than Smashicons’). Here’s how my model performed on out-of-sample icons from linea.io. I think this is a pretty good example of how these powerful deep neural networks can also be quite fragile — they over-fit to certain aspects of the training data.

My model struggling with icons from outside of the training set

To improve generalization, I created a data augmentation pipeline that (1) rescales my input and output pairs, (2) repositions the icons randomly in their frame, (3) randomly blurs or sharpens the input icon outline, and (4) adds noise to the input icon outline, so that the generator learns to deal with and become invariant to all of these. I did some ablative analysis on which of these was most helpful in improving generalization; you can check it out in the poster. I think this is especially important for icons, where scale, position and line thickness are arbitrary and therefore vary a lot. With data augmentation, my model started generalizing much better.
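Here’s a rough sketch of what such a pipeline can look like (assuming PIL images and 128 x 128 frames; the actual pipeline in the repo may differ in the details):

```python
import random
import numpy as np
from PIL import Image, ImageFilter

def augment(outline: Image.Image, colored: Image.Image, size: int = 128):
    """outline is a mode 'L' outline icon, colored is the matching 'RGB' target icon."""
    # (1) rescale the input/output pair together
    scale = random.uniform(0.6, 1.0)
    new = max(1, int(size * scale))
    outline_s = outline.resize((new, new), Image.BILINEAR)
    colored_s = colored.resize((new, new), Image.BILINEAR)

    # (2) reposition the pair randomly inside a blank 128 x 128 frame
    x, y = random.randint(0, size - new), random.randint(0, size - new)
    out_frame = Image.new("L", (size, size), 255)
    col_frame = Image.new("RGB", (size, size), (255, 255, 255))
    out_frame.paste(outline_s, (x, y))
    col_frame.paste(colored_s, (x, y))

    # (3) randomly blur or sharpen only the input outline
    if random.random() < 0.5:
        out_frame = out_frame.filter(random.choice([ImageFilter.GaussianBlur(1), ImageFilter.SHARPEN]))

    # (4) add pixel noise only to the input outline
    arr = np.array(out_frame, dtype=np.float32)
    arr += np.random.normal(0, 8, arr.shape)
    out_frame = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    return out_frame, col_frame
```

The key detail is that rescaling and repositioning are applied to the input and target together, while blur, sharpen and noise only touch the input, since the target icon should stay clean.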

Some heartening out-of-sample results after augmenting my training data

Color ambiguity is a hard problem to overcome

However, as you can see above, the model was still creating icons that were overly yellow. If you check out Smashicons’ yellow-style icons, you’ll see they use beautiful splashes of red, green and blue to make their icons eye-catching. This was happening because the L1 loss discourages the model from predicting underrepresented colors. Intuitively, if I penalized you every time you guessed a color wrong, and the vast majority of the pixels are yellow, you’d always guess yellow unless you were absolutely sure green/blue/red was used there, which is very hard to do with icons since color choice is ambiguous and arbitrary. This is unlike previous research on colorizing grayscale photographs, where colors are still ambiguous but there is a stronger prior (at least in my view!).
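To see why, here’s a tiny made-up experiment (invented pixel values, just to illustrate the argument): if 90% of the pixels are yellow, the single color that minimizes the average L1 error is yellow, so a model trained only on L1 plays it safe.

```python
import torch

# 100 "pixels": 90 yellow-ish, 10 green; which single color minimizes the per-pixel L1 loss?
pixels = torch.cat([torch.tensor([[1.0, 0.8, 0.1]]).repeat(90, 1),   # yellow
                    torch.tensor([[0.1, 0.7, 0.2]]).repeat(10, 1)])  # green
candidates = {"yellow": torch.tensor([1.0, 0.8, 0.1]), "green": torch.tensor([0.1, 0.7, 0.2])}
for name, c in candidates.items():
    print(name, torch.mean(torch.abs(pixels - c)).item())
# guessing yellow everywhere gives the lower average error, so an L1-trained model defaults to it
```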

To tackle this problem, I trained a discriminator under a conditional GAN setup and made my generator minimize both L1 and adversarial loss. The discriminator learns how to differentiate between a real icon and a generated one, so it punishes the generator for always guessing yellow, since that’s not what the original icons look like. This improved color reproduction substantially (e.g. look at the rugby ball and media player below).
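In code, the combined objective looks roughly like the pix2pix recipe. This is a condensed sketch, assuming PyTorch, a conditional discriminator D(outline, icon) that returns logits, and an L1 weight lambda_l1 that is a hyperparameter (not necessarily the value I used):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
lambda_l1 = 100.0  # how heavily the L1 term is weighted relative to the adversarial term

def generator_loss(D, outline, fake_icon, real_icon):
    fake_logits = D(outline, fake_icon)
    adv = bce(fake_logits, torch.ones_like(fake_logits))  # try to fool the discriminator
    return adv + lambda_l1 * l1(fake_icon, real_icon)      # ...while staying close to the target

def discriminator_loss(D, outline, fake_icon, real_icon):
    # real (outline, icon) pairs should score 1, generated pairs should score 0
    real_logits = D(outline, real_icon)
    fake_logits = D(outline, fake_icon.detach())
    return (bce(real_logits, torch.ones_like(real_logits)) +
            bce(fake_logits, torch.zeros_like(fake_logits)))
```

During training the two losses are minimized in alternating steps, as in a standard GAN.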

(Above) Results using a generator trained with L1 loss. (Below) With L1 loss + adversarial loss.

Architecture matters

This was somewhat surprising to me — I naively thought that if you just threw enough layers at the problem, the deep networks would figure it out. Lest you start to believe this project was smooth sailing, check out what happened when I used a naive implementation of the Super Resolution ResNet to convert solid icons to yellow icons. Despairing, I started reading through a lot of the colorization literature and found the U-Net, with its skip connections and encoder-decoder architecture; the skip connections carry fine spatial detail (like edge locations) from the encoder straight to the decoder, so the output stays aligned with the input outline. It made me realize that I need to build better intuition for how architecture choices affect the kinds of functions a model can learn.

Results after trying to use a SuperResolutionResNet for colorization — this was after 1000 epochs. Can you imagine how sad I was when I woke up to this? 💔 The model can’t seem to understand where the edges are and how to make the background white. I thought I was a goner.

Next steps

I’d like to continue training the model on more styles so users have more options to choose from. I’m also exploring incorporating user hints to resolve the color ambiguity problem: a user adds dots of color to the outline, and the model figures everything else out based on the color distributions a designer usually uses. I view user hints and adversarial loss as two different ways of forcing a model to make up its mind in ambiguous color situations.
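For what it’s worth, here’s a speculative sketch of how such hints could be fed to the generator, as extra input channels alongside the outline (this is an assumption about a future design, not something in the repo):

```python
import torch

outline = torch.randn(1, 1, 128, 128)    # grayscale outline
hint_rgb = torch.zeros(1, 3, 128, 128)   # sparse "dots" of user-chosen color
hint_mask = torch.zeros(1, 1, 128, 128)  # 1 where the user placed a dot
hint_rgb[:, :, 40:44, 40:44] = torch.tensor([0.1, 0.7, 0.2]).view(1, 3, 1, 1)  # a green dot
hint_mask[:, :, 40:44, 40:44] = 1.0

# The generator's first conv layer would then take 1 + 3 + 1 = 5 input channels instead of 1.
x = torch.cat([outline, hint_rgb, hint_mask], dim=1)  # (1, 5, 128, 128)
```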

I’m still exploring this problem, so I’d love ideas, thoughts and comments. Feel free to drop a comment below if this is something that interests you. Thanks for reading 🙏