Can a computer draw?

Teaching a computer to sketch objects using GANs

Last week my little brother introduced me to skribbl.io, an online Pictionary game: one person draws a chosen word, and the others have to guess it.

After some fierce battles and lots of laughs, I wondered whether a computer could be better at guessing words than a human. That’s when I came across Quick, Draw!, a game developed by Google Creative Lab in which a neural network tries to guess what you’re drawing. The doodles created by more than 15 million players are openly available for anyone to play with. Let’s explore the data.

Get the data

The Quick, Draw! Dataset is publicly available on Google Cloud Storage, where you can find more than 50 million drawings across 345 categories. Using gsutil, you can download the drawings locally and explore them. A simplified version of the dataset is available that contains only the necessary information. Each category is an ndjson file, where each line is a JSON object describing a single drawing. Let’s look at how a drawing is represented:

[
  [ // First stroke
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...]
  ],
  [ // Second stroke
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...]
  ],
  ... // Additional strokes
]
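Reading these records is straightforward. Here is a minimal sketch in Python, where the "drawing" field name follows the simplified dataset and the inline sample record is illustrative:

```python
import json

def parse_line(line):
    """Parse one ndjson line into its list of strokes ([xs, ys] pairs)."""
    record = json.loads(line)
    return record["drawing"]

# Tiny inline example mimicking one line of a category file
# (real records also carry fields such as "word" and "recognized").
sample = '{"word": "line", "drawing": [[[0, 4], [0, 4]]]}'
strokes = parse_line(sample)
print(strokes)  # [[[0, 4], [0, 4]]]
```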

In this format, x and y are the pixel coordinates of the sketch. To render the sketch, we need to “draw a line” between consecutive coordinates. We can do so using Bresenham’s line algorithm, which computes the raster points approximating the straight line between every two consecutive points of a stroke, marking them on our blank canvas.
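That rasterization step can be sketched in pure Python; the function names here are mine, not the original project’s:

```python
def bresenham(x0, y0, x1, y1):
    """Integer points on the line segment from (x0, y0) to (x1, y1)."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return points

def render(strokes, size=256):
    """Rasterize [xs, ys] strokes onto a size x size canvas of 0/1 pixels."""
    canvas = [[0] * size for _ in range(size)]
    for xs, ys in strokes:
        # Connect each pair of consecutive points in the stroke.
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
            for x, y in bresenham(x0, y0, x1, y1):
                canvas[y][x] = 1
    return canvas
```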

At this point, we are able to display some of the sketches. Here are a couple from the airplane and cat categories.

The drawings are contained in a 256 x 256 canvas, but we feed each of them into our model as a 28 x 28 image. Those 784 pixels are enough to capture the meaning of a drawing while making the model computationally much cheaper.
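The resize can be sketched as a nearest-neighbour downsample in pure Python; the original pipeline may well use a library resize with anti-aliasing instead:

```python
def downsample(canvas, new_size=28):
    """Nearest-neighbour downsample of a square canvas (a rough sketch)."""
    old_size = len(canvas)
    scale = old_size / new_size
    return [
        [canvas[int(r * scale)][int(c * scale)] for c in range(new_size)]
        for r in range(new_size)
    ]

# Demo on a synthetic 256 x 256 canvas with the left half filled in.
big = [[1 if c < 128 else 0 for c in range(256)] for r in range(256)]
small = downsample(big)
print(len(small), len(small[0]))  # 28 28
```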

Teaching the computer to draw

Let me introduce you to Generative Adversarial Networks (GANs), a class of machine learning frameworks that learn to generate new data with the same statistics as the training set.

The generative model G captures the distribution of the training data, trying to trick a discriminative model D that estimates the probability that a sample came from the training data rather than from G.

You can now imagine how we can apply this framework to our problem. The generator will produce drawings, while the discriminator will try to tell the fake ones from the real ones. This is a zero-sum game in which D and G compete, each optimizing its own utility function. The result is a neural network (G) that produces drawings increasingly similar to the training data.
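To make the zero-sum game concrete, the two networks fight over the value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]: D maximizes it, G minimizes it. Below is a toy NumPy evaluation of V with a stand-in one-parameter sigmoid discriminator and Gaussian "samples"; everything here is illustrative and not the article's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w=2.0, b=0.0):
    """Stand-in D: a single sigmoid unit mapping a sample to P(real)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def value_function(real, fake):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].
    D wants this large; G wants it small."""
    return np.mean(np.log(discriminator(real))) + \
           np.mean(np.log(1.0 - discriminator(fake)))

real = rng.normal(loc=2.0, size=1000)   # "training data"
fake = rng.normal(loc=-2.0, size=1000)  # easy-to-spot generator output
# The better D separates real from fake, the closer V gets to its maximum.
print(value_function(real, fake))
```

Training alternates gradient steps on D (ascending V) and on G (descending it); when fakes become indistinguishable from real samples, D can do no better than guessing.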

The architecture of the model we are going to use is available here; the GAN framework itself was introduced in this 2014 paper by Ian Goodfellow and colleagues.

Let’s sketch

At this point, we write a script to load our dataset of drawings, and we are ready to train our model.

Starting simple: draw a circle

0–500–5000–30000 epochs

At first, the model is clueless and the output is essentially random. After 500 epochs, the shape is vaguely visible, and by 5,000 epochs we can clearly recognize a circle. After 30,000 epochs, the shape is very distinctive and the residual noise is fading.

Level up: draw an airplane

0–5–10–15–30–40–50 epochs (thousands)

Ultimate challenge: draw a cat

0–1–5–10–20–30–40–50 epochs (thousands)

It’s your turn

This concludes our creative experiment, whose source code is available here. I encourage you to explore this dataset and come up with your own creative project. You can also check out a collection of other AI Experiments from Google’s Creative Lab for more inspiration. Everything you need is just a click away.

Feel free to reach out to me directly for any questions or thoughts.