I am not very passionate about art: when I visit a museum, I am a casual tourist who walks around observing paintings and sculptures, trying to learn as much as possible but, unfortunately, without appreciating the depth behind an art piece.

A few years ago I was at a cocktail party held inside the Peggy Guggenheim Collection museum in Venice, and I was lucky enough to see a Pollock painting, Alchemy, live for the first time. Jackson Pollock was an influential American painter and the leading force behind the abstract expressionist movement in the art world [1]. Pollock is well known for his use of the drip painting technique, a form of abstract art in which paint is dripped or poured onto the canvas [2].

I remember being really fascinated by the fact that, somehow, my mind was completely captured by a painting made of nothing but colors seemingly dripped at random on a canvas. I probably realized in that moment (while drinking a very expensive Aperol Spritz) that I was not looking at randomness, but at something that was created to be beautiful.

This introduction about my superficial art knowledge is just to explain why and when I became curious about Pollock and, as a consequence, why I decided to spend a few hours on the analysis below.

How did Pollock’s use of color evolve over time? To answer, I decided to run some experiments with clustering, applying a few algorithms and plotting some charts.

Data

To do clustering on Pollock’s paintings I needed a reliable source from which I could download the paintings and extract other data. A quick Google search on the artist led me to jackson-pollock.org (a website that is not officially related to Pollock as a person). I scraped the paintings from the website, also saving the year and the name of each artwork.

Pollock created many masterpieces in very different sizes, so I decided to retrieve the size of each painting as well: to understand the artist’s use of color, it could be interesting to include the canvas size in the mix. Unfortunately, this information was not available on jackson-pollock.org, so I needed to search for each piece on Google… manually. I ended up with a csv file and a bunch of jpg images. Of course, my dataset contains only some of Pollock’s paintings and not his entire production; still, it’s enough for what’s coming next.

The Analysis

With all the information needed, we can now start to have some fun with Python.

First of all, we need to decide whether to re-scale the images according to their original size (a.k.a. the canvas size). This is a crucial decision due to the fairly high variability of painting dimensions. We have at least two options:

Re-scale the images according to the original size: we will give more weight to the colors contained in bigger canvases (e.g. Autumn Rhythm (Number 30) is a 14 square meter piece). Doing so, we would be evaluating “how many buckets” of each color Pollock used in his career.

Re-scale the images to a fixed dimension, the same for each painting: every painting will have the same weight. What we would evaluate is the proportion of each color across the paintings. The smaller the images, the faster the algorithm but the poorer the clustering results.

I think it’s more interesting to analyze the proportions, so we’ll go for the second route: we will resize the images to a 200x200 pixel square.

The following code is a simple snippet that resizes the images to the desired shape (if the size parameter is None, then the re-scaling is done according to the data present in the csv file). The SCALE_FACTOR constant helps to reduce the complexity of the clustering algorithms; here it is set to 1. The actual re-scaling is done using OpenCV.
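The original snippet is not reproduced here, so what follows is only a minimal sketch of the resizing logic described above. The function names target_size and resize_painting, the width_cm and height_cm arguments, and the reading of SCALE_FACTOR as a multiplier on the canvas size from the csv are all assumptions made for illustration; only size, SCALE_FACTOR and the use of OpenCV come from the text.

```python
SCALE_FACTOR = 1  # assumed meaning: multiplier applied to the canvas size from the csv


def target_size(width_cm, height_cm, size=None, scale_factor=SCALE_FACTOR):
    """Return the (width, height) in pixels a painting should be resized to."""
    if size is not None:
        # Fixed dimension: every painting gets the same weight.
        return (size, size)
    # Canvas-proportional: bigger canvases contribute more pixels.
    return (max(1, round(width_cm * scale_factor)),
            max(1, round(height_cm * scale_factor)))


def resize_painting(image, width_cm, height_cm, size=None):
    """Resize an image array (as loaded by OpenCV) to the target size."""
    import cv2  # imported lazily so target_size works without OpenCV installed
    return cv2.resize(image, target_size(width_cm, height_cm, size),
                      interpolation=cv2.INTER_AREA)
```

With size=200, every painting becomes a 200x200 square regardless of its canvas; with size=None, lowering the scale factor shrinks every image and therefore the number of pixels the clustering has to process.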

As I wrote earlier, what we want to do is analyze how Pollock varied his use of color during his career. Of course we can’t use every single color we find in the dataset (we could potentially end up with 16M+ possible values). What we’ll do instead is perform a clustering to reduce the number of colors to plot; to do so, we are going to use k-means.

k-means aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster [3]. The algorithm tries to minimize the squared Euclidean distance between each cluster center and the points in that cluster. In our case, the prototypes will represent the colors that we will track across the years: the intuition behind this choice is that k-means should group similar colors together; therefore, by tracking a prototype, we will be tracking the many colors that are similar to it.
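To make the idea of a prototype concrete, here is a small sketch of the classic k-means iteration (Lloyd's algorithm) written with NumPy and run on four toy colors. It is an illustration of the technique, not the code used for the analysis; the function names and the toy data are made up for the example.

```python
import numpy as np


def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)


# Two reddish and two bluish colors, RGB in [0, 1].
colors = np.array([[0.9, 0.1, 0.1], [1.0, 0.0, 0.0],
                   [0.1, 0.1, 0.9], [0.0, 0.0, 1.0]])
centers, labels = kmeans(colors, k=2)
```

The two reds end up sharing one prototype and the two blues the other, so tracking a prototype amounts to tracking every color assigned to it.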

To follow a color across the years, we’ll need to perform the clustering on all the paintings together: if we ran k-means on each single image, we would end up with 54 different models and as many different sets of prototypes.

The first step, though, is to read every single image and stack them into a dataset. For the clustering task we are going to use the RGB (Red-Green-Blue) color space, so the dataset will have:

1 row for each single pixel in our images.

3 columns, one for the Red channel, one for the Green and another for the Blue.
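A sketch of how such a dataset could be built with NumPy is shown here. The helper name stack_images and the synthetic arrays are assumptions for illustration; with real files loaded through OpenCV, the channels arrive in BGR order and would first need cv2.cvtColor(img, cv2.COLOR_BGR2RGB).

```python
import numpy as np


def stack_images(images):
    """Flatten each (height, width, 3) image to (height * width, 3) and stack them."""
    return np.vstack([img.reshape(-1, 3) for img in images])


# Two tiny synthetic "paintings" standing in for the resized jpg files.
images = [
    np.zeros((2, 2, 3), dtype=np.uint8),
    np.full((3, 1, 3), 255, dtype=np.uint8),
]
stacked_images = stack_images(images)
print(stacked_images.shape)  # (7, 3): 4 pixels + 3 pixels, 3 channels each
```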

Not the most efficient way to stack images, but it does the trick.

In the stacked_images variable, we have exactly the dataset that I described earlier.

Before fitting the model, we still need a bit of preprocessing: we are reading the images in RGB format, where each channel value ranges from 0 to 255, so we’ll re-scale everything to the [0, 1] range.
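A minimal sketch of the scaling and fitting steps follows, assuming scikit-learn's KMeans as the implementation (the text does not say which library was used). The random pixels stand in for the real stacked_images dataset, and the names N_COLORS, palette and proportions are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

N_COLORS = 20  # number of prototype colors to extract

# Stand-in for the real dataset: random 8-bit RGB pixels.
rng = np.random.default_rng(0)
stacked_images = rng.integers(0, 256, size=(5000, 3), dtype=np.uint8)

# Re-scale every channel from [0, 255] to [0, 1].
pixels = stacked_images.astype(np.float32) / 255.0

# Fit one model on all the pixels so the prototypes are shared by every painting.
model = KMeans(n_clusters=N_COLORS, n_init=10, random_state=0).fit(pixels)
palette = model.cluster_centers_  # (20, 3) prototype colors in [0, 1]
# Share of pixels assigned to each prototype.
proportions = np.bincount(model.labels_, minlength=N_COLORS) / len(pixels)
```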

After this step, we should have identified the 20 main colors used by Pollock in his paintings! First of all, let’s see them.

To create the image below, I converted the pixels to the HSV color space (Hue-Saturation-Value) and I sorted the colors according to their Hue, Saturation and Value (in this order); the segment size shows the proportion of that color in (all) the paintings.
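The sorting step can be sketched with Python's standard colorsys module. The helper name sort_by_hsv and the three-color palette are placeholders for illustration; in the analysis the input would be the 20 prototypes found by k-means.

```python
import colorsys


def sort_by_hsv(colors):
    """Sort RGB triples in [0, 1] by hue, then saturation, then value."""
    return sorted(colors, key=lambda rgb: colorsys.rgb_to_hsv(*rgb))


palette = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(sort_by_hsv(palette))  # red (hue 0), then green (1/3), then blue (2/3)
```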