I find colors really non-intuitive. The result of mixing yellow and blue is green. Sure… Another fascinating aspect of color is how sensitive our eyes are to detecting the most minute differences.

This picture, for example, has 16 million colors, of which we can differentiate about 10 million.

16 million colors! You can see 10 million. Have not checked. Source: Wikipedia

We have an apparently exquisite ability to detect small differences. But how about proportions?

I have experimented with quantifying the summer 2017 collection of one of the world's largest fast fashion houses. I did this by downloading about 7000 images with their corresponding descriptions, prices and materials, and analyzing them. Why do we need to do this? This is why:

Can you tell how many blue jeans this season has? Well, about half. But what shades of blue? And how are they distributed? Let's find out.

The first step in this analysis is figuring out the dominant colors of a garment, or rather of its image. There are many ways to quantify the dominant colors of an image, one of which is k-means clustering (after background subtraction, of course). Here is an example of how this looks with 1, 2, or 3 average colors, compared to the original:
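As a rough illustration of the k-means step, here is a minimal sketch in plain Python. It is not the code used for this analysis: the function name `kmeans_colors`, the random initialization and the toy pixel list are all assumptions, and it expects the background pixels to have been removed already.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_colors(pixels, k, iterations=20, seed=0):
    """Return k average colors for a list of (r, g, b) foreground pixels.

    Hypothetical helper for illustration; assumes the background
    has already been subtracted from the image.
    """
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)  # start from k randomly chosen pixels
    for _ in range(iterations):
        # Assignment step: group every pixel with its nearest center.
        groups = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean color of its group.
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                new_centers.append(tuple(sum(ch) / len(group) for ch in zip(*group)))
            else:
                new_centers.append(center)  # keep a center that attracted no pixels
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers

# Toy "image": two bluish and two yellowish pixels.
pixels = [(0, 0, 250), (0, 0, 254), (250, 250, 0), (254, 250, 0)]
one = kmeans_colors(pixels, 1)  # a single average color
two = kmeans_colors(pixels, 2)  # one blue and one yellow center
```

With one cluster the result is just the mean color of the garment. With more clusters the blue and yellow groups get their own averages, which is why the multi-color versions resemble the original more closely.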

So let's compare the 8 average colors of jeans, cardigans and blazers for women and men:

Here we can see the clear similarities between colors and tones of color for jeans. The blazers on the other hand are clearly different in the choice of color (other than black) while the cardigans are again more similar between genders.

So how are colors combined? One very interesting way to visualize this is with Circos plots. Circos plots are an amazing way to quickly visualize pairwise relationships between several parameters. Here, the width of each line represents how many times the two colors are found together in a single garment.
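The line widths in such a plot come down to a co-occurrence count. The sketch below shows one way to compute it, assuming each garment's dominant colors have already been mapped to a small palette of color names. The names, the toy data and the function `color_pair_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def color_pair_counts(garments):
    """Count how often each pair of color names appears in the same garment."""
    counts = Counter()
    for colors in garments:
        # sorted(set(...)) drops repeats within a garment and makes
        # ("navy", "white") and ("white", "navy") the same key.
        for pair in combinations(sorted(set(colors)), 2):
            counts[pair] += 1
    return counts

# Hypothetical data: one list of color names per garment.
garments = [
    ["navy", "white"],
    ["navy", "white", "red"],
    ["black", "white"],
]
pairs = color_pair_counts(garments)
```

Each key in `pairs` would become one line in the plot, with the count setting its width.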

What we see here is that for both men and women, the light colors combine progressively with darker colors. Men have slightly more dark colors, and the dark colors contrast more with lighter colors in their combination. Let's do another:

Here we see a much more interesting phenomenon. Women seem to have much more chaotic connections, while men seem to have fewer and thicker lines, indicating that when it comes to cardigans, men want simpler color combos, while women like more complex combinations.

Up till now, we have looked at specific categories and their color usage. But how can we get a more systemic, bird's-eye view of this data set? Let's cluster all images by the similarity of their 3 main colors, using an algorithm called t-SNE:

We see here that the blacks and reds clearly cluster together, and that the earthy brown tones go together with the blue. The light blues are a bit more spread out and end up close to both salmon and light blues. Remember that this clustering is based on three main colors, but I am only coloring each point with its main average color.

Can we improve this? Of course:

Here we are clustering in 3 dimensions instead of 2. Now the colors are much more efficiently clustered perceptually.

One interesting analysis is keeping the spatial organization, but labeling each point with different information:

We see here that there are no clear clusters of black (men) or red (women), with the exception of:

There is a small subcluster of red colors, and this cluster consists almost exclusively of female garments. All the remaining items basically have unisex colors. Put differently, colors do not really predict if an item is meant to be worn by men or women! Unless it's red.

The same analysis, labeling the points instead by price, body position, image complexity, or fiber type (mixed, synthetic, or natural), did not reveal any obvious clusters. Again, this means that color does not really predict any of those parameters.

So what does gender correlate with? Price!

Violin plot showing distribution of garment price for men and women.

In this figure we see the distribution of prices of all garments for men and women, where the width of the shape is proportional to the number of items at that price. We see that men have more cheap garments than women, that both genders have most of their garments priced in about the same range, and that women have more garments in the relatively higher price range.

What else does price correlate with? One interesting analysis is to look at the image complexity, or how chaotic an image is. A more chaotic image should indicate that the item is more complex, as opposed to a less chaotic image, which should indicate that the item is simpler.
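There are many ways to put a number on how chaotic an image is, and the exact measure is not the point here. As one possible stand-in, the sketch below uses the Shannon entropy of a coarse color histogram. This is an assumption chosen for illustration, not necessarily the measure behind the figure, and it only captures the variety of colors, not shapes. The function name `color_entropy` and the `levels` parameter are hypothetical.

```python
import math
from collections import Counter

def color_entropy(pixels, levels=4):
    """Shannon entropy (in bits) of a coarse color histogram.

    Illustrative stand-in for "image complexity": 0 for a single flat
    color, higher when pixels spread over many different colors.
    """
    step = 256 // levels
    # Quantize each channel so near-identical shades share a bin.
    bins = Counter(tuple(ch // step for ch in p) for p in pixels)
    total = len(pixels)
    return sum((n / total) * math.log2(total / n) for n in bins.values())

# A plain garment versus one with four equally common colors.
plain = [(10, 10, 10)] * 8
busy = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)] * 2
plain_score = color_entropy(plain)
busy_score = color_entropy(busy)
```

A score like this could then be binned and plotted against price in the same way as gender.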

Here we see that images (and by extension items) with low complexity have lower prices, while the higher the complexity, the wider the shape becomes as the price goes up. Put another way: the more colors and shapes, the higher the price.

This is just scratching the surface of the data available to analyze, but you have to stop somewhere! In conclusion, we see that the colors in garment images are consistently different for different categories of clothes, that garment color does not predict gender or the other parameters, that women have more expensive clothes, and that more complex garments are more expensive.