In my last article I analyzed sample water-sensitive papers using blob detection, looking at the size and number of droplet impacts. In this one I will use other scikit functionality to compare textures, trying not to assume in advance which feature matters most when comparing droplet-marked water-sensitive papers.

First things first

I’ve built a dataset from http://sprayers101.com/water-sensitive-paper-for-assessing-spray-coverage/ covering different spray pressures and nozzle diameters, and I load it here. I’ll use pathlib to walk the image dataset directory, converting the RGB images into multidimensional arrays.
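A minimal sketch of this loading step. It assumes the images sit in a single directory, that the file stem (for example fine_12) is the label, and that Pillow is used to decode the files; the `load_images` name and the resize to a common shape are illustrative choices, not taken from the original notebook.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_images(directory, size=(64, 64)):
    """Return (labels, images) for every PNG/JPEG file in `directory`.

    Each file is converted to RGB and resized to `size` (width, height)
    so that all arrays share one shape: (height, width, 3), dtype uint8.
    """
    labels, images = [], []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue  # skip anything that is not an image file
        with Image.open(path) as img:
            rgb = img.convert("RGB").resize(size)
        labels.append(path.stem)  # assumption: the file name is the label
        images.append(np.asarray(rgb))
    if not images:
        # No image found: return an empty array with the expected layout.
        return labels, np.empty((0, size[1], size[0], 3), dtype=np.uint8)
    return labels, np.stack(images)
```

Resizing matters for the next step: flattened vectors can only be compared if every image has the same number of pixels.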

First approach

My first approach is to flatten the image arrays and treat them as vectors in an n-dimensional space. Then I use Principal Component Analysis (PCA) to reduce the number of dimensions to 2.
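The flatten-then-PCA idea can be sketched as follows, assuming scikit-learn's `PCA` and a stack of equally sized images. The `project_to_2d` helper and the random stand-in images are hypothetical and only there to make the example runnable.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_to_2d(images):
    """Flatten each image into one row vector and project the rows onto 2 principal components."""
    images = np.asarray(images, dtype=np.float64)
    flat = images.reshape(len(images), -1)  # shape: (n_images, H * W * channels)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(flat)        # shape: (n_images, 2)
    return coords, pca.explained_variance_ratio_


# Synthetic stand-in for the dataset: 6 random 16x16 RGB "images".
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(6, 16, 16, 3), dtype=np.uint8)
demo_coords, demo_ratio = project_to_2d(demo_images)
print(demo_coords.shape)  # (6, 2)
```

The explained variance ratio is worth checking: if the first two components capture only a small share of the variance, distances in the 2D plot should be read with caution.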

Each image is now a two-dimensional vector, which I can show in a scatter plot with the help of Altair (I am starting to test this more declarative alternative to Matplotlib).

Considering that fine_12 is the best coverage:

As expected, the “second best” (and closest) is fine_8, but the third one is medium_12. The fine_4 sample is not clearly separated from several of the other alternatives.
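One way to make “closest” explicit, rather than reading it off the plot, is to rank every sample by its Euclidean distance to the reference in the 2D PCA plane. This is an assumption about how to quantify the comparison, not necessarily how the ranking above was obtained; the helper name and the sample points are hypothetical.

```python
import numpy as np


def rank_by_distance(labels, coords, reference):
    """Return (label, distance) pairs sorted by distance to `reference`, nearest first."""
    labels = list(labels)
    coords = np.asarray(coords, dtype=float)
    ref_point = coords[labels.index(reference)]
    distances = np.linalg.norm(coords - ref_point, axis=1)
    order = np.argsort(distances)
    return [(labels[i], float(distances[i])) for i in order if labels[i] != reference]


# Placeholder coordinates, not the real PCA output.
demo_ranking = rank_by_distance(
    ["ref", "a", "b"],
    [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]],
    reference="ref",
)
print(demo_ranking)  # [('b', 1.0), ('a', 5.0)]
```

With the real data, the reference would be fine_12 and the coordinates would come from the PCA projection.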

Second approach

I’ve also decided to compare the images using Haralick features. According to Wikipedia, they are based on the co-occurrence matrix of a given image. The Mahotas library computes them for us, but first I convert the images to grayscale and simplify them.

I’ll now reduce the 13-dimensional Haralick feature space to 2D using PCA.
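A sketch of this reduction on a feature matrix with one row per image and 13 columns. The optional `scale` flag is my own addition and not part of the approach described here: Haralick features can live on very different numeric scales, so standardizing them first is a variation worth trying. The `reduce_features` name and the random matrix are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_features(features, scale=False):
    """Project an (n_images, n_features) matrix onto its first 2 principal components."""
    features = np.asarray(features, dtype=np.float64)
    if features.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_images, n_features)")
    if scale:
        # Optional variation: give every feature zero mean and unit variance.
        features = StandardScaler().fit_transform(features)
    return PCA(n_components=2).fit_transform(features)


# Random stand-in for 8 images x 13 features with very different column scales.
rng = np.random.default_rng(1)
fake_features = rng.normal(size=(8, 13)) * np.logspace(-3, 3, 13)
print(reduce_features(fake_features).shape)  # (8, 2)
```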

And I plot the result as before.

Although the results are slightly different, both approaches agree that the best alternative to fine_12 is fine_8. Nevertheless, simple PCA on the raw images seems to give a clearer classification of the water-sensitive papers.