The human visual system has evolved to recognise people in almost any pose under a vast range of lighting conditions. This capability is so refined that it works not just for realistic images but also for artistic representations of the human form in oil paintings, cartoons and line drawings, even when these representations have little in common with the real thing.

The representations of the human form reach their artistic limit in abstract art when artists intentionally push the boundaries of human visual perception, sometimes beyond recognition. Nevertheless, humans are often able to perceive the most tenuous of links between distorted shapes and the human form.

In recent years, researchers have developed machine vision algorithms that have begun to match the ability of humans to recognise people in a wide variety of lighting conditions. Indeed, we recently reported the first face recognition algorithm capable of outperforming humans.

That raises an interesting question, say Shiry Ginosar and buddies at the University of California, Berkeley. Today, these guys ask whether the same algorithms can also detect people in Cubist art, in particular the work of Pablo Picasso. And the answer throws light not only on the limits of computer vision but also on the way in which human object recognition fails at its limits.

Cubism was one of the most influential art movements of the 20th century, pioneered in particular by Georges Braque and Picasso, perhaps the most famous and accomplished artist of the 20th century. Cubism seeks to represent three-dimensional objects on a two-dimensional plane by juxtaposing snapshots from different angles. The result is that a Cubist picture contains many ‘fragments of perception’ of the same object.

Ginosar and co began with a set of 218 Cubist pictures by Picasso with titles indicating that they depict people. They then asked a group of 18 participants to rate the degree of abstraction in each picture on a scale of 1 to 5, with pictures such as “Seated woman 1921” being rated as 1 (very life-like) and works such as “Nude and Still Life 1931” being rated as 5 (not at all life-like).

Next, they asked the same 18 participants to draw a box around each human figure they perceived in each picture. This creates a kind of ground truth database against which computer vision algorithms can be compared. Each person annotated 146 randomly chosen paintings out of the total of 218, so that every painting was assessed by 14 or 15 individuals.

Computer scientists have developed a number of different approaches to computer vision. One question that Ginosar and co wanted to study was how these approaches compare when it comes to abstract images. So the team measured the performance of four different computer vision algorithms, which all learn by studying large databases of images of human figures.

The oldest approach, known as the Dalal and Triggs method, evaluates pictures by measuring the orientation of edges within an image and counting the frequency of each orientation. This accurately finds objects of similar shapes but does not cope well with the changes in pose that are common in images of human figures.
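The idea of counting edge orientations can be sketched in a few lines. What follows is a minimal illustration and not the Dalal and Triggs implementation itself: the function name, the nine bins and the toy image are all assumptions made for the example, and the full method goes further by computing such histograms over a grid of small cells and normalising them. Note that the code measures the direction of the brightness gradient, which runs at right angles to the edge itself.

```python
import math

def orientation_histogram(image, bins=9):
    """Histogram of unsigned gradient orientations, weighted by gradient strength.

    `image` is a list of rows of grey-level numbers (a hypothetical toy format).
    Only interior pixels are used so that centred differences are always defined.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal change in brightness
            gy = image[y + 1][x] - image[y - 1][x]  # vertical change in brightness
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region, no edge here
            # Fold the angle into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A vertical edge: dark on the left, bright on the right.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
print(orientation_histogram(vertical_edge))
# All of the weight lands in the first bin (gradient pointing horizontally).
```

Two images with similar outlines produce similar histograms, which is why the approach matches shapes well. A figure that bends an arm or turns sideways changes the histogram, which is the weakness described above.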

To cope with changes in pose, computer scientists have developed another approach that divides objects into collections of parts. The algorithm then looks for objects that contain the same parts, albeit arranged in different ways. These algorithms are known as deformable part models. Ginosar and co point out that these models often learn to recognise seemingly odd parts of the body, such as half a face.
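A rough sketch shows how a part-based score might tolerate rearranged parts. It assumes a simplified scoring rule: each part contributes how well it matches the image, minus a penalty that grows with the square of its distance from where it would ideally sit. The function, the dictionary keys and the numbers are hypothetical, and real deformable part models also score the figure as a whole and learn the penalties from data.

```python
def part_based_score(root, parts):
    """Score one hypothesised figure position from its parts.

    `root` is the (x, y) position of the whole figure. Each part is a dict with
    an `anchor` offset (where the part ideally sits relative to the root), a
    `stiffness` (how heavily displacement is penalised) and `candidates`, a list
    of ((x, y), appearance_score) detections for that part.
    """
    total = 0.0
    for part in parts:
        ideal_x = root[0] + part["anchor"][0]
        ideal_y = root[1] + part["anchor"][1]
        # Pick the placement with the best trade-off between looking right
        # and sitting close to its ideal position.
        best = max(
            score - part["stiffness"] * ((x - ideal_x) ** 2 + (y - ideal_y) ** 2)
            for (x, y), score in part["candidates"]
        )
        total += best
    return total

parts = [
    {"anchor": (0, -10), "stiffness": 0.5,
     "candidates": [((0, -10), 1.0), ((1, -10), 2.0)]},  # a head-like part
    {"anchor": (0, 0), "stiffness": 0.5,
     "candidates": [((0, 2), 3.0)]},                     # a torso-like part
]
print(part_based_score((0, 0), parts))  # 2.5
```

In this toy case the head-like part is placed one pixel away from its ideal spot because the stronger match there outweighs the small penalty. That tolerance for displaced parts is what lets such a model follow a change in pose.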

To make these models more accurate, computer scientists have also trained these kinds of algorithms to look for realistic combinations of annotated body parts, called poselets. For example, a poselet might consist of “half of a frontal face and the left shoulder”. So Ginosar and co also included a Poselet algorithm.

The final algorithm they considered was based on deep convolutional neural networks, an approach that has revolutionised the accuracy of facial recognition in recent years. This uses large amounts of data to learn what to look for in images of humans but does not depend on an explicit understanding of the different parts of the body.

Having trained all of these algorithms using a dataset of natural images of human figures, Ginosar and co then gave each algorithm the set of Cubist images to study. The goal for each algorithm was to draw a box around the area of the picture containing the human figure, just as the human participants had done.
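Comparing an algorithm's box with a human's requires some measure of agreement. A common convention in object detection, assumed here because the article does not say which rule the team used, is to divide the area the two boxes share by the area they cover together, and to count the boxes as matching when that ratio reaches one half. The function names, the box format and the threshold below are all illustrative.

```python
def intersection_over_union(a, b):
    """Overlap ratio of two boxes given as (left, top, right, bottom)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not overlap at all
    intersection = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)

def is_match(predicted, annotated, threshold=0.5):
    """Treat a predicted box as correct if it overlaps the annotation enough."""
    return intersection_over_union(predicted, annotated) >= threshold

human_box = (0, 0, 10, 10)
algorithm_box = (5, 0, 15, 10)
print(intersection_over_union(human_box, algorithm_box))  # one third
print(is_match(algorithm_box, human_box))                 # False
```

Here the two boxes share half of each box's area, yet the ratio is only one third, so under the assumed threshold the algorithm's box would not count as a correct detection.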

The results provide a fascinating insight into the capabilities of computer vision algorithms but also into the nature of human perception.

Unsurprisingly, humans are much better than any of the algorithms at spotting figures in Cubist paintings. Ginosar and co say humans can recognise figures with a precision of 0.804.

But interestingly, the best algorithm was not the deep convolutional neural network that has revolutionised object recognition in computer vision. The network does the job with a precision of 0.315.

By contrast, the deformable parts model recognises human figures with a precision of 0.444. That is considerably better. What’s more, the performance of this algorithm degrades in the same way as human performance as the images become more abstract.

Humans are pretty good at detecting figures in mildly abstract images but their performance deteriorates gradually as the images become more abstract. The deformable parts model behaves in the same way, working well for mildly abstract images and becoming increasingly inaccurate for greater abstractions.

By contrast, the algorithms that do not rely on deformable parts suffer a catastrophic drop in performance as abstraction levels increase. The Dalal and Triggs method performs particularly badly in this respect but the deep convolutional neural network also degrades badly.

That’s interesting because neuroscientists believe that humans perceive objects based on their parts. “The ability to model these mechanisms computationally further corroborates the neuroscience theory of part-based object detection strategies,” say Ginosar and co.

In other words, humans probably perceive objects in the same way as deformable part models rather than in the same way as today’s best deep convolutional neural networks.

That’s fascinating. Computer vision algorithms aim to match and even beat the performance of human vision. But an equally important part of human perception is its flexibility and robustness. So a crucial aspect of computer vision ought to be its flexibility and robustness in extreme conditions, such as those presented in abstract art.

For this reason, abstract art may have an important role to play in assessing the performance of computer vision algorithms in future. “We have argued that object detection under abstraction of object form is an example of a challenging perception task that existing image benchmarks do not properly evaluate,” say Ginosar and co.

And they have ambitions to take this further. “There are other artistic movements with characteristic abstractions, such as the use of blurring in Impressionism, that would provide rich grounds for study,” they say.

We will be watching to see what computer vision algorithms make of the work of Monet, Manet and Cezanne.

Ref: arxiv.org/abs/1409.6235 : Detecting People in Cubist Art