University of California, Berkeley

Data mining can now help us study anything from online behaviour to phone call patterns. But there's one area it hasn't covered -- photographs. Now a team of academics at the University of California, Berkeley is looking to change this.

Shiry Ginosar is a PhD student who has pioneered a new technique for mining data in old photographs. Photographs are not commonly used in data mining, partly because of their sheer volume -- there are thousands and thousands of pictures from 150 years of photography -- and partly because the information they contain can be hard to distil. Describing the content of photographs in words can also be time-consuming or mundane.


Ginosar downloaded around 150,000 yearbook portraits and, after removing photos that weren't face-on, was left with around 37,000 images from 800 yearbooks in 26 US states. The photographs were grouped by gender, and then decade, and superimposed onto one another to create an 'average' face -- revealing average period features like hair, clothing and facial expressions.
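The grouping and averaging step described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual code: it assumes the portraits are already aligned and equally sized, stands in tiny nested lists for real grayscale images, and the names `decade_of`, `mean_image` and `average_faces` are hypothetical.

```python
from collections import defaultdict


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1938 -> 1930."""
    return year - year % 10


def mean_image(images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    rows, cols = len(images[0]), len(images[0][0])
    count = len(images)
    return [
        [sum(img[r][c] for img in images) / count for c in range(cols)]
        for r in range(rows)
    ]


def average_faces(portraits):
    """Group (gender, year, image) records by gender and decade,
    then average each group into a single 'average' face."""
    groups = defaultdict(list)
    for gender, year, image in portraits:
        groups[(gender, decade_of(year))].append(image)
    return {key: mean_image(images) for key, images in groups.items()}


# Toy data: 2x2 "images" with made-up pixel values, purely for illustration.
portraits = [
    ("F", 1931, [[0, 100], [200, 50]]),
    ("F", 1938, [[100, 100], [0, 150]]),
    ("M", 1931, [[10, 20], [30, 40]]),
]
averages = average_faces(portraits)
```

Here `averages[("F", 1930)]` holds the pixel-wise mean of the two 1930s female portraits. A real pipeline would first have to detect and align the faces, a step this sketch leaves out.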

Looking at the photographs, the most striking feature of the decade-by-decade progression is the evolution of the smile, which gets broader as the years pass. This, Ginosar argued, is down to a cultural hang-up from the birth of photography. Before instant photography, subjects were expected to sit very still for a long time, holding a neutral expression.

"Etiquette and beauty standards dictated that the mouth be kept small -- resulting in an instruction to 'say prunes' (rather than 'cheese') when a photograph was being taken," she said. "These days we take for granted that we should smile when our picture is being taken."


The results also highlighted the efficiency of data mining in the analysis of photography. One finding -- that women smile more than men -- confirmed previous research, but was reached in a far more efficient way.

Previous findings were the result of hours of manual analysis. But Ginosar's team were able to come to the same conclusion with "virtually no manual effort".

Researchers acknowledged the data set was biased -- the number of Americans graduating from high school increased by more than 40 percent between 1900 and 1960, and the African American population was not represented until the mid-20th century. But the methodology is an interesting, as-yet untapped resource for those interested in photographic data.