Porngram

An interdisciplinary team of researchers has created a Google Trends-like tool called Porngram that maps the evolution of keywords in the titles of 800,000 porn videos.

It allows users to enter any number of keywords and see how often they appeared in the titles of porn videos uploaded to porn site Xhamster between 2007 and February 2013. It shows, for example, that movies tagged with "footjob" have been on the rise while those tagged "handjob" have been falling. You can compare your own keywords here.
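As a rough illustration of the kind of count such a tool relies on, here is a minimal Python sketch. It assumes each video is a (title, upload year) pair and that "how often" means the share of that year's titles containing the keyword; the function name, the sample data and the normalisation are all assumptions made for illustration, not details of Porngram itself.

```python
from collections import Counter

def keyword_share_by_year(videos, keyword):
    """Return {year: fraction of that year's titles containing the keyword}.

    `videos` is an iterable of (title, upload_year) pairs. Dividing by the
    yearly total is an assumption; the real tool may normalise differently.
    """
    totals = Counter()  # number of videos uploaded per year
    hits = Counter()    # number of titles per year containing the keyword
    needle = keyword.lower()
    for title, year in videos:
        totals[year] += 1
        # Match whole words only, ignoring case.
        if needle in title.lower().split():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Toy data, invented for the example.
sample = [
    ("Funny amateur clip", 2007),
    ("Vintage film", 2007),
    ("Funny outtakes", 2008),
    ("Another funny clip", 2008),
]
shares = keyword_share_by_year(sample, "funny")
```

Using a yearly share rather than a raw count keeps a keyword from appearing to rise simply because more videos were uploaded overall.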


The Porngram tool was built off the back of the Sexualitics dataset, scraped from Xhamster and Xvideos to become the subject of a research paper into porn data analysis. The Xvideos dataset (which looked at 1,200,000 videos) lacked the upload date, meaning that it wasn't useful for offering up trend data over time.

The research team -- made up of five individuals (Baptiste Coulmont, Antoine Mazières, Mathieu Trachman, Jean-Philippe Cointet and Christophe Prieur) with skills across computer science, sociology, statistics, mathematics and gender studies -- scraped the videos' titles, tags, descriptions, view counts, comments, runtimes, upload dates (if available) and uploader usernames using a custom-made crawler. These were then analysed using a quantitative approach in a bid to understand the classification of pornography and shed light, to a certain degree, on human sexuality (at least from the supply side).


Tags were sorted into categories (capturing variations of terms such as "blowjob", "bj" etc.) and these were then ranked by frequency of occurrence -- how many videos have that particular tag. The most popular five percent of keywords (including "amateur" and "blowjob") covered around 90 percent of videos and were therefore not particularly helpful in terms of categorising content.
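The grouping and ranking step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VARIANTS` map, the function names and the toy videos are hypothetical, and the article does not describe how the researchers actually merged tag variants.

```python
from collections import Counter

# Hypothetical variant map: each spelling is folded into one category.
VARIANTS = {"bj": "blowjob", "blow job": "blowjob"}

def categories_for(tags):
    """Normalise a video's tags and fold known variants into categories."""
    cleaned = (t.strip().lower() for t in tags)
    return {VARIANTS.get(t, t) for t in cleaned}

def rank_categories(videos):
    """Return (category, video_count) pairs, most frequent first."""
    counts = Counter()
    for tags in videos:
        # A set ensures each category is counted once per video.
        counts.update(categories_for(tags))
    return counts.most_common()

def coverage(videos, top_categories):
    """Fraction of videos carrying at least one of the given categories."""
    top = set(top_categories)
    covered = sum(1 for tags in videos if categories_for(tags) & top)
    return covered / len(videos)

# Toy data, invented for the example.
videos = [["amateur", "bj"], ["Amateur", "funny"], ["blowjob"], ["vintage"]]
ranking = rank_categories(videos)
top_coverage = coverage(videos, ["amateur", "blowjob"])
```

In this toy set the two most common categories already reach three of the four videos, which mirrors on a small scale why the most frequent keywords say little about what distinguishes one video from another.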

The research -- outlined in this paper -- reveals not only the number of times a word occurs in the titles of porn movies over time, but the keywords that are most popular (based on views) and the ones that attract the most comments/reactions.


In order to eliminate categories that were "empty" in terms of descriptive power (those that were applied so frequently that they were meaningless), the team developed a nicheness score, an ad-hoc statistical model for ranking the descriptive power of categories.

This meant that the overused terms were given lower scores.

Having done this, the researchers started to draw semantic connections between keywords. For example, the word "midget" is a low-frequency category in the Xhamster database, but appears ten times more often than statistically expected in videos that also have the tag "funny". This indicates a strong relation between these two categories.
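The article does not say how "statistically expected" was calculated. One common reading, assumed here, is that the expected count is what you would see if the two tags were assigned independently of each other. The Python sketch below computes the ratio of observed to expected co-occurrence on that assumption; the function name and the toy data are hypothetical.

```python
def cooccurrence_ratio(videos, tag_a, tag_b):
    """Observed co-occurrence of two tags divided by the count expected
    if the tags were independent (an assumed baseline, not necessarily
    the one used in the research).

    `videos` is a list of tag sets. A ratio of 1.0 means the tags appear
    together exactly as often as chance predicts; 10.0 means ten times
    more often.
    """
    n = len(videos)
    n_a = sum(1 for tags in videos if tag_a in tags)
    n_b = sum(1 for tags in videos if tag_b in tags)
    n_ab = sum(1 for tags in videos if tag_a in tags and tag_b in tags)
    if n_a == 0 or n_b == 0:
        return 0.0  # ratio is undefined when a tag never occurs
    # Expected joint count under independence is n_a * n_b / n.
    return (n_ab * n) / (n_a * n_b)

# Toy data: 10 videos, "rare" on 1, "funny" on 2, both on 1.
videos = [{"rare", "funny"}, {"funny"}] + [{"other"} for _ in range(8)]
ratio = cooccurrence_ratio(videos, "rare", "funny")
```

With so few videos the ratio is very noisy, so on real data a low-frequency tag would need enough occurrences before a figure like "ten times" could be trusted.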


Antoine Mazières, a PhD candidate at INRA-SenS and LIAFA, told Wired.co.uk that the biggest challenge was making sense of the data. "A bunch of skills were required spanning from statistical physics to cultural anthropology."

The most significant finding, in Mazières' view, was how the data highlighted the "huge diversity of sexual practices" while "statistically relativising the overwhelmingness of mainstream categories".

The data also revealed some surprises, including the fact that 37 out of the 100 most viewed videos on Xhamster have "mom" or "mother" in the title. This category did not include the term MILF. "I was really not expecting that one," says Mazières.

The research was limited by the fact that most of the data associated with porn videos is kept by the platforms themselves.


Mazières and colleagues would like to gain access to server side data, to analyse views of videos over time (rather than total views of videos), for example, or to observe the "career" of a porn consumer over time. "This could be done with anonymised data, of course, using unique ID instead of IP, for example," he told us. "This would be a Kinsey 2.0 project!"

A better understanding of the demand side of porn could help inform the production of adult entertainment, much in the way that Netflix mines its user data to inform the content that it commissions.

The team has made its dataset publicly available so that others can play around with it.