[Caption censored] (Image: Yuri Arcurs Media/SuperFusion/SuperStock)

It doesn’t take much imagination to guess what a porn video sounds like. It’s more impressive, however, when it’s a computer that’s doing the guessing.

Automatic image-analysis systems are already used to catch unwanted pornography before it reaches a computer monitor. But they often struggle to distinguish between indecent imagery and more innocuous pictures with large flesh-coloured regions, such as a person in swimwear or a close-up face. Analysing the audio for a “sexual scream or moan” could solve the problem, say electrical engineers MyungJong Kim and Hoirin Kim at the Korea Advanced Institute of Science and Technology in Daejeon, South Korea.

The pair used a signal-processing technique called the Radon transform to create spectrograms of a variety of audio clips, each just half a second long. They found that speech signals are normally low-pitched and musical clips have a wide range of pitches; both vary only gradually over time. In contrast, pornographic sounds tend to be higher-pitched, change quickly and repeat periodically.
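To make the idea of reading pitch and rate of change off a spectrogram concrete, here is a minimal Python sketch. It is not the researchers' method: it swaps in a plain short-time Fourier transform for the Radon transform, and the frame sizes, the 8000 Hz sample rate, the synthetic `tone` clips and every function name are assumptions chosen for illustration. It measures only how high the sound sits (the spectral centroid) and how fast that measure changes between frames, not the periodic repetition.

```python
import cmath
import math

FRAME_LEN = 256  # samples per analysis frame (assumed value)
HOP = 128        # samples between frame starts (assumed value)


def tone(freq_hz, sample_rate=8000, seconds=0.5):
    """Synthetic sine wave standing in for a real half-second audio clip."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(count)]


def spectrogram(samples):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (FRAME_LEN - 1))
              for n in range(FRAME_LEN)]  # Hann window
    twiddle = [cmath.exp(-2j * math.pi * k / FRAME_LEN) for k in range(FRAME_LEN)]
    frames = []
    for start in range(0, len(samples) - FRAME_LEN + 1, HOP):
        frame = [samples[start + n] * window[n] for n in range(FRAME_LEN)]
        # Naive discrete Fourier transform over the non-negative frequency bins.
        frames.append([
            abs(sum(frame[n] * twiddle[(k * n) % FRAME_LEN] for n in range(FRAME_LEN)))
            for k in range(FRAME_LEN // 2)
        ])
    return frames


def spectral_centroids(frames, sample_rate=8000):
    """Magnitude-weighted mean frequency (Hz) per frame: a crude pitch-height proxy."""
    bin_hz = sample_rate / FRAME_LEN
    centroids = []
    for mags in frames:
        total = sum(mags)
        weighted = sum(k * bin_hz * m for k, m in enumerate(mags))
        centroids.append(weighted / total if total > 0 else 0.0)
    return centroids


def mean_abs_change(values):
    """Average frame-to-frame change: how quickly the measure varies over time."""
    if len(values) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


if __name__ == "__main__":
    for freq in (200, 2000):
        centroids = spectral_centroids(spectrogram(tone(freq)))
        print(freq, sum(centroids) / len(centroids), mean_abs_change(centroids))
```

A steady tone gives a centroid that barely moves from frame to frame, while a clip that jumps between pitches gives a large average change. That is the kind of contrast the article describes between slowly varying speech or music and rapidly changing sounds.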


These characteristics allow software to distinguish smutty audio from other content. The researchers used a statistical model to classify sounds as pornographic or non-pornographic according to their spectral characteristics, and tested it on audio taken from online videos. The non-sexual audio clips included music, movies, news and sport.
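The article does not say which statistical model the researchers used, so the following Python sketch is only one simple possibility, shown to illustrate the general approach. It fits a diagonal Gaussian to each class and picks whichever class explains a new clip better. The two features per clip (average pitch height in Hz and frame-to-frame change), the function names and all the numbers in the toy training data are invented for illustration.

```python
import math


def fit_gaussian(rows):
    """Per-feature mean and variance (a diagonal Gaussian) for one class."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    # Floor the variance so a constant feature cannot cause division by zero.
    variances = [max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
                 for d in range(dims)]
    return means, variances


def log_likelihood(x, model):
    """Log-probability density of feature vector x under a diagonal Gaussian."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))


def classify(x, target_model, other_model):
    """True when the target-class model explains x better than the other model."""
    return log_likelihood(x, target_model) > log_likelihood(x, other_model)


# Invented toy data: (average pitch height in Hz, frame-to-frame change).
TARGET_CLIPS = [(1800, 300), (2000, 350), (2200, 280)]
OTHER_CLIPS = [(400, 30), (600, 50), (500, 20)]

target_model = fit_gaussian(TARGET_CLIPS)
other_model = fit_gaussian(OTHER_CLIPS)

if __name__ == "__main__":
    print(classify((1900, 310), target_model, other_model))
    print(classify((450, 40), target_model, other_model))
```

A real system would need far more training clips and richer features than two numbers per clip. The sketch shows only the decision rule: compare how well each class's model fits the clip.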

The model outperformed other audio-based techniques, correctly identifying 93 per cent of the pornographic content in the test clips. The lewd clips it missed contained confusable sounds, such as background music. Comedy shows with laughter were also sometimes mistaken for pornography, as loud audience cheers and cries share spectral characteristics with sexual sounds.

Yes, yes, oh yes

“It’s quite ingenious,” says Richard Harvey, a computer scientist at the University of East Anglia in Norwich, UK, who previously worked on image-based pornography detection. But image-based methods are no less accurate, he says, and require only a single frame, whereas the audio method needs to analyse longer clips.

He suggests it might be better to combine both methods to weed out unusual cases: “Think of that scene in When Harry Met Sally [in which a female character fakes an orgasm while fully clothed in a diner] – the audio is very clearly pointing in one direction, but the video is not.”

The researchers will present the work at the International Workshop on Content-Based Multimedia Indexing in Madrid, Spain, next month.