Video: Software can learn sign language by watching TV

It’s not only humans that can learn from watching television. Software developed in the UK has worked out the basics of sign language by absorbing TV shows that are both subtitled and signed.

While almost all shows are broadcast with subtitles, some are also accompanied by sign language, which is easier for many deaf people to follow.

Shows with both text and signing are a bit like a Rosetta Stone – a carving that provided the breakthrough in decoding Egyptian hieroglyphics from an adjacent translation in classical Greek.

So Patrick Buehler and Andrew Zisserman at the University of Oxford, along with Mark Everingham at the University of Leeds, set out to see whether software that can already interpret the typed word could learn British Sign Language from video footage.
Sign of the times

They first designed an algorithm to recognise the gestures made by the signer. The software tracks the signer's arms to work out the rough location of the fast-moving hands, then identifies flesh-coloured pixels in those areas to reveal the precise hand shapes.
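The hand-finding step can be sketched in a few lines. This is a minimal illustration of the idea only, not the Oxford team's published method: the search region, colour thresholds, and function name are all assumptions made up for the example.

```python
import numpy as np

def hand_mask(frame_rgb, region, skin_lo=(90, 40, 20), skin_hi=(255, 180, 135)):
    """Return a boolean mask of flesh-coloured pixels inside a search region.

    frame_rgb : H x W x 3 uint8 image
    region    : (top, bottom, left, right) box around the estimated hand,
                e.g. derived from the tracked arm position
    The RGB thresholds here are illustrative, not published values.
    """
    top, bottom, left, right = region
    patch = frame_rgb[top:bottom, left:right].astype(int)
    lo = np.array(skin_lo)
    hi = np.array(skin_hi)
    # A pixel counts as "flesh-coloured" if every channel lies in range
    return np.all((patch >= lo) & (patch <= hi), axis=-1)

# Synthetic 60x60 frame: a skin-coloured blob on a dark background
frame = np.zeros((60, 60, 3), dtype=np.uint8)
frame[20:40, 25:45] = (200, 130, 100)  # the "hand"
mask = hand_mask(frame, (10, 50, 10, 50))
print(mask.sum())  # number of candidate hand pixels -> 400
```

The mask's connected regions would then be examined to recover the precise hand shape.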

Once the team were confident the computer could identify different signs in this way, they exposed it to around 10 hours of TV footage that was both signed and subtitled. They tasked the software with learning the signs for a mixture of 210 nouns and adjectives that appeared multiple times during the footage.

The program did so by analysing the signs that accompanied each of those words whenever it appeared in the subtitles. Where it was not obvious which part of a signing sequence related to a given keyword, the system compared multiple occurrences of the word to pinpoint the correct sign.
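The matching idea can be illustrated with a toy sketch: each time a keyword appears in the subtitles, the signing window around it contains several candidate signs, and the sign that recurs across windows is the likely translation. The sign IDs and function name below are invented for the example.

```python
from collections import Counter

def pinpoint_sign(occurrences):
    """Given the candidate signs seen in the signing window each time a
    keyword appears in the subtitles, return the sign that recurs most
    often across occurrences -- the likely translation of the keyword.
    """
    counts = Counter()
    for window in occurrences:
        counts.update(set(window))  # count each sign once per occurrence
    sign, _ = counts.most_common(1)[0]
    return sign

# Three subtitle occurrences of one keyword; sign IDs are made up
windows = [
    ["s17", "s04", "s22"],
    ["s09", "s22", "s31"],
    ["s22", "s40"],
]
print(pinpoint_sign(windows))  # -> s22
```

Only "s22" appears in every window, so it is picked as the keyword's sign.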

Starting without any knowledge of the signs for those 210 words, the software correctly learnt 136 of them, or 65 per cent, says Everingham. “Some words have different signs depending on the context – for example, cutting a tree has a different sign to cutting a rose,” he says, so this is a high success rate given the complexity of the task.

Signing avatars

Helen Cooper and Richard Bowden at the University of Surrey, UK, have used the same software in a different way to teach their own computer sign language.

“Our approach achieves higher accuracy levels with less data,” Bowden says. To get such good results, he and Cooper have the software scan all the signs in a video sequence and identify those that appear frequently, and so are likely to represent common words. The meaning of each of those signs is then determined by referring to the subtitles.
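The Surrey pipeline runs the other way round: first mine signs that recur often, then read off the subtitle word each one most often co-occurs with. The sketch below is a hypothetical illustration of that ordering, not the team's actual system; the timeline format and threshold are assumptions.

```python
from collections import Counter

def frequent_signs_to_words(timeline, min_count=3):
    """First find signs that recur at least min_count times in the
    footage, then assign each one the subtitle word it most often
    co-occurs with. `timeline` is a list of (sign, subtitle_word) pairs.
    """
    sign_counts = Counter(sign for sign, _ in timeline)
    meanings = {}
    for sign, n in sign_counts.items():
        if n < min_count:
            continue  # rare sign: skip, as the Surrey step mines frequent ones
        words = Counter(word for s, word in timeline if s == sign)
        meanings[sign] = words.most_common(1)[0][0]
    return meanings

# Toy timeline; sign IDs and words are made up
timeline = [
    ("s22", "tree"), ("s07", "house"), ("s22", "tree"),
    ("s22", "forest"), ("s07", "house"), ("s22", "tree"),
]
print(frequent_signs_to_words(timeline))  # -> {'s22': 'tree'}
```

Because frequent signs are mined first, a single pass over the footage suffices, which is what makes the approach scale to large amounts of data.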

“That approach is very scalable – it can run quickly on large amounts of data,” says Everingham. But he thinks that it leaves the software less able to distinguish between terms than using his team’s more word-specific method.

Both approaches, though, could be more than just academic demonstrations of the power of software. They could be used to create a way to automatically animate digital avatars that could fluently sign alongside any TV programme. Previous attempts to do this resulted in avatars that appear clunky to people fluent in sign language, says Everingham.

Everingham and colleagues, and Cooper and Bowden, presented their work at the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009) in Miami Beach, Florida, last week.