Watch where you look – it can be used to predict what you’ll say. A new study shows that it is possible to guess what sentences people will use to describe a scene by tracking their eye movements.

Moreno Coco and Frank Keller at the University of Edinburgh, UK, presented 24 volunteers with a series of photo-realistic images depicting indoor scenes such as a hotel reception. They then tracked the sequence of objects that each volunteer looked at after being asked to describe what they saw.

Other than being prompted with a keyword, such as “man” or “suitcase”, participants were free to describe the scene however they liked. Some typical sentences included “the man is standing in the reception of a hotel” or “the suitcase is on the floor”.

The order in which a participant’s gaze settled on objects in each scene tended to mirror the order of nouns in the sentence used to describe it. “We were surprised there was such a close correlation,” says Keller. Given that multiple cognitive processes are involved in sentence formation, Coco says “it is remarkable to find evidence of similarity between speech and visual attention”.


Word prediction

The team used the discovery to see if they could predict what sentences would be used to describe a scene based on eye movements alone. They developed an algorithm that used the gaze sequences recorded in the earlier experiment to pick out the correct sentence from a choice of 576 descriptions.
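The paper does not spell out the algorithm here, but the core idea — that the order of fixated objects mirrors the order of nouns in the description — can be sketched as a simple ranking problem. The sketch below is purely illustrative, not the authors' model: it scores each candidate sentence by how well its noun order aligns with the gaze order, using the longest common subsequence as the alignment measure.

```python
# Illustrative sketch only (not the published model): rank candidate
# descriptions by how closely their noun order matches the gaze order.

def lcs_length(a, b):
    """Length of the longest common subsequence of two label sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def predict_sentence(gaze_objects, candidates):
    """Pick the candidate whose noun sequence best aligns with the gaze order.

    gaze_objects: ordered labels of fixated objects, e.g. ["man", "reception"]
    candidates: dict mapping each sentence to the ordered list of its nouns
    """
    return max(candidates, key=lambda s: lcs_length(gaze_objects, candidates[s]))

# Toy example using the sentences quoted in the article:
gaze = ["man", "reception", "suitcase"]
candidates = {
    "the man is standing in the reception of a hotel": ["man", "reception", "hotel"],
    "the suitcase is on the floor": ["suitcase", "floor"],
}
print(predict_sentence(gaze, candidates))
# prints the hotel-reception sentence, whose nouns align best with the gaze order
```

With 576 candidate descriptions, a real system would also need to break ties and weight fixation durations, but the ordering signal alone already carries useful information.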

Changsong Liu of Michigan State University’s Language and Interaction Research lab, in East Lansing, who was not involved in the study, suggests these results could motivate novel designs for human-machine interfaces that take advantage of visual cues to improve speech recognition software.

Gaze information is already used to help with disambiguation. For example, if a speech recognition system can tell that you are looking at a tree, it is less likely to guess that you just said “three”. Full sentence prediction, perhaps combined with augmented reality headsets that track eye movement, is another possible application.
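The tree/“three” example amounts to re-ranking a speech recogniser's competing hypotheses with a visual cue. A minimal sketch of that idea, with invented scores and a hypothetical `boost` parameter (nothing here comes from the study):

```python
# Illustrative sketch: re-rank competing speech-recognition hypotheses
# using the object the listener is currently fixating.

def rerank(hypotheses, fixated_object, boost=0.2):
    """Return the best hypothesis after adding a fixed bonus (a hypothetical
    'boost' weight) to any hypothesis that mentions the fixated object.

    hypotheses: dict mapping hypothesis text to its acoustic score
    """
    def score(h):
        bonus = boost if fixated_object in h.split() else 0.0
        return hypotheses[h] + bonus
    return max(hypotheses, key=score)

# Acoustically, "three" narrowly wins; the gaze cue flips the decision.
hyps = {"I can see three": 0.55, "I can see a tree": 0.45}
print(rerank(hyps, "tree"))
# prints: I can see a tree
```

In practice the visual cue would be combined probabilistically with the acoustic and language-model scores rather than added as a flat bonus, but the sketch shows where gaze enters the pipeline.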

Coco and Keller are now looking into the role of coordinated visual and linguistic processes in conversations between two people. “People engaged in a dialogue use similar syntactic forms, expressions and eye movements,” says Coco. One hypothesis is that such “coordinative mimicry” might be important for joint decision-making.

Cognitive Science, DOI: 10.1111/j.1551-6709.2012.01246.x