Do you get squeamish when someone dies on Game of Thrones? Or maybe you’re worried your favorite character is about to get killed, and you can’t bear to watch. Researchers at MIT have developed an algorithm that can predict what’s going to happen next in a video, giving you an opportunity to look away first.

The algorithm, developed at MIT’s Computer Science and Artificial Intelligence Laboratory, is slowly trying to learn a skill that humans spend their entire lives refining and perfecting. Through countless interactions and experiences with others, we’re able to accurately predict what will happen when two people meet or depart—be it a handshake, a hug, or a kiss.

To give an algorithm the same intuition about human behavior that we all have, the researchers had it analyze countless hours of YouTube videos, as well as TV series like The Office and Desperate Housewives. When given a still frame from one of those videos, the algorithm searches for patterns and recognizable objects (hands, human faces, and so on) and attempts to predict their motion in order to figure out where they'll end up and what they might do there. For example, two faces getting closer frame by frame is a good indicator that two people are about to kiss.

How does it do? After analyzing some 600 hours of raw video (with no labels or descriptions), the CSAIL algorithm correctly predicted the action (a hug, handshake, high-five, or kiss) about 43 percent of the time when shown footage that cuts off one second before the action actually happens.

What does this mean for you? Nothing yet. The researchers hope that one day algorithms like this could help improve how robots interact with humans. But we're holding out for automatic video captioning that warns you when a sword is drawn on Game of Thrones, so you can look away fast if you don't want to see heads flying.

[MIT CSAIL]