When we hold a conversation, it's not just our ears that are paying attention. We may not realize it, but our eyes are picking up on visual information as well to give us a better idea of what we should be hearing. It's not strictly necessary, of course; we can easily carry on a conversation in the dark. But visual input is a form of redundancy that helps make up for any aural lapses.

Our brains integrate information from both senses to compile a complete picture of what we should be hearing. For this to work, the information coming from both our eyes and ears has to line up, otherwise we’re left with a skewed version of what’s really going on. Our eyes hold significant sway over what we hear — for proof, we need only observe the McGurk effect in action.

Ba or Ga?

Named after cognitive psychologist Harry McGurk, who discovered it accidentally in 1976, the effect appears when we watch a person mouth one sound while a different sound is played. The most common example is a video of a man mouthing "ga" while the sound "ba" is played. For some reason, we hear "da" instead. Watch the video below and note what you hear; then try it again with your eyes closed.

A key step in audio-visual understanding in the brain is causal inference: determining if what we see and hear comes from the same source. The McGurk effect likely occurs because our brains fail to recognize that two stimuli aren’t originating from the same source. Researchers from Baylor College of Medicine wanted to know why the brain fuses “ba” and “ga” into “da”, and why it doesn’t seem to work with other syllables. So they built two computer models designed to mimic the way our brains process audio and visual information.


They then ran both models (one that performed causal inference and one that didn't) through combinations of spoken and viewed syllables known to cause the McGurk effect, along with combinations that don't. Each model chose which syllable it "heard" from three options: the spoken syllable, the mouthed syllable, or a fused syllable in between. The researchers then compared the models' answers to those of actual humans.

Their study, published Thursday in PLOS Computational Biology, found that the version of the model without causal inference split the difference every time by choosing the amalgamation of audio and visual stimuli. With causal inference included, the model’s answers lined up well with those given by actual humans, indicating that a similar process likely takes place in our brains when we must choose between conflicting sources of information. The fact that certain incongruous syllable combinations aren’t fused together indicates that there’s some underlying mechanism deciding what types of audiovisual information should or shouldn’t be integrated.
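The mechanism described above can be sketched in code. The toy model below is not the Baylor team's actual model; it is a minimal illustration of the general Bayesian causal-inference idea, with made-up parameters (the syllable positions, noise levels, and priors are all illustrative assumptions). Syllables sit on a hypothetical one-dimensional perceptual axis, and the model decides how strongly to fuse the auditory and visual cues based on how likely they are to share a single source.

```python
import math

# Hypothetical 1-D "syllable space" (illustrative, not from the study):
SYLLABLES = {"ba": 0.0, "da": 0.5, "ga": 1.0}
SIGMA_A = 0.6     # assumed noise in the auditory cue
SIGMA_V = 0.6     # assumed noise in the visual cue
P_COMMON = 0.7    # assumed prior that both cues share one source
LIKE_SEPARATE = 0.2  # assumed (flat) likelihood under two separate sources

def gauss(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def perceive(audio, visual, causal_inference=True):
    a, v = SYLLABLES[audio], SYLLABLES[visual]
    # Precision-weighted fusion: the optimal estimate IF one source made both cues.
    wa = SIGMA_V ** 2 / (SIGMA_A ** 2 + SIGMA_V ** 2)
    fused = wa * a + (1 - wa) * v
    if not causal_inference:
        # Always fuse: this model "splits the difference" every time.
        estimate = fused
    else:
        # How plausible is a common source? The cue discrepancy (a - v) should
        # be small; its noise is the combined noise of the two channels.
        like_common = gauss(a - v, 0.0, math.sqrt(SIGMA_A ** 2 + SIGMA_V ** 2))
        p_common = (P_COMMON * like_common) / (
            P_COMMON * like_common + (1 - P_COMMON) * LIKE_SEPARATE
        )
        # Model averaging: fuse in proportion to the common-source posterior,
        # otherwise fall back on the auditory cue alone.
        estimate = p_common * fused + (1 - p_common) * a
    # Report the nearest syllable on the perceptual axis.
    return min(SYLLABLES, key=lambda s: abs(SYLLABLES[s] - estimate))

print(perceive("ba", "ga", causal_inference=False))  # always fuses -> "da"
print(perceive("ba", "ga"))                          # McGurk-style fusion -> "da"
print(perceive("ba", "ba"))                          # congruent cues -> "ba"
```

With these toy parameters, hearing "ba" while seeing "ga" yields "da", while congruent cues pass through unchanged; shrinking the noise terms or the common-source prior makes the causal-inference model reject fusion for large cue conflicts, which is the kind of gating behavior the study attributes to the brain.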

Researchers still don't completely understand how our brains link disparate events together, but knowing that causal inference is at play, and that it can now be modeled, helps to clear up the mystery.

For now, it’s another reminder that we can’t always trust what we see and hear.