It’s easy to take for granted the complex mental tasks human beings are constantly performing.

When we watch a baseball game, we can easily distinguish the pitcher from the mound he stands on, describe how he winds up before he flings the ball towards the plate, and even predict whether the next pitch will be a hanging curveball or a 100-mph fastball.

No machine can yet comprehend scenes like this, simple as they are for us.

IBM and the Massachusetts Institute of Technology in Cambridge hope to change that. The two organizations announced a partnership Tuesday to develop machines that see, hear, and interpret as humans do. The IBM-MIT Laboratory for Brain-inspired Multimedia Machine Comprehension, or BM3C for short, is a multi-year collaboration, IBM said in a news release.

The “brain-inspired” laboratory is just one of a number of partnerships IBM is pursuing to advance artificial intelligence (AI), as the scientific community makes headway in improving machines’ abilities to think as human beings do, and, in some cases, even outperform humans. In June, researchers announced they had developed a way for a machine to predict whether two humans will greet each other with a handshake or a hug. And last year, computer scientists created a machine that’s better at creating predictive algorithms than two-thirds of its human competitors. But teaching a machine to see and hear as humans do has so far been out of reach.

The problem is that human command of sights and sounds spans multiple cognitive disciplines, explains TechCrunch’s Devin Coldewey:

“Say your camera is good enough to track objects minutely – what good is it if you don’t know how to separate objects from their background? Say you can do that – what good is it if you can’t identify the objects? Then you need to establish relationships between them, intuit physical rules … all stuff our brains are especially good at.”

Because the human mind has already mastered this skill, researchers plan to model machine vision on virtual neural networks based on the real thing.

They expect that a machine-vision system could have big applications for industries such as healthcare, education, and entertainment.

Other MIT researchers have already made advances in computer prediction and comprehension. Researchers at the Computer Science and Artificial Intelligence Laboratory (CSAIL) taught a machine to predict how humans would greet each other. After being shown 600 hours of raw footage from YouTube videos and television shows, the system correctly guessed how people would greet each other 43 percent of the time, as Eva Botwin-Kowacki wrote for The Christian Science Monitor in June. In the same experiment, humans guessed right 71 percent of the time.

Two other researchers at MIT also created a Data Science Machine that can find patterns and select which data points are relevant, all without the problem-solving help of humans, wrote Kelsey Warner for the Monitor in October 2015.

“But the win-loss record was not the most impressive takeaway from the competitions,” writes Ms. Warner. “While teams of humans sweat their predictive algorithms for months leading up to competition, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.”

The scientific community hasn’t yet combined machine efficiency with human understanding. But IBM is building a network of university research collaborations with the aim of achieving this goal, according to the press release.