Artificial-intelligence programs could develop some much-needed common sense by competing in scavenger hunts inside virtual homes filled with simulated coffee tables, couches, lamps, and other everyday things.

Researchers at Facebook and Georgia Tech developed the scavenger-hunt challenge. The contest requires a virtual agent to look for something in a simulated home after parsing a natural-language question. An agent is placed in a room of a virtual home at random and asked something like “What color is the car?” or “Where is the coffee table?” Finding the answer requires the agent to understand the question and then explore the virtual space in search of the relevant object.

“The goal is to build intelligent systems that can see, talk, plan, and reason,” says Devi Parikh, a computer scientist at Georgia Tech and Facebook AI Research (FAIR), who developed the contest with her colleague and husband, Dhruv Batra.

Parikh, Batra, and their collaborators developed an agent that combines several different forms of machine learning to answer questions about a home. The agent also learns a rudimentary form of common sense by figuring out, through lots of trial and error, the best places to look for a particular object. For instance, over time, the agent learns that cars are usually found in the garage, and it understands that garages can usually be found by going out the front or back door.
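That trial-and-error prior can be illustrated with a deliberately simple sketch: an agent that just counts where each object class has been found in past episodes and heads for the most promising room first. This is a hypothetical toy, not the researchers' actual implementation, which uses deep neural networks.

```python
import random
from collections import defaultdict

# Hypothetical sketch: learn a location prior for objects by counting
# where each object was found across many simulated episodes.
class LocationPrior:
    def __init__(self):
        # counts[obj][room] = number of times obj was found in room
        self.counts = defaultdict(lambda: defaultdict(int))

    def record_find(self, obj, room):
        self.counts[obj][room] += 1

    def best_room(self, obj):
        rooms = self.counts[obj]
        if not rooms:
            return None  # no experience yet: the agent must explore
        return max(rooms, key=rooms.get)

random.seed(0)
prior = LocationPrior()
for _ in range(100):
    # simulated experience: cars turn up in the garage most of the time
    prior.record_find("car", "garage" if random.random() < 0.9 else "driveway")

print(prior.best_room("car"))  # prints "garage"
```

The same counting idea extends naturally to sequences of rooms, which is how an agent could also pick up that garages tend to lie beyond the front or back door.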

The approach relies on reinforcement learning, a form of machine learning inspired by animal behavior, as well as imitation learning, a technique that lets algorithms learn by observation. The virtual homes were created by researchers at FAIR and UC Berkeley. The research was highlighted during Facebook’s annual developer conference today.
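The reinforcement-learning half of such a system can be sketched with tabular Q-learning on a made-up three-room "house" where the goal is to reach the garage. Everything here (the room layout, rewards, and hyperparameters) is invented for illustration; the actual EQA agents use deep networks and far richer observations.

```python
import random
from collections import defaultdict

# Toy Q-learning sketch: rooms form a line, and the agent is rewarded
# for reaching the garage. Purely illustrative, not the EQA system.
ROOMS = ["bedroom", "hallway", "garage"]
ACTIONS = [-1, +1]  # move left or right along the room sequence

def step(state, action):
    nxt = min(max(state + action, 0), len(ROOMS) - 1)
    reward = 1.0 if ROOMS[nxt] == "garage" else -0.01  # small step cost
    done = ROOMS[nxt] == "garage"
    return nxt, reward, done

q = defaultdict(float)  # q[(state, action)] -> value estimate
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

random.seed(0)
for episode in range(200):
    state = 0  # always start in the bedroom
    for _ in range(10):
        if random.random() < eps:
            action = random.choice(ACTIONS)  # explore occasionally
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        nxt, reward, done = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # standard Q-learning update toward the bootstrapped target
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt
        if done:
            break

# After training, the greedy policy in the bedroom heads toward the garage.
print(max(ACTIONS, key=lambda a: q[(0, a)]))  # prints 1
```

Imitation learning would replace the random early exploration with updates toward actions demonstrated by an expert trajectory, which is one reason the two techniques combine well.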

[Video: An agent navigating a virtual home. Embodied QA]

A growing number of researchers are experimenting with virtual environments for training AI programs. The approach is seen as a way to broaden the intelligence of AI systems and overcome some of the technology's fundamental limitations. While there has been remarkable progress in AI lately, it has tended to involve computers doing a single task, like recognizing faces in images or playing a board game. What’s more, AI programs are generally trained on still images rather than in interactive 3-D settings.

As early AI research showed, it simply isn’t practical to hand-code common-sense knowledge into a system (see “AI’s language problem”). So the solution will most likely be for AI programs to learn such knowledge for themselves.

Microsoft has released an environment called Malmo, which is based on the game Minecraft. Researchers at the Allen Institute for AI (Ai2) in Seattle developed another 3-D virtual environment for training AI agents. This environment also reflects basic physics, and it allows agents to take simple actions. The Ai2 researchers have proposed a similar set of natural-language challenges for agents in their environment.

Roozbeh Mottaghi, the lead researcher behind the Ai2 project, says it is crucial for these virtual environments to become more realistic if we want AI agents to learn properly inside them. Currently, this isn’t really practical. “Designing a single realistic-looking room might take months, and it is costly,” he says. “And defining realistic physical properties for every object is very challenging.”