The multiverse of science fiction is populated by robots that are indistinguishable from humans. They are usually smarter, faster, and stronger than us. They seem capable of doing any job imaginable, from piloting a starship and battling alien invaders to taking out the trash and cooking a gourmet meal.

The reality, of course, is far from fantasy. Aside from industrial settings, robots have yet to meet The Jetsons. The robots the public are exposed to seem little more than over-sized plastic toys, pre-programmed to perform a set of tasks without the ability to interact meaningfully with their environment or their creators.

To paraphrase PayPal co-founder and tech entrepreneur Peter Thiel, we wanted cool robots, instead we got 140 characters and Flippy the burger bot. But scientists are making progress to empower robots with the ability to see and respond to their surroundings just like humans.

Some of the latest developments in that arena were presented this month at the annual Robotics: Science and Systems Conference in Cambridge, Massachusetts. The papers drilled down into topics that ranged from how to make robots more conversational and help them understand language ambiguities to helping them see and navigate through complex spaces.

Improved Vision

Ben Burchfiel, a graduate student at Duke University, and his thesis advisor George Konidaris, an assistant professor of computer science at Brown University, developed an algorithm to enable machines to see the world more like humans.

In the paper, Burchfiel and Konidaris demonstrate how they can teach robots to identify and possibly manipulate three-dimensional objects even when they might be obscured or sitting in unfamiliar positions, such as a teapot that has been tipped over.

The researchers trained their algorithm by feeding it 3D scans of about 4,000 common household items such as beds, chairs, tables, and even toilets. They then tested its ability to identify about 900 new 3D objects just from a bird’s eye view. The algorithm made the right guess 75 percent of the time versus a success rate of about 50 percent for other computer vision techniques.

In an email interview with Singularity Hub, Burchfiel notes his research is not the first to train machines on 3D object classification. How their approach differs is that they confine the space in which the robot learns to classify the objects.

“Imagine the space of all possible objects,” Burchfiel explains. “That is to say, imagine you had tiny Legos, and I told you [that] you could stick them together any way you wanted, just build me an object. You have a huge number of objects you could make!”

The infinite possibilities could result in an object no human or machine might recognize.

To address that problem, the researchers had their algorithm find a more restricted space that would host the objects it wants to classify. “By working in this restricted space—mathematically we call it a subspace—we greatly simplify our task of classification. It is the finding of this space that sets us apart from previous approaches.”

Following Directions

Meanwhile, a pair of undergraduate students at Brown University figured out a way to teach robots to understand directions better, even at varying degrees of abstraction.

The research, led by Dilip Arumugam and Siddharth Karamcheti, addressed how to train a robot to understand nuances of natural language and then follow instructions correctly and efficiently.

“The problem is that commands can have different levels of abstraction, and that can cause a robot to plan its actions inefficiently or fail to complete the task at all,” says Arumugam in a press release.

In this project, the young researchers crowdsourced instructions for moving a virtual robot through an online domain. The space consisted of several rooms and a chair, which the robot was told to manipulate from one place to another. The volunteers gave various commands to the robot, ranging from general (“take the chair to the blue room”) to step-by-step instructions.

The researchers then used the database of spoken instructions to teach their system to understand the kinds of words used in different levels of language. The machine learned to not only follow instructions but to recognize the level of abstraction. That was key to kickstart its problem-solving abilities to tackle the job in the most appropriate way.

The research eventually moved from virtual pixels to a real place, using a Roomba-like robot that was able to respond to instructions within one second 90 percent of the time. Conversely, when unable to identify the specificity of the task, it took the robot 20 or more seconds to plan a task about 50 percent of the time.

One application of this new machine-learning technique referenced in the paper is a robot worker in a warehouse setting, but there are many fields that could benefit from a more versatile machine capable of moving seamlessly between small-scale operations and generalized tasks.

“Other areas that could possibly benefit from such a system include things from autonomous vehicles… to assistive robotics, all the way to medical robotics,” says Karamcheti, responding to a question by email from Singularity Hub.

More to Come

These achievements are yet another step toward creating robots that see, listen, and act more like humans. But don’t expect Disney to build a real-life Westworld next to Toon Town anytime soon.

“I think we’re a long way off from human-level communication,” Karamcheti says. “There are so many problems preventing our learning models from getting to that point, from seemingly simple questions like how to deal with words never seen before, to harder, more complicated questions like how to resolve the ambiguities inherent in language, including idiomatic or metaphorical speech.”

Even relatively verbose chatbots can run out of things to say, Karamcheti notes, as the conversation becomes more complex.

The same goes for human vision, according to Burchfiel.

While deep learning techniques have dramatically improved pattern matching—Google can find just about any picture of a cat—there’s more to human eyesight than, well, meets the eye.

“There are two big areas where I think perception has a long way to go: inductive bias and formal reasoning,” Burchfiel says.

The former is essentially all of the contextual knowledge people use to help them reason, he explains. Burchfiel uses the example of a puddle in the street. People are conditioned or biased to assume it’s a puddle of water rather than a patch of glass, for instance.

“This sort of bias is why we see faces in clouds; we have strong inductive bias helping us identify faces,” he says. “While it sounds simple at first, it powers much of what we do. Humans have a very intuitive understanding of what they expect to see, [and] it makes perception much easier.”

Formal reasoning is equally important. A machine can use deep learning, in Burchfiel’s example, to figure out the direction any river flows once it understands that water runs downhill. But it’s not yet capable of applying the sort of human reasoning that would allow us to transfer that knowledge to an alien setting, such as figuring out how water moves through a plumbing system on Mars.

“Much work was done in decades past on this sort of formal reasoning… but we have yet to figure out how to merge it with standard machine-learning methods to create a seamless system that is useful in the actual physical world.”

Robots still have a lot to learn about being human, which should make us feel good that we’re still by far the most complex machines on the planet.

Image Credit: Alex Knight via Unsplash