By Gary Marcus and Ernest Davis

Artificial intelligence has seen radical advances of many kinds over the last few years, roundly beating human champions in games like Go and poker that once seemed out of reach. Advances in other domains like speech recognition, machine translation, and photo tagging have become routine. Yet something foundational is still missing: ordinary common sense.

Common sense is knowledge that is commonly held, the sort of basic knowledge that we expect ordinary people to possess, like “People don’t like losing their money,” “You can keep money in your wallet,” “You can keep your wallet in your pocket,” “Knives cut things,” and “Objects don’t disappear when you cover them with a blanket.” Without it, the everyday world is hard to understand; lacking it, machines can’t understand novels, news articles, or movies.

The great irony of common sense—and indeed AI itself—is that it is stuff that pretty much everybody knows, yet nobody seems to know what exactly it is or how to build machines that possess it.

§

People have worried about the problem since the beginning of AI. John McCarthy, the very person who coined the name “artificial intelligence,” first started calling attention to it in 1959. But there has been remarkably little progress. Neither classical AI nor deep learning has made much headway. Deep learning, which lacks a direct way of incorporating abstract knowledge (like “People want to recover things they’ve lost”), has largely ignored the problem; classical AI has tried harder, pursuing a number of approaches, but none has been particularly successful.

One approach has been to try to learn everyday knowledge by crawling (or “scraping”) the web. One of the most extensive efforts, launched in 2011, is called NELL (short for Never-Ending Language Learner), led by Tom Mitchell, a professor at Carnegie Mellon and one of the pioneers in machine learning. Day after day—the project is still ongoing—NELL finds documents on the web and reads them, looking for particular linguistic patterns and making guesses about what they might mean. If it sees a phrase like “cities such as New York, Paris, and Berlin,” NELL infers that New York, Paris, and Berlin are all cities, and adds that to its database. If it sees the phrase “New York Jets quarterback Kellen Clemens,” it might infer the facts that Kellen Clemens plays for the New York Jets (in the present tense—NELL has pretty much no sense of time) and that Kellen Clemens is a quarterback.
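To make the idea concrete, here is a minimal sketch of this kind of pattern-based reading. It is not NELL's actual code; the regular expression, the `extract_is_a` function, and the assumption that instance names are capitalized words are all ours, chosen only to illustrate how a phrase like “cities such as ...” can be turned into guesses.

```python
import re

# Hypothetical pattern: "<category> such as <A>, <B>, and <C>".
# Assumes instance names are capitalized words; an illustration, not NELL's code.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
SUCH_AS = re.compile(
    rf"(?P<category>[a-z]+) such as (?P<items>{NAME}(?:, (?:and )?{NAME})*)"
)

def extract_is_a(text):
    """Return (instance, category) guesses found by the 'such as' pattern."""
    facts = []
    for match in SUCH_AS.finditer(text):
        category = match.group("category")
        # Split the captured list on ", " or ", and ".
        for item in re.split(r", (?:and )?", match.group("items")):
            facts.append((item, category))
    return facts

facts = extract_is_a("We visited cities such as New York, Paris, and Berlin.")
for instance, category in facts:
    print(f"{instance} is one of: {category}")
```

Note that a pattern like this has no way of checking whether the sentence it read was true, current, or even meaningful; it records whatever fits the template.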

As reasonable as the basic idea is, the results have been less than stellar. As an example, here are ten facts that NELL had recently learned (you can test this out for yourself):

dorset_hotel is a place to ski

business_or is an academic field

peter_measroch_telemea_cheese is a cheese

gathering_engineering is an area of study within the field of machine learning

n11_26_36 is a term used by physicists

bey and jay_z are family members

starbucks coffee company serves coffee

jack_cassidy has spouse jones

n7_34_am is the time of the event n2008_lake_kivu_earthquake

boaz is the parent of obed

Some are true, some are false, some are meaningless; few are particularly useful. They aren’t going to help robots manage in a kitchen, and although they might be of modest help in machine reading, they are too disjointed and spotty to solve the challenges of common sense.

Another approach to collecting commonsense knowledge, particularly trendy nowadays, is to use “crowdsourcing,” which basically means asking ordinary humans for help. Perhaps the most notable project is ConceptNet, which has been ongoing at the MIT Media Lab since 1999. The project maintains a website where volunteers can enter simple commonsense facts in English. For instance, a participant might be asked to provide facts that would be relevant in understanding the story, “Bob had a cold. Bob went to the doctor,” and might answer with facts such as “People with colds sneeze” and “You can help a sick person with medicine.” (The English sentences are then automatically converted to machine encodings through a process of pattern matching.)

Here, too, the idea seems reasonable on its face, but the results have been disappointing. One problem is that if you simply ask untrained lay people to enumerate facts, they tend to list easily found factoids like “A platypus is a mammal that lays eggs” or “Taps is a bugle call played at dusk”—rather than what computers really need: information that is obvious to humans but hard to find on the web, such as “After something is dead, it will never be alive again” or “A container with an opening on top and nowhere else will hold liquid.”

A second problem is that, even when lay people can be induced to give the right kind of information, it’s tough to get them to formulate it in the kind of finicky, hyper-precise way that computers require. Here, for example, is some of what ConceptNet has learned from lay people about restaurants.

To the untrained eye it seems perfectly fine. Each individual link (for example, the arrow in the top left that tells us that an oven is used for cooking) seems plausible in itself. A person can be at a location of a restaurant, and almost every person we have ever met desires survival; nobody would question the fact that we need to eat to survive.

But dive into the details and it’s a mess.

Take, for example, the link that says that “person” is at location “restaurant.” As Ernie’s mentor Drew McDermott pointed out long ago, in a rightly famous article called “Artificial Intelligence Meets Natural Stupidity,” the meaning of this sort of link is actually unclear.[v] At any given moment, somebody in the world is at a restaurant, but many people are not. Does the link mean that if you are looking for a particular person (your mother, say) you can always find her at a restaurant? Or that at some particular restaurant (Katz’s Delicatessen, say), you will always be able to find a person, 24/7? Or that any person you might care to find can always be found in a restaurant, the way that whales can always be found in the ocean? Another link tells us that a “cake UsedFor satisfy hunger.” Maybe so, but beware the link that says “cook UsedFor satisfy hunger” in conjunction with “cook IsA person,” which suggests that a cook might not just make a meal but become one. We’re not saying that crowdsourcing couldn’t ever be useful, but efforts to date have often yielded information that is confusing, incomplete, or even downright wrong.
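The trouble with the “cook” links can be sketched in a few lines. The triples below are the ones quoted above; the inference rule is a deliberately naive one of our own invention (anything that is “UsedFor satisfy hunger” is something you eat), not a rule taken from ConceptNet itself.

```python
# Links as quoted in the text, stored as (subject, relation, object) triples.
TRIPLES = {
    ("cake", "UsedFor", "satisfy hunger"),
    ("cook", "UsedFor", "satisfy hunger"),
    ("cook", "IsA", "person"),
    ("oven", "UsedFor", "cooking"),
}

def naive_edible(triples):
    """Hypothetical naive rule: whatever is 'UsedFor satisfy hunger' gets eaten."""
    return {
        subj for subj, rel, obj in triples
        if rel == "UsedFor" and obj == "satisfy hunger"
    }

def edible_people(triples):
    """Combine the naive rule with the IsA links to find 'people you can eat'."""
    people = {subj for subj, rel, obj in triples if rel == "IsA" and obj == "person"}
    return naive_edible(triples) & people

result = edible_people(TRIPLES)
print(result)
```

Under that rule the cake and the cook come out the same way, because nothing in the links distinguishes satisfying hunger by being eaten from satisfying hunger by preparing food.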

A more recent project, also based at MIT, though run by a different team, is called VirtualHome. This project too has used crowdsourcing to collect information about procedures for simple activities like putting the groceries in the fridge and setting the table. They collected a total of 2,800 procedures for 500 tasks, involving 300 objects and 2,700 types of interactions. The basic actions were hooked into a game engine, so you can (sometimes) see an animation of the procedure in action. Once again, the results leave something to be desired. Consider for instance the crowdsourced procedure for “Exercise”:

walk to LIVING ROOM

find REMOTE CONTROL

grab REMOTE CONTROL

find TELEVISION

switch on TELEVISION

put back REMOTE CONTROL

find FLOOR

lie in on FLOOR

look at TELEVISION

find ARMS_BOTH

stretch ARMS_BOTH

find LEGS_BOTH

stretch LEGS_BOTH

stand up

jump

All of that may happen in some people’s exercise routine, but not in others. Some people may go to the gym, or run outside; some may jump, others may lift weights. Some steps might be skipped, others are missing; either way, a pair of stretches isn’t much of a workout. Meanwhile, finding the remote control is not really an essential part of exercising; and who on earth needs to “find” their arms or legs, or the floor? Something’s clearly gone wrong.
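One way to see the oddity is to treat the procedure as data. The sketch below parses steps written in the format shown above and flags “find” steps aimed at things nobody needs to look for. The parsing convention (lowercase words are the action, uppercase words are the object) and the `ALWAYS_AVAILABLE` list are our own assumptions for illustration, not part of VirtualHome.

```python
# A few steps from the crowdsourced "Exercise" procedure quoted above.
EXERCISE = [
    "walk to LIVING ROOM",
    "find REMOTE CONTROL",
    "find FLOOR",
    "lie in on FLOOR",
    "find ARMS_BOTH",
    "stretch ARMS_BOTH",
    "find LEGS_BOTH",
    "stretch LEGS_BOTH",
    "stand up",
    "jump",
]

# Hypothetical, hand-written list of things a person never has to search for.
ALWAYS_AVAILABLE = {"FLOOR", "ARMS_BOTH", "LEGS_BOTH"}

def parse_step(line):
    """Split a step into (action, object); the object is None if absent."""
    words = line.split()
    action = " ".join(w for w in words if w.islower())
    obj = " ".join(w for w in words if w.isupper()) or None
    return action, obj

def odd_steps(steps):
    """Return 'find' steps whose target is always at hand."""
    return [(a, o) for a, o in steps if a == "find" and o in ALWAYS_AVAILABLE]

steps = [parse_step(line) for line in EXERCISE]
odd = odd_steps(steps)
print(odd)
```

Of course, the check only works because a human wrote down in advance what is always at hand, which is exactly the kind of knowledge the machine was supposed to have.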

§

Solving this problem is, we would argue, the single most important step towards taking AI to the next level. Common sense is a critical component of building AIs that can understand what they read; that can control robots that can operate usefully and safely in the human environment; and that can interact with human users in reasonable ways. Common sense is not just the hardest problem for AI; in the long run, it’s also the most important problem.

It’s also, we suspect, an interdisciplinary problem. Of late, artificial intelligence has focused largely on the mathematics of extracting complex statistics from large amounts of data, but so far that approach has yielded little insight into the challenges of common sense. Instead, achieving common sense is likely to require complementing current approaches with insights from other fields, such as philosophy and cognitive psychology, drawing, for example, on Immanuel Kant’s analysis of the centrality of time, space, and causality in perceiving the world, and on Renée Baillargeon and Elizabeth Spelke’s studies on how common sense develops in human infants.

And we will almost certainly need new “hybrid” AI architectures that combine the strengths of modern machine learning, at home with vast amounts of data, with the strengths of often-derided “good old fashioned AI,” an approach that focused on the representation of knowledge in machines. Because common sense is, after all, nothing more than everyday knowledge represented in humans. If we can get machines to represent what people can represent, every day, as part of daily life, we will be golden.

Gary Marcus and Ernest Davis are the co-authors of "Rebooting AI: Building Artificial Intelligence We Can Trust," from which this article has been adapted.



