In his excellent book Thinking, Fast and Slow, Daniel Kahneman describes humans as having two thought systems, System 1 and System 2. Conscious thought is performed by System 2, but it is only a fraction of our cognition. Most of our thinking is done below conscious awareness by System 1. This subconscious thought encompasses our fast gut reactions and our ability to recognize patterns. The combination of System 1 and System 2 endows us with an intelligence that is broad and flexible, and researchers in artificial intelligence have been trying for decades to build computers that match this capability. Current computers do a decent job at logical reasoning as embodied by System 2, but they have been hampered by the lack of a robust System 1.

Consider how toddlers can recognize abstractly drawn animals in children’s books. A cow in one of these books doesn’t look like a real cow at all, yet children can pick up on one or two “cow features” and identify the animal as such. Our computers can make logical deductions, and can even play chess better than any human, but their lack of a powerful System 1 prevents them from achieving the supple intelligence necessary to reliably recognize cows or to follow conversations that even a child could understand.

This is beginning to change. A more powerful System 1 for our computers is emerging from a machine learning method called deep learning. Deep learning systems are artificial neural networks with many layers. Neural networks are rough mathematical approximations of how the human brain processes information. They have been around since the 1940s, but it is only in the last eight years that Geoffrey Hinton and others have figured out how to train networks with many layers. The large number of layers is what allows the networks to represent the complicated features necessary for System 1 thinking. These layers are analogous to the human visual system, where objects are represented as compositions of individual features. (See figure below from Lee et al. [2009].)
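
To give a feel for what "many layers" means mechanically, here is a minimal sketch in Python with NumPy. Each layer transforms the output of the layer below, so deeper layers can express features of features. The layer sizes and random weights are arbitrary placeholders for illustration, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [64, 32, 16, 8]          # an input layer plus three deeper layers
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    for W in weights:
        # Each pass through this loop is one layer: a linear map followed
        # by a nonlinearity, computing features of the previous features.
        x = np.maximum(0.0, x @ W)
    return x

features = forward(rng.normal(size=64))
print(features.shape)                  # (8,) -- the deepest representation
```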

Deep learning allows computers to generalize and to see the world at the right level of detail

Flexible intelligence requires the ability to learn new representations. We humans update how we see the world all the time. Once we notice a new kind of tree or car, our representation of the world has changed, and the new object seems to appear everywhere. Deep learning brings us a step closer to this capability. Computers have traditionally represented objects in the world using symbols, and two symbols can only be equal or not equal. Deep learning instead represents the environment using vectors. Vectors are sequences of numbers, such as [14.2, 17.1, 2.4]. Unlike symbols, vectors are not just equal or unequal: we can measure the distance between one vector and another, and this measure of distance provides a powerful mechanism for generalization. When a deep learning algorithm reads natural language text, such as a newspaper article, instead of representing the word “worried” as one symbol and the word “anxious” as another symbol, it represents each by a vector. Computing the distance between vectors allows the computer to know that the meaning of the sentence “I feel worried” is closer to the meaning of “I feel anxious” than to the meaning of “I feel sleepy.”
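
Here is a minimal sketch of that comparison in Python. The three-dimensional vectors are made up for illustration (learned word embeddings typically have hundreds of dimensions); the cosine similarity computed here is a standard way of measuring how closely two vectors point in the same direction.

```python
import numpy as np

# Made-up word vectors; real embeddings are learned and much longer.
word_vec = {
    "worried": np.array([1.2, 0.3, 4.0]),
    "anxious": np.array([1.0, 0.5, 3.8]),
    "sleepy":  np.array([4.1, 2.0, 0.2]),
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(word_vec["worried"], word_vec["anxious"]))  # high
print(cosine_similarity(word_vec["worried"], word_vec["sleepy"]))   # much lower
```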

Deep learning algorithms acquire their vector representations through experience with the world. A computer can be given a task, such as predicting the next word in a document, and it can continually update its representation until it finds one that allows it to do the task reliably. Learning through experience allows computers to usefully process the world without knowing everything, analogously to how children are able to climb trees without knowing their molecular structure.
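
Here is a toy sketch of that idea: a model that learns word vectors by repeatedly predicting the next word in a tiny corpus and nudging its representation whenever it predicts wrong. The corpus, dimensions, and learning rate are all invented for illustration; real systems train on vastly more text.

```python
import numpy as np

corpus = "i feel worried . i feel anxious . i feel sleepy .".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                     # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V, D))   # word vectors: the learned representation
W = rng.normal(scale=0.1, size=(D, V))   # output weights used for prediction

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

lr = 0.1
for _ in range(200):                     # many passes over the tiny corpus
    for current, nxt in zip(corpus, corpus[1:]):
        x, y = index[current], index[nxt]
        p = softmax(E[x] @ W)            # predicted distribution over the next word
        grad = p.copy()
        grad[y] -= 1.0                   # gradient of cross-entropy loss at the logits
        dE = W @ grad
        W -= lr * np.outer(E[x], grad)   # update the prediction weights...
        E[x] -= lr * dE                  # ...and the current word's own vector

# "worried" and "anxious" occur in the same contexts, so their vectors
# should end up more similar to each other than either is to "sleepy".
```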

Deep learning still isn’t deep enough

Generalizable representations learned through experience are the beginnings of an improved System 1. Previous systems represented the world using symbols that lacked intrinsic meaning. These symbols allowed computers to loop and execute “if-then” statements, but if the programmers neglected to code for a particular case, the computer was stuck. By moving from symbols to vectors grounded in experience with the world, computers can generalize to situations they haven’t encountered before and keep those generalizations at a useful level of detail. Effective generalization is crucial because the world is too vast and complicated to be written down in code.
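
The difference shows up in a toy contrast: a symbolic rule table fails on any word the programmer did not anticipate, while vectors let the computer fall back on the nearest case it does know. The rules and two-dimensional "embeddings" below are invented purely for illustration.

```python
import math

rules = {"worried": "offer reassurance", "sleepy": "suggest rest"}

def symbolic_response(word):
    # Symbols are only equal or not equal: an uncoded word leaves us stuck.
    return rules.get(word, "no rule -- stuck")

vectors = {"worried": (0.9, 0.1), "anxious": (0.8, 0.2), "sleepy": (0.1, 0.9)}

def vector_response(word):
    # Vectors generalize: reuse the rule for the nearest word we do know.
    nearest = min(rules, key=lambda w: math.dist(vectors[w], vectors[word]))
    return rules[nearest]

print(symbolic_response("anxious"))  # "no rule -- stuck"
print(vector_response("anxious"))    # "offer reassurance"
```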

While deep learning can now represent objects in images and words in sentences, there are even more fundamental conceptual structures that are needed for a complete System 1. Consider what it means for an object to contain another object, such as for a box to contain a ball. The container constrains the movement of the object inside, and if the container is moved, the contained object moves as well. Fundamental representations such as container are called image schemas, and they map experience to conceptual structure. Developmental psychologist Jean Mandler argues that some image schemas are formed before children begin to talk, and that language is eventually built onto this base set of schemas. We saw the example of containment, which consists of a boundary plus an inside and an outside. Some other image schemas from Mark Johnson’s book, The Body in the Mind, include path, counterforce, restraint, removal, enablement, attraction, link, cycle, near-far, scale, part-whole, full-empty, matching, surface, object, and collection.
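
As a thought experiment, here is how the containment schema might be sketched as a data structure. The class names and rules are invented for illustration; they are not drawn from any existing system, and a real implementation would need to be learned from experience rather than hand-coded.

```python
from dataclasses import dataclass, field

@dataclass
class Thing:
    name: str
    position: tuple          # (x, y) location in some shared space

@dataclass
class Container(Thing):
    contents: list = field(default_factory=list)

    def put_inside(self, thing: Thing):
        thing.position = self.position   # contained things share the container's location
        self.contents.append(thing)

    def move_to(self, position: tuple):
        # The container constrains the movement of what is inside it:
        # moving the container moves the contained objects as well.
        self.position = position
        for thing in self.contents:
            thing.position = position

box = Container(name="box", position=(0, 0))
ball = Thing(name="ball", position=(5, 5))
box.put_inside(ball)
box.move_to((3, 4))
print(ball.position)         # (3, 4): the ball moved because the box did
```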

Image schemas are needed for understanding concepts that are so fundamental that no one bothers to write them down. As Marvin Minsky asks, how could a computer learn that you can pull a box with a string but not push it? Image schemas even allow us to comprehend abstract ideas through metaphors, according to George Lakoff and Mark Johnson. For example, a container can be more than a way of understanding physical constraints; it can be a metaphor used to understand the abstract concept of an argument. You can say that someone’s argument doesn’t hold water, or you can say that it is empty, or you can say that the argument has holes in it. These metaphors, grounded in direct physical experience, allow us, and maybe someday our computers, to understand our most profound concepts.

Conclusion

Artificial intelligence research has been working backward from the high-level logical thinking characteristic of System 2 to the low-level, fundamental thinking performed by System 1. When researchers began over half a century ago, they tried to get computers to mimic some of our highest cognitive functions, such as logical reasoning. They considered conscious reasoning to be the pinnacle of intelligence, but they didn’t realize that the hardest things for a computer to do are the things a toddler can do effortlessly.

Deep learning advances the state of the art in pattern recognition and natural language processing, but what is most significant is that it acquires generalizable representations grounded in experience. The next frontier will be building machines that can represent the deepest conceptual structures of our minds, such as what a container is, and can use that ability to understand abstract concepts through metaphor. If we can make it all the way down, so that computers have a grounded understanding of the most fundamental concepts, we will have built an intelligence that is as flexible as our own.

While computer intelligence may someday be as capable as ours, it will probably be alien. A consequence of learning through experience is that the representations learned by computers will have no meaning to us if their experiences are different from our own. We humans have shared experience, which allows us all to see the world in roughly the same way. If someone mentions that a ball hit a fence, we know what that person means because we share an understanding of what a ball is, what a fence is, and what hitting means. The shape and size of our bodies also influence our experience; because our bodies are roughly alike, we perceive the world in much the same way. Ants probably don’t recognize fences as being different from any other kind of vertical surface, and to the degree that computers have different experiences and different bodies from us, they will see the world in an alien way. Alien or not, what deep learning really means is that we are close to living with artificial intelligence.

Image credit: Lee, Honglak, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations." In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 609-616. ACM, 2009.