A Thought Experiment

Imagine that you wake up in a strange room. It’s not the nice bedroom you went to sleep in, but a dimly lit cell with a damp, cold floor. The walls are cracking plaster, and the only intended way in or out seems to be an imposing steel door, padlocked from the inside. High on one wall, a barred window lets in the only light. If, after looking around, you conclude that you are trapped, that wouldn’t be unreasonable. Things do look dire.

Would that satisfy you, though? Probably not. You would want to explore the room a bit more: give that padlock a tug to see how secure it really is, or test the strength of those cracking plaster walls. Perhaps a few well-targeted blows to those worn walls would open a hole you could slip through? And maybe, just maybe, the bars on the window are set far enough apart that you could wiggle between them to freedom? Interacting with the environment gives you far more information than passively observing it. Seeing might be believing, but to actually justify those beliefs, you need to interact with your surroundings.

Concept of a Concept

Containment is a concept. Dog is a concept too, and so are running, forests, beauty, green, and death. Concepts are abstractions that we derive from everyday interactions with the world. They form the reusable building blocks of knowledge that humans rely on to make sense of the world.

When we have a conceptual understanding of something, we have, in a sense, a mastery of that thing. In the case of containment, this mastery means that we can identify containers in the world, tell them apart from non-containers, put things into them, take things out of them, and anticipate what will happen when we interact with them. We can even look at novel objects and see in them the potential to contain or be contained.

Common approaches to conceptual understanding in AI, including deep learning systems trained on datasets like ImageNet [1], appear to capture some of these abilities, but they lack the mastery that comes from interaction. Given an image or even a video, such approaches might be able to tell whether there is a specific kind of container in it—say, a cup, or a house, or a bottle—and locate where in the image the container is. But they would likely fail in spectacular ways when encountering a previously unseen type of container. Asking such a system to contain itself would be met only with confusion, since it associates the container concept with a collection of visual features but lacks an active understanding of containment.

Concepts from Sensorimotor Contingencies

Henri Poincaré was among the first to emphasize the role of sensorimotor representations in human understanding. In Science and Hypothesis [2] he argued that a motionless being would never acquire the concept of 3D space. Recently, several cognitive scientists have proposed that conceptual representations arise from the integration of perception and action. To pick just one work, O’Regan and Noë [3] define sensorimotor contingencies as “the structure of the rules governing the sensory changes produced by various motor actions,” viewing vision as a “mode of exploration of the world that is mediated by knowledge of what we call sensorimotor contingencies.” Noë [4] goes on to elaborate that “[concepts] are themselves techniques or means for handling what there is.”

While the importance of sensorimotor contingencies has been appreciated in the cognitive science community, those ideas have resulted in only a few concrete computational models that explore their role in concept formation. In a paper we recently presented at AAAI-18, we introduced a computational model that learns concepts by interacting with the environment.

What We Did

We set out to represent and learn two essential abilities that make up conceptual understanding: the ability to actively detect the presence of a concept, and the ability to actively bring about a concept. Further, we wanted to investigate when interactive abilities are preferable to passive approaches, and to understand how reusing abilities learned for simple concepts might help with learning more complex ones.

We started by developing a training ground for learning active concepts, an environment we call PixelWorld (available on GitHub). In PixelWorld, things are a bit simpler than in the real world. It is a discrete 2D grid environment inhabited by a pixel agent and one or more objects of different kinds, all composed of a few pixels (e.g., lines, blobs, and containers).
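To make this concrete, here is a minimal Python sketch of a PixelWorld-style grid containing a single container. The layout and helper names here are purely illustrative, not the released code, which is far more general.

```python
import numpy as np

# Illustrative sketch of a PixelWorld-style grid (the released
# environment is more general). Pixel values: 0 = empty,
# 1 = object pixel, 2 = the agent.
EMPTY, WALL, AGENT = 0, 1, 2

def make_world(height=7, width=11):
    """Build a grid containing one container: a floor with two side walls."""
    grid = np.zeros((height, width), dtype=int)
    grid[-1, :] = WALL           # the floor
    grid[-3:, 2] = WALL          # left wall of the container
    grid[-3:, 6] = WALL          # right wall of the container
    agent_pos = (height - 2, 4)  # the agent starts between the walls
    grid[agent_pos] = AGENT
    return grid, agent_pos
```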

The agent has a simple embodiment: it perceives only a 3×3 window around itself, and can choose to move up, down, left, right, or stop and signal a bit of information. This embodiment requires it to learn even the most basic representations, such as the notion of an object, as interactive concepts. While this might seem like unnecessary sensory deprivation, eliminating a sophisticated visual perception system allows us to highlight the role of composing heterogeneous behaviors into meaningful concept representations.
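The embodiment can be sketched in the same spirit: a 3×3 egocentric window plus five actions. Again, the function names below are illustrative rather than the released API.

```python
import numpy as np

# Illustrative embodiment: a 3x3 egocentric view and five actions
# (the exact interface of the released code may differ).
ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "SIGNAL")
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}

def observe(grid, pos):
    """Return the 3x3 window of pixels centered on the agent,
    padding cells outside the grid with -1."""
    h, w = grid.shape
    window = -np.ones((3, 3), dtype=int)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            i, j = pos[0] + di, pos[1] + dj
            if 0 <= i < h and 0 <= j < w:
                window[di + 1, dj + 1] = grid[i, j]
    return window

def step(grid, pos, action):
    """Move the agent one cell unless blocked by an object or the grid
    edge. SIGNAL ends the episode; the signalled bit is handled by the
    task, not by the environment dynamics."""
    if action == "SIGNAL":
        return pos
    di, dj = MOVES[action]
    i, j = pos[0] + di, pos[1] + dj
    if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1] and grid[i, j] == 0:
        grid[pos] = 0
        grid[i, j] = 2
        pos = (i, j)
    return pos
```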

We used reinforcement learning to train agents on two kinds of tasks. The first task was to explore the environment and signal whether the concept was present, e.g., whether the agent was contained; the agent was rewarded if it got the answer right. The second task was to bring about the concept, e.g., making itself contained; the agent was rewarded if it brought about the concept and correctly signaled that it had.
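In reward terms, the two tasks might look like the following sketch (the exact reward values and episode mechanics used in the paper may differ):

```python
def detection_reward(concept_holds, signalled_bit):
    """Detection task: the agent explores, then signals a bit; it is
    rewarded only for signalling the right answer. (Illustrative
    reward scheme; the paper's exact values may differ.)"""
    return 1.0 if signalled_bit == concept_holds else -1.0

def bring_about_reward(concept_holds, signalled_bit):
    """Bring-about task: the agent is rewarded only if the concept
    actually holds at the moment it signals that it does."""
    return 1.0 if (concept_holds and signalled_bit) else -1.0
```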

For example, we trained an agent to detect whether it was (horizontally) contained. The animation below illustrates its behavior: it checks whether there is a wall on the right, then checks whether there is a wall on the left. Since both tests succeed, it signals that it is contained.
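A scripted version of that detection behavior, built on the illustrative observe/step helpers sketched earlier, might look like this:

```python
def detect_horizontal_containment(grid, pos, max_steps=50):
    """Scripted version of the learned behavior, using the illustrative
    observe/step helpers above: walk right until a wall (or the grid
    edge), then walk left, and report 'contained' only if a wall was
    found on both sides."""
    walled = {}
    for direction in ("RIGHT", "LEFT"):
        for _ in range(max_steps):
            window = observe(grid, pos)
            col = 2 if direction == "RIGHT" else 0
            if window[1, col] == 1:    # object pixel right beside the agent
                walled[direction] = True
                break
            if window[1, col] == -1:   # walked off the edge: no wall here
                walled[direction] = False
                break
            pos = step(grid, pos, direction)
        else:
            walled[direction] = False  # gave up before finding anything
    return walled["RIGHT"] and walled["LEFT"]
```

On the container world sketched above, this routine walks until it bumps into the right wall, retraces its steps to the left wall, and returns True; remove either wall and the agent walks off the edge of the world and returns False.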