In computer science and control theory, we can create models that predict the next state given the current state and an action. These models, often neural networks, learn from experience to predict the immediate effect of actions. You know what will happen when you throw a ball in the air with a certain strength because you have experienced it before. In the same way, a neural network can learn to predict what will happen next to, say, a robot arm, by learning from previous experiences of observations, actions, and new observations. This is a predictive forward model in a nutshell.
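To make this concrete, here is a minimal sketch of a forward model trained on (state, action, next state) tuples. Everything is illustrative: the real thing would be a deeper neural network, while here a simple linear model is fitted by gradient descent on toy arm-like dynamics that are unknown to the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal linear forward model: next_state ≈ W @ [state, action] + b.
# A real agent would use a deeper network; this is a simplified sketch.
state_dim, action_dim = 3, 3
W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
b = np.zeros(state_dim)

def predict(state, action):
    x = np.concatenate([state, action])
    return W @ x + b

def train_step(state, action, next_state, lr=0.05):
    """One gradient step on the squared prediction error."""
    global W, b
    x = np.concatenate([state, action])
    error = predict(state, action) - next_state
    W -= lr * np.outer(error, x)
    b -= lr * error
    return float((error ** 2).mean())

# Toy dynamics: the arm moves a little in the commanded direction.
state = np.zeros(state_dim)
for _ in range(2000):
    action = rng.uniform(-1, 1, size=action_dim)
    next_state = state + 0.05 * action   # true dynamics, unknown to the model
    loss = train_step(state, action, next_state)
    state = next_state
print(f"prediction error after training: {loss:.5f}")
```

After a couple of thousand transitions the prediction error becomes tiny: the model has learned, from experience alone, that commanding the arm moves it a little in that direction.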

Your brain is, in a sense, constantly predicting the near future based on the immediate past. As some neuroscience studies suggest, the brain is a predictive machine. Thus, you are surprised if things don’t go as expected. You probably take the same route to work almost every day, and all those memories just fade into each other. But what if one day you saw a vehicle catching fire in the middle of the road? You would surely remember it for years after, possibly even the exact date. Similarly, this is why you read a book or an article, watch a movie, or travel somewhere: to see and learn something that you didn’t expect or already know. This driving impulse, now that we know what a forward model is, can be reproduced in machines: an artificial intelligence agent can reward itself for actions that lead it to surprising states. Computationally, surprise is the difference between the expected future and the future that actually happened after taking a certain action from a given state.
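In code, this idea of surprise-as-reward can be sketched as the prediction error of the forward model (the function names below are illustrative, not from a specific implementation):

```python
import numpy as np

def intrinsic_reward(forward_model, state, action, next_state):
    """Surprise = how far the predicted next state is from what happened."""
    predicted = forward_model(state, action)
    return float(np.sum((predicted - next_state) ** 2))

# A toy "model" that believes nothing ever moves:
naive_model = lambda s, a: s
s = np.array([0.0, 0.0, 0.0])
a = np.array([1.0, 0.0, 0.0])

# Expected outcome: no surprise, no intrinsic reward.
print(intrinsic_reward(naive_model, s, a, s))        # 0.0
# Unexpected outcome: large surprise, high intrinsic reward.
print(intrinsic_reward(naive_model, s, a, s + 0.5))  # 0.75
```

The agent then simply treats this number as its reward signal: states it predicted perfectly are worth nothing, states it got wrong are worth exploring.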

We can now introduce the environment where we will do our experiments: a simulated Fetch robot that can push a box around with its arm. This is the FetchPush-v1 environment in OpenAI’s famous Gym library. The goal of the agent is to push the box to its target position, the red ball. But we don’t care about this task for now: as explained before, we are interested in intrinsic rewards, and we want to see what happens when we guide the agent with curiosity alone.

Our robot, Fetch, can move its end-effector around in three dimensions. At every step, it can observe its position in space, as well as the position of the cube. We can build an internal predictive forward model in the agent by letting it experience its environment, simply by moving the arm around. Based on this experience, the agent will quickly learn what happens when it gives its arm a command: the arm moves a little in the specified direction. Thus the forward model will quickly become very good at predicting the movement of the arm. But a second thing can happen when giving an action command to the arm: the arm can touch the cube and move it. While this is intuitive for us, it is not for the robot, which is exploring the world for the first time like an infant. Learning to predict what happens to the cube when it is touched is considerably harder, both because of the complex physics of contact forces, and because, in its initial exploration, the robot will touch the cube only a few times: the operational space is quite large and the cube is very small. So, the robot will find that the cube is unaffected by the movement of the arm 99% of the time, and may well conclude that the cube is forever still. Thus, the predictive forward model will have a hard time predicting the movement of the cube. And here is where we can see the effects of curiosity on the robot.
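To get a feel for how rare contact is during undirected exploration, here is a back-of-the-envelope Monte Carlo estimate. The workspace size and touch radius are made-up round numbers for illustration, not the real FetchPush dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers: a 1 m^3 operational space, and "touching" means
# the end-effector is within 5 cm of the cube's center.
workspace = 1.0
touch_radius = 0.05
cube = np.array([0.5, 0.5, 0.5])

# Sample a million random end-effector positions in the workspace and
# count how often they land close enough to the cube to touch it.
samples = rng.uniform(0, workspace, size=(1_000_000, 3))
near = np.linalg.norm(samples - cube, axis=1) < touch_radius
print(f"random arm positions touching the cube: {100 * near.mean():.3f}%")
```

With these numbers, a random arm position touches the cube in roughly 0.05% of cases (the volume of a 5 cm sphere inside a cubic meter), which is why contact transitions are so badly underrepresented in the robot’s early experience.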

Two of the Fetch gym environments. We will focus on FetchPush (left).

Now that the robot has learned an initial forward model, we can let it explore the environment further, driven by curiosity. The robot will then try to find actions whose outcome is surprising. As anticipated, moving the arm around is generally boring for the robot, since it knows well what will happen. But, little by little, it will learn that what happens when it touches the cube is surprising. The cube moves in an unpredicted way, and this surprises it, tickling its curiosity. And so, just like a baby, the robot learns to play with the cube because it is, in some sense, fun. Interestingly, it discovers that the most unpredictable things happen when it pushes the box off the table and it falls to the ground.
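A stylized sketch of why a curious agent drifts toward the cube: if intrinsic reward is the forward model’s prediction error, then the well-modeled action (waving the arm in free space) yields almost no reward, while the poorly modeled one (touching the cube) yields a lot. The numbers and action names below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prediction errors of a trained forward model:
# free-space arm motion is well predicted, cube contact is not.
def prediction_error(action_kind):
    if action_kind == "wave_arm":
        return rng.normal(0.01, 0.001)   # tiny, well-modeled error
    return rng.normal(0.5, 0.1)          # large, still-surprising error

# The intrinsic reward IS the prediction error, so an agent that compares
# the average reward of each kind of behavior will prefer cube contact.
avg = {k: np.mean([prediction_error(k) for _ in range(100)])
       for k in ("wave_arm", "touch_cube")}
preferred = max(avg, key=avg.get)
print(preferred)  # -> touch_cube
```

In a real agent this comparison is not explicit: the policy is simply trained to maximize the intrinsic reward, and it converges on cube-touching behavior for the same reason.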

Fetch learns to play with the cube guided by curiosity.

The robot has learned to play with the cube without any external guidance or signal. It doesn’t know about the goal of the task; it’s just trying to explore the world around it and to find surprising things, because it’s curious. And this simple intrinsic motivation has made it discover the cube, which, like a toy, is now its main interest. As described in this blog post, curiosity can greatly help an agent explore its surroundings, finding things that would otherwise remain unseen. With curiosity, an agent can learn to trigger rare events in an environment, and little by little learn about all the underlying mechanics of a complex system.

In recent years, researchers in the field of AI have studied the effects of curiosity on agents in a wide range of environments. One of the most interesting results came from applying curiosity to agents playing videogames. A recent study demonstrated how, if guided by curiosity, an agent can learn to play several levels of Super Mario Bros. without any external reward. It has no interest in breaking records, just a strong interest in discovering what happens next, and a desire to find new and unexpected things. In this game, the best way to do this is by progressing through a level and discovering new areas. And to do this, the agent has to learn how to survive, avoiding enemies and traps, just to discover new things.

Another study showed how this actually happens in several Atari videogames: an agent can learn to play a game quite well just by following this intrinsic reward. But what’s really interesting is a conclusion that the researchers wrote in the paper: this result is not only an achievement of AI, but also a good insight into the effect of curiosity on humans. We play videogames because they’re fun, and they are a source of new stimuli, experiences and challenges. A well-designed videogame, then, is built around rewarding the player’s curiosity, and this is why an AI agent driven by curiosity can learn to play those games.

Curiosity is an essential part of human intelligence. It not only characterizes human behavior, but is also an essential tool in building further intelligence and knowledge: without curiosity we cannot discover new things unless they bump into us. This is why, to build truly intelligent machines, it is fundamental to characterize and model curiosity and the other intrinsic stimuli generated by our brains, which have driven mankind in its constant evolution.

Feel curious? You can find all the code in this GitHub repository.

Thank you for reading this far! You can follow me on Twitter (@normandipalo) to follow my work and research.