Did the agents engage in any unintended behavior, like the kind that emerges from the three laws in Asimov’s fiction?

We initially got good behavior. For example, the virtual robot takes out enemies that are trying to kill you. Once in a while it might jump in front of a bullet for you, if this is the only way to save you. But one thing that was a bit surprising to us, at the beginning, was that it was also very afraid of you.

The reason for this has to do with its “local forward” model: Basically, it looks at how certain action sequences two or three steps into the future affect the world, for both you and itself. So as a first, easy step, we programmed this model to assume that the player would act randomly. But in practice, that meant that the agent was essentially acting under the assumption that the human player is kind of a psychopath, and so at any point in time that human could decide to, for example, fire at the agent. So the agent would always be very, very careful to be in positions where the human couldn’t kill it.

We had to fix this, so we modeled something we call a trust assumption. Basically, the companion agent acts under the assumption that the human will only choose those actions that will not remove the agent’s own empowerment — which is probably a more natural model for a companion anyway.
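The difference between the random-player model and the trust assumption can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the function name `predicted_empowerment`, the action names, and the empowerment values are all hypothetical and are not taken from the actual game code.

```python
def predicted_empowerment(current, after_player_action, trust):
    """Average the companion's empowerment over the player actions the model allows.

    current: the companion's empowerment right now (bits).
    after_player_action: hypothetical map from player action to the
        companion's empowerment once the player has taken that action.
    trust: if True, ignore player actions that would lower the
        companion's empowerment (the trust assumption).
    """
    outcomes = list(after_player_action.values())
    if trust:
        trusted = [e for e in outcomes if e >= current]
        # Fall back to the full set if the filter leaves nothing.
        outcomes = trusted or outcomes
    return sum(outcomes) / len(outcomes)


# Illustrative numbers only: being shot leaves the companion with no options.
after = {"move": 2.0, "wait": 2.0, "shoot_companion": 0.0}

fearful = predicted_empowerment(2.0, after, trust=False)  # random player
trusting = predicted_empowerment(2.0, after, trust=True)  # trusted player
```

Under the random model, the possibility of being shot drags the prediction down, so the companion has a reason to stay out of the player's line of fire. Under the trust assumption that possibility is never considered, and the prediction stays at the current value.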

The other thing we noticed in the game was that, if you had, say, 10 health points, the companion wasn’t really concerned with you losing the first eight or nine of these, and would even shoot you once in a while just for laughs. There, again, we realized that there’s a disconnect between the world we live in and the model in a computer game. Once we modeled a limitation of ability resulting from health loss, this problem went away. But it also could have been dealt with by designing the local forward model in a way that makes it able to look further into the future than just a few steps. If the agent were able to look really far into the future, it would see that having more health points might be helpful for the things to come.
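A small sketch can show why modeling "a limitation of ability resulting from health loss" changes the picture. Everything here is an assumption made for illustration: the one-dimensional corridor, the idea that a wounded player moves fewer cells per step, and the names `stride_flat`, `stride_degrading` and `player_empowerment`. Empowerment is approximated as the base-2 logarithm of the number of cells the player can reach within a short horizon, which is valid for deterministic movement.

```python
from math import log2


def reachable_cells(pos, stride, horizon, size):
    """Count corridor cells reachable after `horizon` moves of at most `stride` cells."""
    lo = max(pos - stride * horizon, 0)
    hi = min(pos + stride * horizon, size - 1)
    return hi - lo + 1


def stride_flat(health):
    """Game-like model: any nonzero health gives full mobility."""
    return 2 if health > 0 else 0


def stride_degrading(health, max_health=10):
    """Hypothetical fix: a player at half health or less moves more slowly."""
    if health <= 0:
        return 0
    return 2 if health > max_health // 2 else 1


def player_empowerment(health, stride_fn, pos=10, horizon=2, size=21):
    """Short-horizon empowerment (bits) of the player at a given health."""
    return log2(reachable_cells(pos, stride_fn(health), horizon, size))


flat_gap = player_empowerment(10, stride_flat) - player_empowerment(2, stride_flat)
degrading_gap = (player_empowerment(10, stride_degrading)
                 - player_empowerment(2, stride_degrading))
```

With the flat model the gap is zero, so a short-sighted agent sees no cost in the player dropping from 10 health points to 2. With the degrading model the gap is positive, so the lost health shows up directly in the quantity the agent is trying to preserve.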

Whereas if the loss of spare health points doesn’t make a difference to my empowerment right now …

The agent basically goes, “Oh, I could not shoot him, or I could shoot him. No difference.” And sometimes it shoots you. Which of course is a problem. I do not condone the random shooting of players. We’ve added a fix so the virtual robot cares a bit more about your empowerment than about its own.
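One simple way to express "cares a bit more about your empowerment than about its own" is a weighted sum. The interview does not say how the weighting was actually implemented, so the linear form, the weight values, the action names, and the empowerment numbers below are all assumptions made for illustration.

```python
def choose_action(candidates, player_weight=1.5, companion_weight=1.0):
    """Pick the companion action with the highest weighted empowerment.

    candidates: hypothetical map from companion action to a pair
        (player empowerment, companion empowerment) predicted after it.
    """
    def score(action):
        player_emp, companion_emp = candidates[action]
        return player_weight * player_emp + companion_weight * companion_emp

    return max(candidates, key=score)


# Illustrative numbers only: taking cover helps the companion but leaves
# the player exposed; shielding the player costs the companion instead.
candidates = {
    "take_cover": (1.0, 2.4),
    "shield_player": (2.0, 1.4),
}

protective = choose_action(candidates, player_weight=1.5)  # favors the player
selfish = choose_action(candidates, player_weight=0.5)     # favors itself
```

With the player's empowerment weighted more heavily, the sketch picks the action that protects the player, even at some cost to the companion.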

How do you make these concepts precise?

If you think about agents as control systems, you can think in terms of information: Stuff happens in the world, and this somehow affects you. We’re not just talking about information in terms of things you perceive, but as any kind of influence: It could be matter, anything flowing back and forth between the world and you. It might be the temperature affecting you, or nutrients entering your body. Any kind of thing that permeates this boundary between the world and the agent carries information in. And in the same way, the agent can affect the outside world in numerous ways, and each of those effects carries information out.

You can look at this flow as a channel capacity, which is a concept from information theory. You have high empowerment if you have different actions you can take that will lead to different results. If any of these capabilities become worse, then your empowerment goes down, because the loss of capability corresponds to a quantifiable reduction in this channel capacity between you and the environment. This is the core idea.
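The channel-capacity idea can be made concrete in the simplest setting. When the environment is deterministic, the capacity of the channel from action sequences to resulting states is the base-2 logarithm of the number of distinct states those sequences can reach. The sketch below computes that on a hypothetical one-dimensional corridor. The corridor, the three actions, and the `can_move` flag are assumptions made for illustration, and a noisy environment would need a full capacity computation (for example the Blahut-Arimoto algorithm), which is not shown here.

```python
from itertools import product
from math import log2

# Hypothetical action set: step left, stay, step right.
ACTIONS = (-1, 0, 1)


def step(pos, action, size, can_move=True):
    """Deterministic transition on a corridor of `size` cells."""
    if not can_move:
        return pos  # a disabled agent stays where it is
    return min(max(pos + action, 0), size - 1)


def empowerment(pos, horizon, size, can_move=True):
    """n-step empowerment in bits for deterministic dynamics.

    Enumerates every action sequence of length `horizon` and counts
    the distinct end states; the result is log2 of that count.
    """
    end_states = set()
    for sequence in product(ACTIONS, repeat=horizon):
        p = pos
        for action in sequence:
            p = step(p, action, size, can_move)
        end_states.add(p)
    return log2(len(end_states))


open_space = empowerment(pos=5, horizon=2, size=11)   # 5 reachable cells
against_wall = empowerment(pos=0, horizon=2, size=11)  # 3 reachable cells
disabled = empowerment(pos=5, horizon=2, size=11, can_move=False)  # 1 cell
```

In this sketch, an agent in open space has more empowerment than one pressed against a wall, and an agent that cannot move has none, which matches the description of empowerment falling whenever a capability is lost.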