Wired ran a good article about Karl Friston, the neuroscientist whose works I’ve puzzled over here before. Its author, Raviv, writes:

Friston’s free energy principle says that all life…is driven by the same universal imperative…to act in ways that reduce the gulf between your expectations and your sensory inputs. Or, in Fristonian terms, it is to minimize free energy.

Put this way, it’s clearly just perceptual control theory. Powers describes the same insight like this:

[Action] is the difference between some condition of the situation as the subject sees it, and what we might call a reference condition, as he understands it.

I’d previously noticed that these theories had some weird similarities. But I want to go further and say they’re fundamentally the same paradigm. I don’t want to deny that the two theories have developed differently, and I especially don’t want to deny that free energy/predictive coding has done great work building in a lot of Bayesian math that perceptual control theory can’t match. But the foundations are the same.

Why is this of more than historical interest? Because some people (often including me) find free energy/predictive coding very difficult to understand, but find perceptual control theory intuitive. If these are basically the same, then someone who wants to understand free energy can learn perceptual control theory plus a glossary of which concepts correspond to each other, and save themselves the grief of trying to learn free energy/predictive coding just by reading Friston directly.

So here is my glossary:

FE/PC: prediction, expectation

PCT: set point, reference level

And…

FE/PC: prediction error, free energy

PCT: deviation from set point

So for example, suppose it’s freezing cold out, and this makes you unhappy, and so you try to go inside to get warm. FE/PC would describe this as “You naturally predict that you will be a comfortable temperature, so the cold registers as strong prediction error, so in order to minimize prediction error you go inside and get warm.” PCT would say “Your temperature set point is fixed at ‘comfortable’, the cold marks a wide deviation from your temperature set point, so in order to get closer to your set point, you go inside”.
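To make the equivalence concrete, here is a toy sketch of the cold-weather example in both vocabularies. The function names, the temperatures, and the tolerance are all made up for illustration, and real FE/PC models are Bayesian and weight errors by precision, which this ignores. The only point is that both descriptions compute the same number and trigger the same action.

```python
def pct_deviation(set_point: float, perceived: float) -> float:
    """PCT vocabulary: how far the perception is from the set point."""
    return set_point - perceived


def fepc_prediction_error(predicted: float, sensed: float) -> float:
    """FE/PC vocabulary: gap between the 'prediction' and the sensory input."""
    return predicted - sensed


# Hypothetical values in degrees Celsius, chosen only for illustration.
COMFORTABLE_C = 21.0
OUTSIDE_C = -5.0


def should_go_inside(error: float, tolerance: float = 2.0) -> bool:
    """Act whenever the error is larger than some (assumed) tolerance."""
    return abs(error) > tolerance


deviation = pct_deviation(COMFORTABLE_C, OUTSIDE_C)
prediction_error = fepc_prediction_error(COMFORTABLE_C, OUTSIDE_C)
```

Whichever name you give the quantity, `should_go_inside` gets the same input and returns the same answer.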

The PCT version makes more sense to me here because the phrase “you naturally predict that you will be a comfortable temperature” doesn’t match any reasonable meaning of “predict”. If I go outside in Antarctica, I am definitely predicting I will be uncomfortably cold. FE/PC obviously means to distinguish between a sort of unconscious neural-level “prediction” and a conscious rational one, but these kinds of vocabulary choices are why it’s so hard to understand. PCT uses the much more intuitive term “set point” and makes the whole situation clearer.

FE/PC: surprise

PCT: deviation from set point

FE/PC says that “the fundamental drive behind all behavior is to minimize surprise”. This leads to questions like “What if I feel like one of my drives is hunger?” and answers like “Well, you must be predicting you would eat 2000 calories per day, so when you don’t eat that much, you’re surprised, and in order to avoid that surprise, you feel like you should eat.”

PCT frames the same issue as “You have a set point saying how many calories you should eat each day. Right now it’s set at 2000. If you don’t eat all day, you’re below your calorie set point, that registers as bad, and so you try to eat in order to minimize that deviation.”

And suppose we give you olanzapine, a drug known for making people ravenously hungry. The FE/PCist would say “Olanzapine has made you predict you will eat more, which makes you even more surprised that you haven’t eaten”. The PCTist would say “Olanzapine has raised your calorie set point, which means not eating is an even bigger deviation.”
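The olanzapine case can be sketched the same way. This is a toy illustration, not pharmacology: the function name and the 500-calorie shift are invented, and only the 2000-calorie figure comes from the example above.

```python
def calorie_deviation(set_point_kcal: float, eaten_kcal: float) -> float:
    """How far below the calorie set point you are (positive means 'eat more')."""
    return set_point_kcal - eaten_kcal


NORMAL_SET_POINT = 2000.0   # the figure used in the text
OLANZAPINE_SHIFT = 500.0    # hypothetical amount, chosen only for illustration

# Same empty stomach, two different set points.
fasting_normal = calorie_deviation(NORMAL_SET_POINT, eaten_kcal=0.0)
fasting_on_drug = calorie_deviation(NORMAL_SET_POINT + OLANZAPINE_SHIFT, eaten_kcal=0.0)
```

Nothing about the stomach changed between the two cases; only the reference moved, and the deviation grew with it.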

Again, they’re the same system, but the PCT vocabulary sounds sensible whereas the FE/PC vocabulary is confusing.

FE/PC: Active inference

PCT: Behavior as control of perception

FE/PC talks about active inference, where “the stimulus does not determine the response, the response determines the stimulus” and “We sample the world to ensure our predictions become a self-fulfilling prophecy.” If this doesn’t make a lot of sense to you, you should read this tutorial, in order to recalibrate your ideas of how little sense things can make.

PCT talks about behavior being the control of perception. For example, suppose you are standing on the sidewalk, facing the road parallel to the sidewalk, watching a car zoom down that road. At first, the car is directly in front of you. As the car keeps zooming, you turn your head slightly right in order to keep your eyes on the car, then further to the right as the car gets even further away. Your actions are an attempt to “control perception”, ie keep your picture fixed at “there is a car right in the middle of my visual field”.

Or to give another example, when you’re driving down the highway, you want to maintain some distance between yourself and the car in front of you (the set point/reference interval, let’s say 50 feet). You don’t have objective yardstick-style access to this distance, but you have your perception of what it is. Whenever the distance becomes less than 50 feet, you slow down; whenever it becomes more than 50 feet, you speed up. So behavior (how hard you’re pressing the gas pedal) is an attempt to control perception (how far away from the other car you are).
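The highway example is just a feedback loop, and a minimal sketch of it looks like the code below. I am assuming the simplest possible controller (speed adjustment proportional to the error) and a lead car that holds a steady speed; the function name, the gain, and the step count are all hypothetical, and Powers's actual models are hierarchical and richer than this.

```python
def simulate_following(initial_gap_ft: float,
                       reference_ft: float = 50.0,
                       gain: float = 0.5,
                       steps: int = 20) -> list[float]:
    """Return the perceived gap over time under a proportional controller."""
    gap = initial_gap_ft
    history = [gap]
    for _ in range(steps):
        # Positive error: too far back. Negative error: too close.
        error = gap - reference_ft
        # Speed up when too far back, slow down when too close
        # (feet closed per time step, relative to the lead car).
        closing_speed = gain * error
        gap = gap - closing_speed
        history.append(gap)
    return history


too_far = simulate_following(80.0)
too_close = simulate_following(30.0)
```

Starting from either side, the gap settles toward the 50-foot reference. The driver never needs the true distance, only the perceived gap and the reference to compare it against.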

FE/PC: The dark room problem

PCT: [isn’t confused enough to ever even have to think about this situation]

The “dark room problem” is a paradox for free energy/predictive coding formulations: if you’re trying to minimize surprise / maximize the accuracy of your predictions, why not just lie motionless in a dark room forever? After all, you’ll never notice anything surprising there, and as long as you predict “it will be dark and quiet”, your predictions will always come true. The main proposed solution is to claim you have some built-in predictions (of eg light, social interaction, activity levels), and the dark room will violate those.

PCT never runs into this situation. You have set points for things like social interaction, activity levels, food, sex, etc, that are greater than zero. In the process of pursuing them, you have to get out of bed and leave your room. There is no advantage to lying motionless in a dark room forever.
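As a rough sketch of why the dark room is unattractive once set points are nonzero: score each situation by its total deviation from the set points. The variable names, the 0-to-1 scale, and every number here are invented for illustration, and summing absolute deviations is my assumption, not something PCT specifies.

```python
# Hypothetical set points on an arbitrary 0..1 scale; all greater than zero.
SET_POINTS = {
    "light": 0.6,
    "social_interaction": 0.5,
    "activity": 0.5,
    "food": 0.7,
}


def total_deviation(perceptions: dict[str, float],
                    set_points: dict[str, float] = SET_POINTS) -> float:
    """Sum of absolute deviations from each set point (missing inputs count as 0)."""
    return sum(abs(target - perceptions.get(name, 0.0))
               for name, target in set_points.items())


# Lying motionless in a dark room: every controlled perception sits at zero.
dark_room = {name: 0.0 for name in SET_POINTS}

# A made-up ordinary day that roughly meets each set point.
ordinary_day = {"light": 0.6, "social_interaction": 0.4, "activity": 0.5, "food": 0.7}
```

The dark room is perfectly predictable but scores far worse than the ordinary day, because predictability was never what was being controlled.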

If the PCT formulation has all these advantages, how come everyone uses the FE/PC formulation instead?

I think this is because FE/PC grew out of an account of world-modeling: how do we interpret and cluster sensations? How do we form or discard beliefs about the world? How do we decide what to pay attention to? Here, words like “prediction”, “expectation”, and “surprise” make perfect sense. Once this whole paradigm and its vocabulary were in place, scientists realized that it also explained movement, motivation, and desire. They carried the same terminology and approach over to that field, even though now the vocabulary was actively misleading.

Powers was trying to explain movement, motivation, and desire, and came up with vocabulary that worked great for that. He does get into world-modeling, learning, and belief a little bit, but I was less able to understand what he was doing there, and so can’t confirm whether it’s the same as FE/PC or not. Whether or not he did it himself, it should be possible to construct a PCT account of world-modeling. But it would probably be as ugly and cumbersome as the FE/PC account of motivation.

I think the right move is probably to keep all the FE/PC terminology that we already have, but teach the PCT terminology along with it as a learning aid so people don’t get confused.