
0:00:00 Sean Carroll: Hello, everyone, and welcome to the Mindscape Podcast. I’m your host, Sean Carroll. And in the course of doing many podcasts, interesting phenomena arise when I look at what the comments are and from various sources, whether it’s email or Twitter or comments on the webpage or on YouTube or Reddit. Different podcast episodes have different spirits in some sense. Sometimes we’re big picture, right? We’re sort of talking about various ideas from a very high-level view and it’s more inspirational than challenging, right? It’s like thinking about things rather than getting a lecture in them. Other times, we get a little deeper, we kind of get our hands dirty, we get into the weeds, we try to dig into some specific example of something. Either way, I will get complaints, that I know. And you know what? I love and cherish the complaints, because I want constructive feedback, I want to hear what people have to say. But, honestly, going forward realistically, it’s going to be a mix of both kinds.

0:01:00 SC: So today, we’re getting our hands dirty, we’re going to get into the weeds. Don’t be afraid, don’t think that, well, this is going to be a slog, or anything like that. This is one of the most fascinating episodes I think that I’ve done here on Mindscape, and perhaps the most useful, in a sense I will explain to you in just a second. Our guest is Karl Friston, who’s a neuroscientist at University College London, and Karl Friston is, by many measures, the most influential neuroscientist alive today, the most citations, the highest h-index, all these different quantitative measures of scientific success. He is a practicing psychiatrist and he’s very interested in schizophrenia and he serves patients, but he has also contributed to neuroscience more broadly, most obviously in developing techniques for imaging the brain, ideas like statistical parametric mapping, voxel-based morphometry, I really have no idea what these ideas are, sorry about that.

0:02:00 SC: What I’m interested in is where he’s moved more recently in his career into the theory of how the brain works, and he’s been developing an idea called the free energy principle as part of a bigger set of ideas called the Bayesian brain, the idea that what the brain is trying to do is to model the world around it, and therefore, develop a little picture of what’s going to happen next using Bayesian inference, something you all are experts on ’cause you’ve all read The Big Picture or somewhere else. Bayesian inference is getting data in and using that data to update your beliefs about the world. The free energy principle is Friston’s idea for how the brain effectively does that. It turns out that calculationally, updating your beliefs about the world can be very, very hard. The free energy principle is a way to sort of simply and quickly get to an effective view of the world.
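
[Carroll's gloss of Bayesian inference, getting data in and using that data to update your beliefs, can be made concrete in a few lines of Python. This is an illustrative sketch with made-up hypotheses and numbers, not anything from the episode.]

```python
# A minimal Bayesian update: prior beliefs times the likelihood of the observed
# data, renormalized. Hypotheses and numbers are invented for illustration.
def bayes_update(prior, likelihood):
    """Return posterior P(hypothesis | data) from a prior and per-hypothesis likelihoods."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())   # marginal likelihood P(data)
    return {h: p / evidence for h, p in unnormalized.items()}

# Prior: 50/50 on rain vs. no rain; then we observe dark clouds.
prior = {"rain": 0.5, "no_rain": 0.5}
likelihood = {"rain": 0.8, "no_rain": 0.2}  # P(dark clouds | hypothesis)

posterior = bayes_update(prior, likelihood)
print(posterior["rain"])  # ~0.8: the data shifted belief toward rain
```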

0:02:51 SC: And the basic idea is that the brain is constantly trying to minimize surprise, it’s trying to develop a model for all the stimulus, all the sensory input that it’s going to get that is least likely to be surprised by something new happening. So that sounds simple, but when you get into it, when you look at the actual way it’s supposed to work, it actually turns out to be pretty darn complicated and intimidating. And famously, there’s a large number of people in a lot of different fields, not just neuroscience but deep learning, machine learning, biologists and physicists and a whole bunch of people who have trouble really figuring out what this is all about. So I really think that in this podcast we present quite an understandable picture of what it’s all about.

0:03:37 SC: There’s some jargon, but we explain what the jargon is. Of course, Karl understands what it’s all about, but I think that we’re able to give enough examples, talk a little bit about why the brain would work this way and what it’s supposed to be, why in particular he’s interested in it from the point of view of addressing schizophrenia and other problems. And to me, of course, I’m biased, I know a lot about free energy and entropy and measures of information theory, etcetera, but I think we did a good job of uncovering what’s going on here. There’s no equations in the podcast, but the ideas I think are out there, this is the kind of episode which it really repays listening closely to. I think you can learn a lot and learn about something that is really at the absolute cutting edge of modern neuroscience.

0:04:21 SC: I also want to mention a tiny little announcement, that I have a Patreon account that you can find on the webpage, preposterousuniverse.com/podcast, there’s a link to the Patreon. One of the benefits that Patreon users get is that they get a monthly Ask Me Anything episode. So Patreon users ask me questions, I try to answer as many of them as I can. And someone on Patreon suggested that even though it makes sense that Patreon supporters get the ability to ask questions, the ability to listen to the answers might be more widely shared. So I’m working out a way to do that. Every month, there’s like a two or three-hour episode that I put on Patreon, and right now it is only for Patreon listeners, but going forward, I’m going to try to figure out how to make those answers available to anyone. Right now the only way to do it is going to be to go to the Patreon page every month when they appear, but hopefully, I’ll figure out a way to put them into the regular podcast feed.

0:05:22 SC: The trick is that it costs me a lot of money to put two or three hours worth of gigabytes of data onto the podcast feed because my host is really expensive, it gets paid for by the ads, it’s not clear whether we can get ads to pay for the AMA or not. Let me know, especially on the comments in the blog post associated with this particular episode, whether or not this is a good idea at all, whether people would be interested, would it make sense just to include like one hour’s worth of answers, then you can go to the Patreon page to get the rest, etcetera, etcetera. But I think it’s a different kind of thing, it’s not going to replace regular Mindscape episodes, but it might be a different way to get some ideas out there and talk about them. And with that, let’s go.

[music]

0:06:23 SC: Karl Friston, welcome to the Mindscape Podcast.

0:06:24 Karl Friston: Thank you, glad to be here.

0:06:26 SC: I figured we would talk about this thing called the free energy principle, which you’ve been investigating and championing for a while now. But maybe just to get there, rather than just start by defining that, could you just explain what is the problem that we’re trying to solve? What is the question that we’re trying to answer by talking about free energy?

0:06:44 KF: From my perspective, it’s trying to find a first principle account of sentient behavior. And just very practically that’s relevant because of my background, which is as a psychiatrist. So, very simply, as Richard Feynman says, if you want to understand something, you’ve got to be able to build it. If you want to understand psychiatric patients, you have to, in some minimal way, be able to build or simulate sentient behavior that goes wrong. So, that’s basically how I got into it.

0:07:14 SC: So, you were actually a practitioner with patients, and the whole background?

0:07:18 KF: Oh, yes.

0:07:18 SC: Transitioned out of that.

0:07:19 KF: Yes, slowly but surely, yes. Transferred my angst from patients to students. But I did spend an early part of my life in a therapeutic community with 30 chronic schizophrenics…

0:07:33 SC: Wow.

0:07:33 KF: Which was an eye-opener for several years.

0:07:36 SC: [chuckle] I can imagine, yeah. Okay. So that… So, we want to understand how the mind works, how the brain works, in part to help fix it when it goes wrong.

0:07:45 KF: That’s the ultimate agenda, because I got seduced away from clinical psychiatry into systems neuroscience and brain imaging. And the question became slightly less focused and more how does the brain work. And that became relevant when you’re trying to characterize or analyze neuroimaging time series from things like functional magnetic resonance imaging or electroencephalography. To make sense of these data, you have to have some conceptual generative or forward model of what’s actually under the hood.

0:08:17 SC: Sure, otherwise it’s just a bunch of time series data, yeah, but… And neuroimaging is sort of where you made your money, as it were, right? That’s what you did for a while, right?

0:08:22 KF: That’s my day job… Yeah, absolutely.

0:08:26 SC: [chuckle] And thinking about the grand theory of the brain is what we’re doing now. So, good. That’s enough for me to dive into free energy, except that you had this lovely story that I’ve heard about woodlice, when you were young and seeing them scurry around, that does set the stage very nicely. If you could tell our listeners that story.

0:08:43 KF: Right. Well, that was my first, looking back, my first sort of scientific insight. It was a hot summer’s day, and I was… I must have been between five and eight years of age, playing in the garden, and just became preoccupied by watching little woodlice scurrying around, noticing that they tended to avoid sunlight, that they ended up underneath bits of rock or wood in the shadows.

0:09:12 SC: In shady places, yeah.

0:09:14 KF: In shady places. And just looking at this, I thought, “That’s interesting, ’cause that looks like purposeful behavior. It looks as if they are purposefully in a goal-directed way avoiding the sunlight.” But then, there was another interpretation that came to mind, “Well, yes, but you would also see exactly the same phenomenology if they just moved more quickly when they were warmed up by the sun.” So there was a more deflationary account, I didn’t use these words at the age of five.

0:09:42 SC: [chuckle] At the age of five, yeah.

0:09:45 KF: The essence of the insight was, well, there’s a much simpler explanation of what’s going on here for this sort of very elemental form of self-organization ensemble dynamics. There’s a simple explanation, things just move fast when they’re hotter. And that notion, that sort of deflationary, simple, almost verging on a tautology explanation for self-organizing behavior kept re-presenting itself throughout my education. So, natural selection, I think, is a nice example of that. If it doesn’t work, you move in some phenotypic space, right through to physics at Cambridge, and sort of density dynamics and the Fokker-Planck and quantum physics. Again, if it’s not good, if it has a high potential, just get out of there.

0:10:34 SC: Yeah. Were you a physics undergraduate? Is that what you studied?

0:10:36 KF: Natural sciences, yeah, so half psychology and half physics and quantum physics.

0:10:41 SC: I had no idea. So, audience, don’t… I don’t seek out people who are physics undergraduates, but I find them in all sorts of fields.

0:10:47 KF: You are attracted to them.

0:10:48 SC: Apparently, yes. [chuckle]

0:10:50 KF: But in a deflationary way, just because you move away from people who are physicists.

0:10:54 SC: That’s right, exactly. But I love that kind of story, because it’s an example of what I talk about a lot in The Big Picture, this emergence of purportedly higher… Well, purportedly purposeful teleological, goal-directed behavior, out of things just obeying the laws of physics one way or the other, right?

0:11:12 KF: Yeah. Exactly.

0:11:14 SC: So, that’s the kind of thing that you saw in your undergraduate education, and even today, I presume, in studying the brain?

0:11:18 KF: Yes. Well, at its heart, that is the free energy principle that we’ve been promoting. It is very much the sort of deflationary, getting back to the first principles and then rebuilding up on that and seeing what would this kind of behavior look like in a sufficiently itinerant context. How would it… Would it be fit for purpose to explain the behavior of you and me? Or, starting at a slightly simpler level, would it be fit for purpose in explaining a thermostat, or a virus, or something, an ensemble of bacteria, or the like. So, it’s a question of how far can you get from first principles as a principled account of sentient behavior. So, we get back to the human brain. And, of course, that is, in terms of writing down the dynamics, the mechanics, and specifically in mathematical terms, it becomes necessary, if you want to write down formal models of brain imaging time series. So, there’s an interesting dialogue between the rather self-indulgent theoretical neurobiology side of it, what I often refer to as the work you do at the weekend, when the pressures are off, and the day job, which is analyzing brain data.

0:12:13 SC: Right.

0:12:13 KF: Both sides inherit from each other in a very interesting way, in that you’re constantly building models of how the brain works, and then testing those models in relation to the empirical data you get from brain imaging, which forces you, puts pressure on you to actually sort of think what’s the simplest sort of dynamical, functional, computational architecture that could possibly explain these data when I look at this sentient creature, usually a normal human subject, exposed to these experimental manipulations. And yet on the other side, the very algorithms or schemes or data-analytic approaches that you apply to make the inference about whether this is the right model of the brain or that is the right model of the brain themselves now… Inherit from the theoretical work, because if you can solve how a brain works…

[laughter]

0:13:47 SC: Absolutely.

0:13:47 KF: That’s the best sort of data analysis machine you can possibly have. And then that helps with analyzing the actual data.

0:13:53 SC: Has it affected what data you collect?

0:13:55 KF: Yes. Yes. I mean, that’s true in both senses, it’s true in terms of me as a sentient creature in the sense that…

0:14:02 SC: You’re an example.

0:14:03 KF: Yep, I’m looking around and I’m literally collecting visual data as I [0:14:08] ____ around the room, interrogating your face, trying to anticipate whose turn it is to talk, whether you’ve understood me. So I’m selectively sampling the right kind of data to resolve uncertainty about those hypotheses that are relevant to my behavior at the moment. And in exactly the same way as a neuroimaging scientist, I designed those experiments to solicit the right kind of data that resolve my uncertainty about my hypothesis about the functional integration of the hippocampus with the prefrontal cortex, same stuff, exactly the same principles.

0:14:42 SC: Good. Except now, I’m self-conscious that everything that I do you’re analyzing. Well, I mean, of course you’re going to be looking in my face in a different way than you’re looking at the desk in front of you which you see every day, right, the surprise, all the new information is much larger for something like that, so okay, good. What is free energy when you think about it? ‘Cause I’m a physicist and I have a definition and I think it’s a little bit different than yours, but there’s a mathematical relationship. So I want to clear up for everyone in the audience what we mean when we use these words.

0:15:08 KF: Right, so when I talk or when people… I don’t say I, it’s a little bit self-centered, when people in things like machine learning talk about free energy, they mean variational free energy. So technically, if you were talking to somebody from machine learning, we’re talking about an evidence bound, an upper bound on surprise; they switch the sign in machine learning, so it’s often called an evidence lower bound, acronym ELBO, which confused me for an enormous amount of time. I literally thought it was part of your body.

[laughter]

0:15:37 SC: That’s asking for trouble, really.

0:15:41 KF: So it’s a statistical, information-theoretic concept, which licenses your previous discussion about hypotheses, inference and rich information. So it scores, basically, a bound on what is technically the self-information, or negative log probability, of some data given a model of how those data were generated. So from a pure…

0:16:11 SC: So sorry, that’s the crucial thing, there’s a model and there’s data.

0:16:14 KF: Yeah.

0:16:14 SC: And we would like them to match.

0:16:15 KF: Yes, absolutely. So well, let’s just rehearse that, ’cause that’s absolutely fundamental. Everything that we talk about either in terms of sort of sentient behavior and the free energy principle as applied to sentient artifacts or in terms of actually analyzing data rests upon this notion of a generative model. Generating what? Generating data, generating sensations, generating any observables, and an even more simple expression of that is you have to have some way of articulating a mechanics of causes and consequences. So the generative model causes or is a description of how causes generate consequences, and in this instance the consequences are sensory observations or data observables, measurables.

0:17:12 KF: The causes are the sort of the latent variables, features, structures, whatever you want to call them, that are responsible for generating those things. So central to the free energy as a scalar functional is a generative model; it’s only defined in relation to a generative model. That tells you immediately that the variational free energy is a functional, a function of a function, and the function that it is a function of is a probability distribution or a belief.

0:17:44 SC: So this is your brain giving probabilities to the various things it might experience out there in the world and the free energy is a way of measuring… All to say relating those, that prediction for what you see to what you actually see and then sort of what measure is it? What does it characterize really?

0:18:04 KF: Well, exactly as you defined it, it’s a measure of the surprise that you would have, and I’m using surprise in a sort of folk-psychology sense, where the surprise is literally the self-information.

0:18:24 SC: There is a technical version of it, which is in this case pretty close to the folk version.

0:18:27 KF: Yes, indeed, absolutely. So it’s the surprise that you would associate with a bunch of data given a belief or a model about how those data were generated. So technically it’s also called the marginal likelihood or the logarithm, the negative logarithm of the marginal likelihood, and that marginal likelihood is also called model evidence. And all this rhetoric becomes important because there’s… You can gracefully move from one interpretational stance to another one, without making any mathematical moves whatsoever.
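
[The quantities Friston just named can be written out for a toy discrete model: the surprise is the negative log of the model evidence, and the variational free energy of any belief over causes is an upper bound on it, tight exactly when the belief equals the posterior. The two causes and their numbers below are invented for illustration.]

```python
import math

# Toy generative model: a prior over two hidden causes and the likelihood of
# one sensed datum under each cause. All numbers are illustrative.
prior = {"cause_A": 0.7, "cause_B": 0.3}
likelihood = {"cause_A": 0.9, "cause_B": 0.1}   # p(datum | cause)

# Model evidence (marginal likelihood) and its negative log: the surprise.
evidence = sum(likelihood[c] * prior[c] for c in prior)
surprise = -math.log(evidence)

def free_energy(q):
    """Variational free energy of belief q over causes: E_q[log q(c) - log p(datum, c)]."""
    return sum(q[c] * (math.log(q[c]) - math.log(likelihood[c] * prior[c])) for c in q)

# The true posterior makes the bound tight; any other belief sits above the surprise.
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(free_energy(posterior) - surprise)              # ~0: bound is tight
print(free_energy({"cause_A": 0.5, "cause_B": 0.5}))  # larger than the surprise
```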

0:19:04 SC: Right, the math carries over.

0:19:06 KF: It’s just exactly the same, but depending upon how you grew up, or how you appreciate these quantities, or the rhetoric with which you would interpret them, you get a very different look and feel to the fundamental behavior of systems that look as if they are minimizing their variational free energy. So if this surprise is interpreted from the point of view of a statistician, so the brain as a statistical organ, a little scientist inside your head, then, if it is in the game of minimizing its variational free energy, its evidence bound, that makes it appear as if it is maximizing model evidence.

0:19:54 KF: What does that mean? Well, it will look as if it’s gathering information in the service of seeking evidence for its own existence, and this translates nicely into a philosophical concept, self-evidencing. So another way of looking at this purely mathematical sort of behavior is in terms of self-evidencing, so you have lovely little phrases like people going around gathering evidence for their own existence, which literally is mathematically a truism.

0:20:26 SC: Yeah.

0:20:26 KF: So that’s only one way, or you can look at it the other way round. We’re trying to minimize surprise. So what would that look like if you’ve been taught to think about things in terms of information theory and uncertainty? Well, mathematically, expected surprise, expected self-information, is entropy, and entropy is one way of describing uncertainty. So what does that mean? It means that it looks as if this creature or this artifact or this system is gathering information in the service of resolving uncertainty. Uncertainty in relation to its model of how it thinks the world works and in particular how it is situated within that world and sampling from that world. So that brings us back to your phrase earlier on about looking around, not looking at my desk because there’s no rich information there, but looking…

0:21:16 SC: I wouldn’t be surprised. Hopefully.

0:21:19 KF: Just to reassure you, I haven’t got my glasses on so I can’t see anything.

0:21:24 SC: It doesn’t look very surprising to me.

0:21:27 KF: So what is another way, in information theory, of articulating the notion of rich information? It’s just minimizing uncertainty; it’s maximizing the relative entropy as scored by, technically, the KL divergence that is part of an expected free energy. But put more simply, it just means that minimizing free energy, or making moves that will minimize free energy in the future, simply means maximizing information gain or optimizing some divergence measure, literally making your mind up in the sense of moving a belief, moving a probability distribution, from one prior belief to a posterior belief. And the more information that you have at hand, the more you will assimilate, and the KL divergence will in this instance be greater, and that’s a very important part of the… When you actually write down the imperatives for action, that becomes a very, very important part.
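
[The information-gain reading, moving a probability distribution from a prior belief to a posterior belief and scoring the move with the KL divergence, looks like this in code. The distributions are invented for illustration.]

```python
import math

def kl_divergence(q, p):
    """KL(q || p) in nats, for discrete distributions over the same support."""
    return sum(q[k] * math.log(q[k] / p[k]) for k in q if q[k] > 0)

# A vague prior sharpened into a confident posterior by some data: the KL
# divergence scores how much the belief moved, i.e. the information gained.
prior     = {"A": 0.5, "B": 0.5}
posterior = {"A": 0.9, "B": 0.1}

gain = kl_divergence(posterior, prior)   # ~0.368 nats of information gained
unmoved = kl_divergence(prior, prior)    # 0.0: no update, no information gain
```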

0:22:35 SC: And this kind of makes intuitive sense, right, that our brain would like to carry around with it a model of the world such that it typically looks around and says, yes, that’s more or less what I would have expected.

0:22:45 KF: Absolutely. Yeah.

0:22:45 SC: And so the free energy is the difference between what it’s sort of expecting and what it’s seeing, so it wants to minimize that.

0:22:52 KF: Absolutely.

0:22:52 SC: And it’s completely information theoretic, which I need to say out loud because of course the word energy appears in it and as a physicist, we have a notion of free energy that it’s a kind of energy you can use to do useful work. And in fact, in the thermodynamic system, when you maximize entropy you’re minimizing free energy and vice versa. Like if you have a box of gas that is in equilibrium it has no free energy, lots of entropy; if it’s all bundled up on one side, it’s the opposite. But you’re looking at a context in which the brain is both minimizing a kind of entropy and minimizing a kind of free energy.

0:23:28 KF: That’s a great paradox.

[laughter]

0:23:30 KF: Now I shall try to unpack it.

0:23:33 SC: Yes.

0:23:33 KF: But we are right in the depths of the…

0:23:37 SC: It’s the weeds. Yeah. That’s okay.

0:23:39 KF: So I apologize in advance, so let’s just back up and…

0:23:42 SC: It’ll pay off later, we’ll bring it down to earth, don’t worry.

0:23:45 KF: Okay then. So let’s just start from, yeah, you’ve gone to university, you’ve learnt that free energy is that amount of energy that is available to do work, that’s not locked into the entropy, so the free energy is the expected energy minus the entropy. So what that would suggest in terms of… So the first thing to acknowledge is that the form of the free… The variational free energy is formally identical to a sort of Gibbs or thermodynamic equation.

0:24:16 SC: It’s the same equation.

0:24:17 KF: It’s the same equation. The only move you make is you drop Boltzmann’s constant from the Shannon entropy, that’s all you do.

0:24:24 SC: We set it equal to 1 anyway, so it doesn’t make any difference in there.

0:24:27 KF: There’s no difference between us. We are speaking the same language. And on that view, what does that mean? Well, that means if you now write down the energy as a potential, a potential energy, and I’m thinking now from the point of view of a statistician, for example, what’s the potential energy that gets into the variational free energy? Well, it’s effectively the accuracy. So it’s the negative log probability of getting these data given the parameters of my generative model, given all the variables, the quantities, the structure of my model of what could have caused those data, my hypothesis, if you like, that is parameterized, and that’s quite important. So the accuracy is basically the surprise that we were talking about before.

0:25:19 KF: I should say the way that you described it makes sense having a model of the world we can make predictions and then we can test those predictions against sensory impressions, absolutely spot-on. And indeed, there’s a whole industry both in terms of data compression and engineering, but also more recently in neuroscience and particularly cognitive neuroscience, that is predicated on predictive coding, which is exactly that. So prediction errors are just the mismatch between what your generative model predicted and what you actually sample, you take the sum of squared prediction errors, you weight them by some precision or inverse variance. That is basically free energy. Under some simple assumptions about the generative model and the nature of random fluctuation. So predictive coding is one instance of the more general notion that we’re in the game of minimizing our free energy.
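
[The predictive-coding quantity Friston describes, prediction errors squared and weighted by a precision, is easy to write down. A toy sketch under the simple Gaussian assumptions he mentions; the values and the single shared precision are invented.]

```python
# Precision-weighted sum of squared prediction errors: under simple Gaussian
# assumptions, this is (up to constants) the free energy that predictive
# coding minimizes. All values are invented for illustration.
predictions = [1.0, 2.0, 3.0]   # what the generative model expected to sense
samples     = [1.2, 1.9, 3.5]   # what was actually sensed
precision   = 4.0               # inverse variance: confidence in this channel

errors = [s - p for s, p in zip(samples, predictions)]
weighted_sse = precision * sum(e * e for e in errors)
print(weighted_sse)  # ~1.2; shrinking the errors shrinks the free energy
```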

0:26:12 KF: So coming back to this, the physicists’ conception of free energy in systems that have attained equilibrium, what does it then mean to minimize the free energy? Well, you’re trying to minimize your energy, which is maximizing your accuracy but minimizing the surprise, averaging out your ignorance about the parameters, and then you’re trying to maximize the entropy. Now, that may seem paradoxical, which is why it’s good that I apologized, because we’re going to have to resolve it. So I’ve just said, if you remember, that minimizing uncertainty by choosing the right moves, the ones that will get the most information and resolve the greatest uncertainty, looks as if it’s trying to maximize the information gain or the relative entropy. But now I’m saying, well, minimizing free energy will require a maximization of entropy, from the physicist’s point of view.

0:27:15 SC: Yeah.

0:27:15 KF: And that’s absolutely right. So the key distinction is basically that what I do at this moment in time will always be to maximize my accuracy, to minimize my energy in terms of minimizing this sort of prediction error, whilst at the same time maximizing my entropy, keeping my options open, because the entropy that we’re talking about is an attribute of a belief about the causes of my data. So this is not an entropy measure of the brain biophysically.

0:27:50 SC: Right. It’s not the molecules in your brain that we’re treating as a thermodynamic system, it’s a set of beliefs.

0:27:56 KF: Absolutely. So these beliefs are then driven to be as broad as possible, entirely consistent with the second law of thermodynamics. You know, there’s an imperative, there’s a drive for disorder, the entropy will increase, but it’s the entropy of our beliefs. Now, what does that mean? Well, it basically means I’m trying to find a low-energy explanation for my data, whilst at the same time keeping my options open. So this is essentially Occam’s Razor. It’s basically not committing to a very precise posterior belief: if I’ve seen these data, then I believe this caused it. You don’t want to commit to a very precise one, so you’ve got to find the simplest, the most accommodating explanation for your data. So that’s where the paradox, if you like, in terms of whether you’re trying to maximize or minimize the entropy term, comes in. But I think, more fundamentally, it makes the point that the entropy we’re talking about at the moment is a functional, a scalar functional, of a belief about something; it is not the thing that is encoding those beliefs, it’s not the neuronal firing or the molecules or the atoms.
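
[The Occam's Razor reading can be made explicit by rearranging the free energy as complexity, how far the belief has moved from the prior, minus accuracy. A sketch over a hypothetical two-cause model; all numbers are invented.]

```python
import math

# Free energy as complexity minus accuracy. A sharper belief pays a larger
# complexity cost. Numbers are illustrative.
prior      = {"A": 0.7, "B": 0.3}
likelihood = {"A": 0.9, "B": 0.1}     # p(datum | cause)
q          = {"A": 0.95, "B": 0.05}   # a moderately confident belief

def complexity(belief):
    """KL(belief || prior): how far the belief strays from the prior."""
    return sum(belief[c] * math.log(belief[c] / prior[c]) for c in belief)

def accuracy(belief):
    """Expected log-likelihood of the datum under the belief."""
    return sum(belief[c] * math.log(likelihood[c]) for c in belief)

free_energy = complexity(q) - accuracy(q)

# Committing almost all probability to one cause costs more complexity:
q_sharp = {"A": 0.999, "B": 0.001}
print(complexity(q_sharp) > complexity(q))  # True
```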

0:29:17 KF: The twist, the slightly paradoxical aspect of this, is when you move into the future and when you have beliefs about the consequences of an action, say looking over there or googling a certain entry or going to Wikipedia. Before you make that action, you have beliefs about how your free energy is going to change. And at that point, your entropy or your relative entropies effectively switch around, because the outcomes now become random variables and you have to take an expectation… This is a bit technical, but it’s a beautiful little bit of techie stuff which basically flips this imperative to minimize the entropy and relative entropies when you’re applying it to the system as it is now and as it is behaving. When the system looks at itself, saying, well, how would I have to act now in order to minimize my free energy in the future, I now include basically outcomes as random variables and they get into the expectation operators and suddenly you’re then in the service of minimizing…

0:30:24 SC: Minimizing an entropy, yeah.

0:30:25 KF: So there’s this sort of yin yang, which means that as I’m currently processing my data, I’m striving to find the explanations that maximize my uncertainty, because I don’t want to commit to a particular belief. Yet at the same time I’m going in the exactly opposite direction. I’m trying to sample those data that will shrink my uncertainty. And then when I find that balance, then we have this sort of active self-evidencing.

0:30:50 SC: Let me try to put it in my words, and you can tell me if I’m coming close here. I mean, we want to minimize the times that we’re surprised. You might think, well, just predict the thing you think is most likely to be true all the time, but if you put 100% probability on that, then any time you’re not exactly right, you’re hugely surprised and that’s bad.

0:31:11 KF: Absolutely.

0:31:11 SC: So you might say, well, let’s do the other thing, let’s have no beliefs about anything. Let’s say anything could happen. But then, in some cases, especially when you go through the math, you’re always surprised, everything that happens is a little bit unlikely, ’cause it could have been anything else. And so there’s this compromise where you sort of try to home in on something you think is most likely, but you do give allowance for the deviation.
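
[Carroll's compromise can be checked numerically: expected surprise, the cross-entropy of a belief against the true outcome frequencies, punishes both the 100%-confident belief and the anything-could-happen belief. The weather numbers are invented for illustration.]

```python
import math

def expected_surprise(p_true, q, eps=1e-12):
    """Average surprise (cross-entropy, in nats) of belief q against true frequencies p_true."""
    return -sum(p_true[x] * math.log(max(q[x], eps)) for x in p_true)

p_true   = {"sunny": 0.7, "rainy": 0.3}   # how the world actually behaves
dogmatic = {"sunny": 1.0, "rainy": 0.0}   # all-in on the most likely outcome
uniform  = {"sunny": 0.5, "rainy": 0.5}   # no commitments at all
matched  = dict(p_true)                   # belief calibrated to the world

# The calibrated belief is least surprised on average; the dogmatic belief is
# enormously surprised whenever it rains.
for name, q in [("matched", matched), ("uniform", uniform), ("dogmatic", dogmatic)]:
    print(name, round(expected_surprise(p_true, q), 3))
```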

0:31:30 KF: Absolutely, that’s beautifully put. As a physicist, you could understand what you’ve just said as basically: how do we describe systems that have some itinerancy, but still restrict themselves to a limited part of their phase space or their state space? So it’s not one point.

0:31:52 SC: Exactly.

0:31:53 KF: We’re not little marbles or moon rocks. Nor are we gases. But we certainly have boundaries and we have a shape and a form, in the sense of those parts of the state space we could occupy, or where the woodlice are running around; there’s a structure there. So that structure, if it is the case that these sorts of systems exist, things like you and me, or anything actually exists in a nontrivial way, and by nontrivial I mean basically not having an attracting set that is just one point, not just being in one state.

0:32:17 SC: Deterministic…

0:32:17 KF: Absolutely, yeah, then there has to be this balance between this itinerancy, this sort of… The exploration of a phase space in a structured way where many regions are visited, but some regions are visited more often than other regions. And when you start to think about, well, how would you articulate that mathematically, you start to get random attractors that inherit from random dynamical systems. So what you’re saying is there’s an attracting set out there; it is not… It’s a probability distribution. If I measure the probability of finding me at any point in my state space over a very long period of time, it’s certainly not uniform.

0:33:16 SC: No.

0:33:17 KF: I have a shape, I have a temperature…

0:33:18 SC: Some things are more likely than others.

0:33:19 KF: I have a personality. Absolutely. And yet I am not a fixed point, I am not just in one state. So what you’re talking about effectively is a non-equilibrium steady state. It is not the kind of equilibrium steady state that physicists know and love and were taught in the 20th century. This is the new challenge: understanding non-equilibrium steady states in open systems, that have exactly this delicate balance between being a point attractor and being completely diffused to the ends of the universe, if that’s where it’s going.
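This picture of a structured but non-uniform occupancy of state space is easy to sketch numerically. Below is a toy simulation, with everything (the double-well potential, the noise level, the band counted as the basins) chosen purely for illustration, not taken from the episode: overdamped Langevin dynamics wanders itinerantly, yet spends most of its time near two attracting wells.

```python
import math
import random

def simulate_occupancy(steps=200_000, dt=0.01, noise=0.7, seed=0):
    """Overdamped Langevin dynamics in a double-well potential U(x) = (x^2 - 1)^2.

    The trajectory is itinerant (it keeps moving and hops between wells),
    but its long-run occupancy concentrates near x = -1 and x = +1:
    a structured, non-uniform distribution rather than a single fixed point.
    """
    rng = random.Random(seed)
    x = 0.0
    near_wells = 0
    for _ in range(steps):
        grad = 4 * x * (x * x - 1)                      # dU/dx
        x += -grad * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if 0.5 < abs(x) < 1.5:                          # inside either basin
            near_wells += 1
    return near_wells / steps

frac = simulate_occupancy()   # fraction of time spent near the two wells
```

With these (arbitrary) settings the state spends the large majority of its time in the two basins: a crude analogue of an attracting set that is a probability distribution rather than a point.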

0:33:56 SC: Just parenthetically, I did have Antonio Damasio on the podcast a little while ago, and his favorite word in the universe is homeostasis, right? The idea of keeping within this tiny little range as much as we can, but with some flexibility there. And you want to, correct me if I’m wrong again, but you really want to say that this minimization of free energy and surprisal is sort of the key to unlocking what the brain does, right? It’s the underlying thing for most everything.

0:34:24 KF: Yeah, it is. Absolutely. There’s a joke in my group meetings that the answer to any question is model evidence. Usually, that’s…

0:34:41 SC: You could do worse.

0:34:43 KF: And certainly when you’re talking with colleagues and research fellows, whenever they ask a question, “Well, what happens if this is like that?” Well, get the evidence for that hypothesis, and then you can quantify your belief about whether it was like that, or whether it broke like that, or whether that difference is in play. So can I come back with another parenthesis? Because I think it nicely follows on from Antonio Damasio’s focus on homeostasis. Of course, the roots of much of self-organization to non-equilibrium steady state inherit from the work of people like Ross Ashby, who made his ideas apparent through the homeostat. So, it’s exactly the same. Or the good regulator theorem.

0:35:21 SC: I think that… This is just my personal opinion. I think that physics has a lot of work to do, and there are a lot of discoveries to be made along exactly these lines. Non-equilibrium statistical mechanics is a booming growth field; I try to get my colleagues excited about it, but they’re used to doing their own things. It’s a tricky kind of pre-paradigmatic area where we don’t exactly know what the rules are.

0:35:49 KF: That’s a nice way to phrase it. I don’t know, because I became a psychiatrist and left the physics behind. But certainly surfing the web, that pre-paradigmatic thing, that’s very exciting, though, isn’t it?

0:36:02 SC: Oh, of course.

0:36:02 KF: Because that’s the next thing.

0:36:03 SC: Yes, exactly.

0:36:03 KF: That’s 21st century physics.

0:36:04 SC: It’s much more comfortable when you know, like in particle physics or cosmology, where I was raised: you know what the questions are, you know what would qualify as an insight. Whereas in complex systems, dissipative systems… Ilya Prigogine was an early pioneer here also, right? Self-organization. And we just don’t know what words to use, we don’t know what equations to use, but I think, like you, I’m a fan of these statistical mechanical lenses through which to view these things.

0:36:32 KF: Yes. I’m sure that’s the only way really to… For me, it is the only way to write these things down, because at the end of the day, you actually have to have a model in code that generates predictions to analyze the brain data. So there is no other… You don’t really have the… Unless you’re a philosopher or you write books, there is no…

0:36:52 SC: You need to be able to give it to a computer and ask it how well you’re doing. Let’s just do sort of the reality check for this perspective here. I certainly get that if I’m driving down the street I would like to be surprised as little as possible, but isn’t it true that also informally, I do sometimes seek out new experiences, right? How does that fit in?

0:37:12 KF: Well, it’s exactly that sort of paradox. The way we’re addressing it technically is in terms of the difference between me, at this point in time, this moment, minimizing my free energy via a maximization of the entropy, the uncertainty, of my explanations or beliefs about what’s going on now, and choosing those actions that will in the future minimize my expected free energy. That basically means that when you talk about the expected free energy conditioned upon a particular action, you know, looking over there when driving the car, you have this opposite imperative. So now you become information-seeking, now you become a curious creature, now you become sensation-seeking.

0:38:00 KF: But it’s a particular kind of sensation-seeking. It’s those sensations that would resolve uncertainty about what would happen if I did that. Of course, behind a lot of the ways I’ve just described that is a sense of me as an agent. So once you generalize the non-equilibrium physics of sentient systems to write down what might be the imperative for the way that they act upon the world, the way that they evidence their agency, and you make the assumption, or you try to prove that it can be no other way, that they will act in the service of minimizing the long-term average of free energy in the future, then you get this curiosity, written into the information theory. To resolve uncertainty means you’re going to be sensation-seeking. Now, whether that’s sensation-seeking of a banal sort, looking at the traffic lights or the street lamps to see whether it’s “go” or “stop”, or whether it’s going to a disco or doing bungee jumping… it’s the same imperative underneath.

0:39:03 SC: But this is the response to the wisecrack that if we just want to minimize surprise, we would sit in a dark room and not do anything, right? But how innocent is that move to go from minimizing surprise to minimizing the expectation value of all my future surprises? That seems like a little bit of a different minimization. Is that fair? Which one is it that we’re doing?

0:39:25 KF: Well, in a minimal sense you’re doing both. You’re sort of touching on the issues that normally come to the end of the conversation…

[laughter]

0:39:38 KF: What is the difference between you and a virus? So, to cut to that distinction before revisiting the fundaments of minimizing expected free energy in the future: it may be the case that certain systems, certain… yeah, biotic systems, creatures basically, have acquired the capacity to install in their generative models the prior belief that they are free energy minimizing creatures. And if they can do that, then they will have the prior belief that the way I will act will be to minimize my free energy. That is how you might write it down as a physicist.

0:40:29 SC: I see where this is going. Yes, good.

0:40:32 KF: If you were a psychologist, all you’d say is, “I have the capacity to plan.” So that’s all you’re saying, really.

0:40:37 SC: Or to imagine.

0:40:38 KF: Exactly, yeah. So you have sort of… Yeah, perfect. So your generative model is now equipped with the capacity to imagine a counterfactual fictive future, to imagine the future, to roll out possible consequences of actions. And of course the consequences of those actions in terms of the observations now are random variables because they haven’t happened yet. And that’s why you get this reversal. Suddenly you become a creature that seeks out sensations, literally sensation-seeking, that resolve uncertainty, you become curious and you go to your discos and you do your bungee jumping, at a certain age. So that’s, I think, an important distinction between very simple attracting sets.

0:41:26 KF: Let’s go right back to the moon rock, okay? So an appropriate description of that non-equilibrium steady state is in fact an equilibrium steady state, and that’s the case if it’s got a point attractor; it could have a quasi-periodic attractor, say the orbiting of the planets. But these are very simple, non-itinerant attracting sets that approach an equilibrium steady state. Then we move up to systems, right from rocks through to… you may even go as far as some, not insects, but certainly, say, viruses. So, things that don’t plan. But they live. They’re very effective at occupying our universe and so on.

0:42:08 SC: But they live in the moment, in some sense.

0:42:10 KF: Absolutely, yeah. In Gerald Edelman’s words, there is no remembered present, there is no imagination, there’s no planning. They have all the mechanical finesse of a thermostat. But they’re very good at what they do, so you know.

0:42:25 SC: Homeostasis, yeah. They achieve it.

0:42:27 KF: Well, that’s a beautiful example. So even in our own bodies, possibly 99% of all that actually goes on in terms of physiology, which is homeostasis, is just this reflexive, in-the-moment keeping of yourself in that attracting set.

0:42:42 SC: Right. Regulating yourself, right.

0:42:43 KF: Regulating yourself. So those kinds of systems are distinct, I think, from systems like you and me, that start to plan and have the ability to… Their generative models do actually span the future and, by implication, the past. They implicitly have a dynamics where the trajectories, in their generative model, go quite a long way into the future. So that would be a really interesting way to take forward the argument, or a response to your question. But your question was slightly… just to return to it: is it minimizing free energy, or is it minimizing the free energy conditioned upon a particular action?

0:43:31 SC: Expected, yeah.

0:43:31 KF: Expected action. It is both. The degree to which you actively minimize your uncertainty depends really upon the shape of the attractor, the attracting set, that we’re talking about. So, with itinerant systems it is possible to write down the density dynamics and work out the probability distribution of trajectories of action into the future. Action is just another state. And you can apply fluctuation theorems, or path-integral formalisms, to work out a distribution over trajectories of action into the state… into the future, my apologies. Which means that technically what you can do now is characterize a given system in terms of probability distributions over courses of action, policies into the future, trajectories or paths of action, under a model of what implications that particular action would have for all my sensory input, for example.

0:44:43 KF: And when you write that down, there is a way of showing that that is essentially a description of systems that do minimize their expected free energy in the sense of just minimizing the distance, in fact technically the KL divergence, between where they think they are probabilistically and their attracting set.
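That “distance to the attracting set” is, concretely, a KL divergence. A minimal sketch (the three coarse states and all the probabilities are invented for illustration): score two candidate actions by how far their predicted outcome distributions sit from a preferred, attracting-set distribution.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Hypothetical preferred (attracting-set) distribution over three coarse states,
# and predicted outcome distributions under two candidate actions.
preferred = [0.7, 0.2, 0.1]
predicted_a = [0.6, 0.3, 0.1]   # stays close to preferences
predicted_b = [0.1, 0.2, 0.7]   # wanders far from preferences

risk_a = kl_divergence(predicted_a, preferred)
risk_b = kl_divergence(predicted_b, preferred)
# risk_a < risk_b: action A keeps the predicted occupancy nearer the
# attracting set, i.e. it carries the lower (risk part of) expected free energy.
```

The divergence is zero only when prediction and preference coincide, which is the sense in which existing systems look as if they minimize this quantity.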

0:45:16 SC: Okay.

0:45:17 KF: So that’s one part of expected free energy. There is another part, which is all about ambiguity reduction, which kicks in when there are particular constraints on the shape of this probability distribution that you and I evince just by existing. That depends, really, upon the itinerancy, which you can measure in terms of mutual informations or relative entropies between different partitions of the states. I was gonna say…

0:45:48 SC: [laughter] There’s probably some details here.

0:45:54 KF: Here comes one little detail. All of this rests upon a Markovian partition, a Markov blanket. It all rests upon carving the universe into internal states that are inside you, that constitute you; the rest of the universe; and then, crucially, blanket states that separate you specifically from the rest of the universe, that enable you to be identified. And a further bipartition of those blanket states into active and sensory states. So once you’ve…

0:46:25 SC: Compartmentalizing…

0:46:26 KF: Compartmentalizing the different kinds of states that would be necessary to describe a universe in which something exists, i.e., you, in a way that is separable from not-you. Then you can start to write down… You can go beyond just the 20th-century physics of entropies of distributions, of, say, an idealized gas or some sort of closed system. Now you have to deal with the entropies of a partition, and furthermore you’ve got relative entropies. So suddenly you’re in a game which is pure physics; it’s just that now you’ve got to think about the relative entropies. You can’t just talk about the entropy of this ensemble, or the entropy of this wave function or this solution. You now have to carve it up and talk about the relative entropies, which is where all the information-theoretic richness and all the uncertainty start to kick in. And of course, in so doing, you’ve now committed yourself to a mechanics of open systems, because the whole point of having the Markov blanket is that it enables a two-way traffic between the inside and the outside. So now, by definition, you’re in the game of writing down the statistical mechanics of open systems that have some non-equilibrium steady state in virtue of having an attracting set.

0:48:00 KF: So now the game becomes: what different kinds of attracting set could there be? And what will it look as if these systems are doing, in terms of these mutual informations or relative entropies? Uncertainty-resolving pressures are one way of describing the very existence of this attracting set.

0:48:19 SC: I do wanna get more into the Markov blankets, but I don’t wanna quite let go of this transition from living in the moment to living in the expected future, I guess. This seems to be, without me planning it, something that appears on the podcast over and over again: recently in a conversation with the philosopher Jenann Ismael, when we were talking about free will and what that means, that came up. And earlier with Malcolm MacIver, who is a mechanical engineer and neuroscientist, who has a theory that one of the steps on the road to consciousness was when fish climbed up onto land and could begin planning for the future, ’cause the timescales are much slower on land, so you have the ability to plan. So I wonder, is it possible in this framework to pinpoint a place in the evolutionary scheme where we flip over from living in the moment to being more planning animals? My personal theory is that it’s with cats. [chuckle]

0:49:15 SC: Because I have two cats. My listeners are well aware, Ariel and Caliban, and I swear that one of them, Caliban, just lives in the moment. Like, his needs are being met or they’re not and those are the only two states he has. Whereas Ariel, you could see that she’s trying to figure something out about what would happen subjunctively if she did something, and her little kitty brain is trying its best. So I’m sure that cats is an exaggeration, but is it as late as mammals where this becomes important, or do you wanna attribute it much earlier than that?

0:49:48 KF: I don’t know, but I’m compelled by your [0:49:53] ____ of cats. [chuckle]

0:49:53 SC: It seems very clear.

0:50:00 KF: Everything you’ve said in terms of these other perspectives, particularly from philosophy, makes entire sense to me. The ability to plan suddenly means you now have a space of trajectories, courses of action in the future, and it means that, because you can only realize one deterministic action, because action is actually a physical state of the universe… Action in and of itself is not a belief; we have beliefs about action, but the action is realized. So that realization means that you have to commit to one of a multitude. So there’s a selection process in play, which must in some sense speak to free will. Or at least, if it doesn’t, depending upon your attitude to free will, what it does say is: if there is a selection process in play, in terms of selecting an action from some probability distribution or beliefs about the way that I am currently acting, then it must be the case that that only applies to systems that actually have posterior beliefs about the future. It cannot apply… Yeah.

0:51:17 KF: So a thermostat could not, I think, be confused with something that might express free will. Whereas your question now is: at what point do we have biological thermostats, with this beautiful homeostasis, that become equipped with the capacity to imagine, to plan, to think, and, as you intimated, possibly even to have some minimal form of consciousness, or even self, perhaps selfhood before consciousness? So I normally have recourse to the philosophical notion of a vague concept here. I only recently learned about this. This is why I like talking about it.

[laughter]

0:51:54 SC: There is a whole philosophy of vagueness, it’s true. We haven’t talked about that on the podcast, but that’s an interesting topic, yeah.

0:52:01 KF: So for those people who don’t know, like me a few months ago: at what point is a pile of sand a pile? Is it one, two, three, four, five grains? So, I quite like that as a way of getting out of the question of at what point you would put your threshold. And I think it’s warranted, or licensed, mathematically, because even things like thermostats and viruses… Take predictive coding. Predictive coding does not have this planning, it doesn’t have this sense of…

0:52:40 SC: Maybe define for the audience, predictive coding.

0:52:42 KF: Well, predictive coding was originally devised in the 1950s as a way of compressing sound files. It’s a very efficient way of complying with Occam’s principle: retaining the most information, but in the simplest encoding that you can, which is another way of…

0:53:11 SC: Sort of algorithmic compressibility.

0:53:13 KF: Yeah, that is, in fact, the free energy principle as well, just written in terms of algorithmic complexity and minimum message length; it’s exactly the same maths but different event spaces. So, predictive coding as currently applied to things like the brain is just the notion that we’re minimizing our prediction error.

0:53:31 SC: Yeah.

0:53:33 KF: So, it doesn’t talk about action. What it does talk about is how our brains might respond, how our decoders might respond to some new data, and they do it by reorganizing, by belief updating or state estimation, in a way that minimizes a prediction error. So if you can predict what is currently being presented, exactly, on the basis of what you have previously seen, then you must have a perfect model of what is generating that signal, that soundtrack, that auditory stream. And therefore you have minimized your free energy, your variational free energy, or you’ve maximized your evidence, the evidence lower bound, in terms of the purely sensory, the sentient, aspects of it. Notice that we haven’t… which is the important thing… we haven’t talked about what we’re gonna do or how we get there.
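In the simplest Gaussian case, “minimizing prediction error” can be written as gradient descent on precision-weighted squared errors. A sketch, with the precisions, learning rate, and data all invented for illustration:

```python
def predictive_coding_step(mu, obs, prior_mu, pi_obs=1.0, pi_prior=1.0, lr=0.1):
    """One gradient step on F(mu) = pi_obs*(obs - mu)^2/2 + pi_prior*(mu - prior_mu)^2/2,
    i.e. on the precision-weighted sensory and prior prediction errors."""
    grad = -pi_obs * (obs - mu) + pi_prior * (mu - prior_mu)
    return mu - lr * grad

mu = 0.0                      # initial belief about the hidden cause
for _ in range(200):          # repeated belief updating on a fixed observation
    mu = predictive_coding_step(mu, obs=2.0, prior_mu=0.0)
# With equal precisions, mu settles at the average of prior (0.0) and data (2.0).
```

The update changes the belief, not the world: this is the purely sensory half of the story, with action deliberately left out, just as described above.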

0:54:30 SC: We haven’t talked about that, yes, right.

0:54:31 KF: So this predictive coding doesn’t address active inference, or active learning; it just talks about how to make sense of data. So, that would be a nice example of the sensory part of a virus or a thermostat. It can be completely described as just minimizing the discrepancy. Take a thermostat, for example, and let’s now put action back into the mix. So, you can describe a thermostat as just minimizing its prediction error. Prediction error between what? Well, between the temperature it’s sensing and its attracting fixed point.

0:55:10 SC: Mm-hmm.

0:55:11 KF: So it’s one of these fixed-point creatures. It’s got its attracting set, it has its prior belief that the temperature should be like this, and: all I need to do is minimize my prediction error, minimize my free energy. And… I don’t know how I’m doing it, but I do seem to…

0:55:26 SC: By turning on the heater or the air conditioning, I guess.

[laughter]

0:55:29 KF: In fact, it doesn’t know what temperature is about, but it is equipped with action that will enable it to get to its fixed point. So in what sense is that planning? If you remember, I’m trying to argue for vagueness so I don’t have to answer your question. Well, when you formulate that kind of predictive coding in terms of Kalman filtering or Bayesian filtering, which would be the technical or the statistician’s way of describing a predictive coding scheme, you’re always working with derivatives. So you’re always working in a dynamical setting with not just the prediction errors but the rate of change of prediction errors with time. So in a minimal sense, you’ve got a notion of the future through a linear, first-order approximation.
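That first-order “notion of the future” can be illustrated with a toy filter that tracks a value and its rate of change, extrapolating one step ahead and correcting both estimates by the prediction error. This is only loosely in the spirit of Bayesian filtering in generalized coordinates; the gains and the signal are made up.

```python
def filter_with_velocity(observations, dt=1.0, k=0.5):
    """Track a value x and its derivative v: predict the next sample by linear
    extrapolation, then correct both estimates with the prediction error."""
    x, v = observations[0], 0.0
    abs_errors = []
    for obs in observations[1:]:
        pred = x + v * dt            # the minimal 'future': a first-order prediction
        err = obs - pred             # prediction error
        abs_errors.append(abs(err))
        x = pred + k * err           # correct the value...
        v = v + (k / dt) * err       # ...and the rate of change
    return x, v, abs_errors

obs = [0.5 * t for t in range(20)]   # a steadily increasing signal, slope 0.5
x, v, errs = filter_with_velocity(obs)
# The estimated rate of change converges toward the true slope, and the
# one-step-ahead prediction errors shrink as the future becomes predictable.
```

A filter that ignored the derivative would lag this ramp forever; carrying the rate of change is exactly the minimal, linear anticipation of the next instant.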

0:56:15 SC: Next instant anyway.

0:56:16 KF: Absolutely, yeah, yeah.

0:56:17 SC: Okay.

0:56:17 KF: So that’s what I meant in a minimal sense, that everything has, every generative model, has a notion of the future, just in virtue of having a notion of trajectories or dynamics.

0:56:31 SC: Yeah, Okay.

0:56:33 KF: But it’s not quite the same as your second cat. Yeah, it’s not thinking, Well, if I sit here I look like the… Yeah, the…

0:56:39 SC: And you can see it’s just at the level of her capacities. But you mentioned action, and I think this is a natural place to go there, because you also want to say that free energy helps us understand how we behave. Is that safe to say? So let me sort of repeat the thing that I read, and then again, you’ll fix it. One way of thinking about what happens when I move my hand is that my brain sort of intentionally gets it wrong about where my hand is, so there’s a mismatch between where my hand actually is and where my brain thinks it is, and rather than fix my brain, I move my hand to bring it to where my brain thinks it is. Is that…

0:57:16 KF: Yeah, that’s beautiful.

0:57:17 SC: Okay. [laughter]

0:57:17 KF: Yeah, I don’t need to fix anything there. In fact, we should celebrate that, because what you’ve just described is a modern-day retelling of ideomotor theory, which was prevalent in the 19th century. Which…

0:57:34 SC: Helmholtz?

0:57:36 KF: Yes. Well, yeah, he did everything, so I could say yes.

0:57:39 SC: He was a high entropy thinker, yeah.

[laughter]

0:57:43 KF: There were other German neurologists and natural scientists who focused specifically on this, and then it was picked up by William James on his European tours. But it’s exactly what you just said: to move, I have to, in my mind, imagine the outcome of that movement, and then just let my reflexes realize that imagined outcome. Which, in the Victorian era, was posited as an explanation for stage hypnotism: “Your arm is getting lighter and lighter and lighter and lighter.”

0:57:52 SC: I see.

0:57:52 KF: And, of course, if you believe your arm is getting lighter and lighter and lighter and is floating, then your predictions about the proprioceptive input that you would get if your hand were in fact floating can now be fulfilled, at a sort of pre-awareness level, simply by reflexes, and your hand will indeed just rise. So, as with nearly all of the free energy principle, particularly its incarnations when applied to things like active inference, these are very old ideas. You can probably trace them back to students of Plato, but they come through Kant and Helmholtz.

0:58:58 SC: Okay.

0:58:58 KF: Very much so.

0:59:00 SC: Yeah.

0:59:00 KF: And alongside that sort of perception-as-inference was a sort of action-as-inference, basically. Action is beliefs about the way I should be: “Oh, yes, now I am like that.” And if, of course, you actually attend to the evidence that in fact your hand is not floating, then of course it won’t move. Which sort of takes us into the interesting scenario that you must in some sense attenuate the evidence…

0:59:29 SC: Prevent yourself from getting that.

0:59:30 KF: Exactly.

0:59:31 SC: Yeah, okay. Is this, is this… I guess, maybe what we have that Plato or Helmholtz or William James did not have is the ability to poke inside a brain and see what’s going on there. Is this idea that action is driven by a mismatch between model and sensory input verified, testable in the brain?

0:59:49 KF: Yeah, yeah, yeah, absolutely, at a number of levels. But again, just coming back to this notion that most of what emerges from a mechanical treatment of the free energy principle was well known a century ago: what we are describing are classical reflexes. So the brain sends down messages to alpha motor neurons in the spinal cord that effectively are a mismatch between the intended sensory signals from the muscles, the motor plant, and what it’s actually receiving. And then those cells elaborate signals to the muscles to cause them to contract until the signals match. So the way that we move, and in fact the way that we secrete, the way our autonomic function works, the way our heart works, in fact the way all of our homeostasis, generalized homeostasis, works, is by supplying the right setpoints to servos, to homeostatic or reflex mechanisms in the periphery. What are those setpoints? They’re just predictions of the way I want to be.
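The setpoint-and-servo picture can be sketched as a toy reflex loop, where a descending prediction supplies the setpoint and peripheral action simply cancels the error between predicted and sensed signals. The gain, the drift term (standing in for heat loss, muscle load, and the like), and the temperatures are all invented:

```python
def servo(setpoint, state, gain=0.2, steps=50, drift=-0.5):
    """A reflex arc as a servo: at each step, act in proportion to the error
    between the descending prediction (setpoint) and the sensed state,
    while the environment drifts the state away."""
    trace = [state]
    for _ in range(steps):
        error = setpoint - state              # predicted vs. sensed signal
        state += gain * error + drift * 0.01  # action works to cancel the error
        trace.append(state)
    return trace

trace = servo(setpoint=37.0, state=30.0)   # e.g. regulating toward 37 degrees
# The state climbs to (just shy of) the setpoint and holds there.
```

Note that the loop never represents what the error means; it just cancels it, which is all a peripheral reflex needs once the right setpoint is supplied.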

1:00:51 SC: And does free energy have a role here in an optimization problem, or maybe an efficiency mechanism? Like, there are different ways you could imagine the brain or the nervous system bringing about this match between expectation and reality, but is there just a sort of… Are there easier ways to calculate how to do that? Using free energy, using…

1:01:16 KF: Well, the free energy formalism, if you like, grandfathers all of these particular manifestations.

1:01:27 SC: Yeah, okay.

1:01:29 KF: I guess what you’re asking is: what if I wanted to now describe the biology, or the wiring, or the time constants? What you’d have to do is write down the generative model. If you remember, the free energy is a functional of a belief, and the belief is defined in relation to a generative model. So if you can write down the generative model, you can then write down the differential equations of these sensory, active, and internal states. The internal states you can associate with neural activity. Active states can be either secretions or physical movements of an arm. And then you will be able to simulate these kinds of phenomena. What you can then do is take empirical data and change the parameters of the generative model until your simulation of an arm movement, for example, or a neuronal response to a perceptual synthesis, matches what you observe in terms of brain signals.

1:02:26 KF: So that, yeah… In a sense, that’s another description of what we already do. That’s what neuroscientists do.

1:02:33 SC: Okay. [chuckle]

1:02:33 KF: We think about the functional anatomy in terms of: well, how does the brain model its world, how does it make these predictions, how does it generate, for example in movement, these motor commands? But they’re not motor commands. They’re just predictions of what I should feel if I were actually in this position, or walking, or talking. And, in fact, there’s a whole industry not only of theories interpreting reflex arcs and the motor system under that kind of perspective, called the equilibrium point hypothesis, but also massive debate about the actual implementation of that and whether that’s the right way to look at things.

1:03:13 SC: I guess I had this impression that you might imagine that what the brain is trying to do is just use Bayes’ theorem: it has some beliefs, it gets in more data, it updates its beliefs. But that is calculationally difficult, computationally intensive, and calculating free energy is a sort of shortcut. Minimizing free energy is calculationally easier than simply conditionalizing probabilities.

1:03:37 KF: I see. Sorry, yes. You’ve moved us onto a very important observation. So, we’ve been talking about beliefs about where we should be, and prior beliefs, and beliefs about the future. And, of course, prior beliefs implicitly rest upon a Bayesian distinction between prior beliefs, prior to seeing any sensory states or sensory data, and posterior beliefs, the product of belief updating having observed those data. So that is the process of minimizing variational free energy, for example. Or is it? Not quite.

1:04:19 KF: If that process of belief updating… and this just connects back to physics. So, what we are saying is that there is a gradient flow in place that underwrites a random attractor, and that attracting set defines the kind of thing that we are, and there’s this Markov blanket or partition in play. So, that gradient flow we’re gonna now associate with belief updating, on the assumption that some states encode probability distributions, standing in for the parameters of beliefs. And once we’ve made that move, then we can actually write down the gradient flows in terms of belief updating, which is literally: we’re taking a random dynamical system, I’ll say a [1:05:02] ____ system, and interpreting the dynamics as belief updating, and the belief updating is from priors to posteriors.
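In the conjugate Gaussian case, where exact Bayes is available in closed form, a gradient flow on the free energy lands on the same posterior. A minimal sketch, with the precisions and values chosen arbitrarily for illustration:

```python
def exact_posterior_mean(prior_mu, prior_prec, obs, obs_prec):
    """Closed-form Bayes for a Gaussian prior and Gaussian likelihood:
    the posterior mean is the precision-weighted average."""
    return (prior_prec * prior_mu + obs_prec * obs) / (prior_prec + obs_prec)

def gradient_flow_mean(prior_mu, prior_prec, obs, obs_prec, lr=0.05, steps=500):
    """Belief updating as a gradient flow: descend the free energy gradient,
    starting at the prior mean and settling at the posterior mean."""
    mu = prior_mu
    for _ in range(steps):
        grad = prior_prec * (mu - prior_mu) - obs_prec * (obs - mu)
        mu -= lr * grad
    return mu

bayes_mu = exact_posterior_mean(0.0, 1.0, 3.0, 2.0)   # closed form
flow_mu = gradient_flow_mean(0.0, 1.0, 3.0, 2.0)      # gradient flow lands there too
```

The flow itself, from prior toward posterior, is the dynamics being reinterpreted as belief updating; in non-conjugate models the closed form is unavailable and only the flow remains.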

1:05:09 SC: So, gradient flow in this case is a way of saying, moving from where we are a little bit closer to where we wanna be.

1:05:16 KF: Yes.

[chuckle]

1:05:17 KF: Yeah, absolutely. Well, that’s exactly what the free energy actually scores. So, we’re covering all sorts of wonderful issues here. [chuckle]

1:05:28 SC: We haven’t gotten to the origin of life yet, but we’re gonna get there, too. [chuckle]

1:05:33 KF: Yeah. No, it was the difference between Bayes and approximate Bayesian inference, that’s where we’re going. But I think actually you’re touching on something which we’d rehearsed previously, which is another perspective on variational free energy. So, just for your interest: once you abandon equilibrium physics, what I fondly refer to as 20th-century physics, and you just live in a world of non-equilibrium steady state… which implies that there is some attracting set there and you have to explain its mechanics… then the variational free energy starts to have the look and feel of something much closer to a physicist’s thermodynamic free energy. And effectively it scores the divergence between the current state of the system and the probability distribution it would have on its attracting set.

1:06:27 SC: Exactly on the attracting set. Yeah.

1:06:29 KF: So it’s basically: how far away am I from my attracting set, or my non-equilibrium steady-state probability distribution? So in that sense, it now looks very much like the amount of energy available to do work. So: “I look at me, I perturb me, I put me in a highly unusual, frightening, angst-inducing situation I’ve never been in before, homeostatically or conceptually, and I will work towards getting back to my comfort zone, my familiarity.”

1:06:58 SC: Yes, my happy place. [laughter]

1:07:00 KF: My happy place, my right temperature… And in so doing I will minimize my free energy, and I’ll be doing work, literally, on the environment. So now, to my mind, there is an almost invisible distinction between the variational free energy, in the context of writing down the dynamics or the mechanics of non-equilibrium steady states with attracting sets, and thermodynamic free energy.

1:07:23 SC: Yeah, that’s good.

1:07:23 KF: And, in fact, you can just put a Boltzmann constant on and they are the same thing. And interestingly, the Boltzmann constant, in the purely information-theoretic perspective, now just equips the amplitude of random fluctuations with units, and now you can start to interpret it in terms of…

1:07:44 SC: Physical measure.

1:07:44 KF: Absolutely. Physical measures. Anyway, that was me indulging myself, ’cause I know you’re a physicist, you like that sort of thing.

1:07:52 SC: No, no, no. Yeah, thank you. [1:07:52] ____.
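Friston’s picture here, free energy as a score of how far a system currently sits from its attracting-set distribution, can be sketched numerically with a KL divergence. This is a toy illustration, not anything from the episode itself: the four-state system and all the probabilities below are invented purely to show the "perturb, then relax back to the comfort zone" dynamic.

```python
import numpy as np

# Hypothetical four-state system with a non-equilibrium steady-state
# (attracting-set) distribution p, and two "current" distributions q.
p_steady = np.array([0.70, 0.20, 0.05, 0.05])   # comfort zone: mass on familiar states
q_startled = np.array([0.05, 0.05, 0.20, 0.70]) # perturbed far from the attracting set
q_settled = np.array([0.60, 0.25, 0.10, 0.05])  # after relaxing back toward p

def kl(q, p):
    """D_KL(q || p): how far the current distribution is from steady state."""
    return float(np.sum(q * np.log(q / p)))

print(kl(q_startled, p_steady))  # large: far from home, lots of "work" to do
print(kl(q_settled, p_steady))   # small: nearly back in the comfort zone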

1:07:55 KF: But back to the, you know, why not just Bayesian inference, and what’s the difference between minimizing free energy and Bayesian inference? Great questions. So if Bayesian model evidence is simply the marginal likelihood of states of being, then one can trivially say, just by optimizing the likelihood that I am in this state, which by definition is the thing that I do because that’s how I exist, I could describe myself as performing Bayesian inference. Trivially so, because I can just call the negative log of my non-equilibrium steady-state density model evidence, and therefore everything I do is in the service of maximizing model evidence. I’m the perfect Bayesian. It doesn’t get you anywhere, but it’s a nice thing to say.

[laughter]

1:08:45 KF: So, where does the variational free energy come in? Well, the variational free energy comes in because it operationally defines what’s called approximate Bayesian inference. So what’s the difference between Bayesian inference that would conform to Bayes’ rule and approximate Bayesian inference, or variational Bayes, or what is sometimes called ensemble learning, though variational Bayes is probably the most technically correct term? Well, the difference is that you’re not maximizing model evidence, you are maximizing a lower bound on model evidence, which means that the thing that you can measure, which is this variational free energy, or rather its negative in machine learning, is always below the actual thing you want to maximize, which is your model evidence. But let’s flip the sign and bring it back to physics and the free energy principle. So the negative logarithm of model evidence, which is essentially our self-information, our surprisal, can always be minimized, as if you were a perfect Bayesian statistician, by minimizing something that’s provably never smaller than it. And the gap is a KL divergence, the bound approximation. So the variational free energy is an upper bound on surprisal, an evidence bound, and if you minimize that then you perform approximate Bayesian inference. Why approximate? Well, ’cause the bound is not necessarily tight.

1:10:27 SC: It’s not saturated yet. But it might be easier to work that way.

1:10:31 KF: That’s the only rationale for it, yeah. So, it’s just that the evaluation of the actual evidence, a partition function if you’re a physicist, becomes intractable in high-dimensional systems. So how do you evade that intractability when you’ve actually got real physical systems that would seem to be able to do this kind of thing? Well, you just create a tractable bound. And who did that? Well, Richard Feynman. That’s where it came from, on one reading of the legacy. There’s another Russian reading which we’ll… [laughter]

1:11:10 SC: [1:11:10] ____ [laughter]

1:11:11 KF: On one reading of the legacy of physics for things like the free energy principle, and certainly machine learning, that was Feynman’s path integral formulation, which introduced the notion of a variational bound… A bound, derived using variational calculus, that was provably always greater than the thing you wanted to minimize. So you just optimize the bound instead, and you get into some nice rhetoric about bounded rationality in economics.

1:11:38 SC: Okay, yeah. I hadn’t even thought of that, but I guess it makes sense.

1:11:40 KF: It’s not perfectly rational. It’s not exact Bayesian inference, but it’s doable. It’s physically realizable. So that’s the key difference.

1:11:49 SC: None of us is Laplace’s demon.

1:11:50 KF: Absolutely. Yes, yeah.
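The bound structure Friston has been describing, variational free energy as an upper bound on surprisal that is tight only at the exact posterior, can be checked directly in a toy discrete model. Everything below is invented for illustration (a two-state generative model with one observed outcome), not drawn from the conversation.

```python
import numpy as np

# Toy generative model: 2 hidden states, one observed outcome o.
p_s = np.array([0.6, 0.4])            # prior over hidden states
p_o_given_s = np.array([0.9, 0.2])    # likelihood of the observed o under each state

p_joint = p_s * p_o_given_s           # p(o, s)
p_o = p_joint.sum()                   # model evidence p(o)
surprisal = -np.log(p_o)              # negative log evidence: self-information

def free_energy(q):
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)]."""
    return float(np.sum(q * (np.log(q) - np.log(p_joint))))

q_exact = p_joint / p_o               # the true posterior p(s | o)
q_approx = np.array([0.5, 0.5])       # a crude approximate posterior

# F is never below surprisal, and equals it only at the true posterior,
# where the KL gap vanishes.
print(free_energy(q_approx) - surprisal)  # positive: the KL divergence gap
print(free_energy(q_exact) - surprisal)   # zero: the bound is saturated
```

Minimizing F over q therefore does two things at once: it drives the approximate posterior toward the exact one, and it makes the measurable bound approach the intractable evidence, which is the "optimize the bound instead" move attributed above to Feynman’s variational methods.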

1:11:52 SC: But so much of this conversation, and I think for good reason, has been in the context of agents or brains, occasionally thermostats, but you do wanna be even a little bit more ambitious, right? And talk about sort of a general organizing principle of non-equilibrium systems to minimize their free energy, and maybe this has something to do not just with the nature of cells and organisms, but with their origin, is that safe to say? Is that a fair ambition to attribute to you? [laughter] That minimizing free energy helps explain why life came into existence in the first place?

1:12:27 KF: Oh. Yeah. You’re taking me out of my comfort zone now.

[laughter]

1:12:31 SC: Surprise.

1:12:33 KF: Did I say that somewhere?

1:12:37 SC: It could be me. It could be my lenses, ’cause I want to do that. And I may be attributing it to you.

1:12:39 KF: Oh, I see. Well, you should talk about that then.

1:12:42 SC: Yeah, Yeah, Yeah… Well, let’s put it this way. You mentioned Markov blankets, right? The idea of… In fact, we have a bunch of systems in the universe that have a pretty clear boundary between themselves and the rest of the world. And this boundary mediates their interactions with the world. The appearance of cell walls is clearly one of the most important steps in the origin of life, and a literal cell wall is certainly similar in spirit, if not exactly identical, to the conceptual Markov blanket between the inside and the outside. Is that safe to say?

1:13:22 KF: Absolutely.

1:13:22 SC: Yeah, yeah, yeah.

1:13:23 KF: In fact, that is the metaphor we always use: the cell surface as a boundary. Yeah, absolutely.

1:13:29 SC: But somehow, life as we know it, you could imagine that if life were just defined as some complex thing that had a big chemical reaction and it evolved, it wouldn’t need to have cell walls, it wouldn’t need to be in a compartment and yet it is, as a matter of fact. And I’m wondering if somehow being compartmentalized like that affords a special set of powers to certain chemical reactions to then go and adapt in a crazy world.

1:13:57 KF: Yes. I’m sure that’s right. I’m not sure about… I’m wondering whether there’s a slightly more deflationary way of expressing that.

1:14:08 SC: Very likely, yeah. [chuckle]

1:14:11 KF: In order to exist and to have measurable characteristics in the sense of a [1:14:17] ____, you’d have to have a Markov blanket. So from a weak anthropic principle point of view, then, it could have been no other way.

1:14:27 SC: Well, does a hurricane have a Markov blanket?

1:14:33 KF: Interesting point. I’ve been asked whether Gaia does. No, it doesn’t. That’s very irritating. Nor do candle flames.

1:14:39 SC: And they’re alive so that’s… But yeah, so there is some… Right.

1:14:44 KF: Oh, I see. Good point, yes.

1:14:44 SC: But they have characteristics, some characteristics…

1:14:47 KF: Do they? Yes, but in a non-ergodic sense. In the sense that they don’t last long enough. However, you could say an eternal flame… I think you’re right, I think that’s, from my perspective, a really interesting outstanding challenge. I’m wondering whether Birkhoff’s notion of wandering sets resolves that, that you could actually have a Markov blanket that renews itself.

1:15:15 SC: Right.

1:15:15 KF: However that’s…

1:15:18 SC: But in some sense, one of the reasons why we don’t count hurricanes or forest fires as living is that they’re not rich with information, they’re just doing their thing. And even the simplest cell, in your language, carries a model of the world around with it, right, and a forest fire does not.

1:15:37 KF: Ah, good. Yes, no, that makes a lot of sense. I wish I had said that.

1:15:40 SC: Yeah, and so somehow… I don’t know how life began; some of my friends are working on it. I do wonder whether or not we are under-emphasizing the important… There’s a debate in the origin of life between replication-first, where you imagine that RNA, or the information-carrying thing, came first, and then it sort of wrapped a blanket around itself, and got energy and got going. There’s another camp, metabolism-first, that says that actually extracting free energy, in the physicist’s sense, from the environment was the first thing that happened, and only later did it get imprinted informationally onto RNA and then put in a cell. And then everyone agrees that the easy part is making a cell wall, it’s just a lipid bilayer, that can just happen automatically. So I wonder if we’re not under-selling the importance of that thing. That without that compartment, without that blanket, the very notion of having a view of the outside world, having a model, even if it’s very, very primitive, is not really tenable if you don’t even have a difference between self and other.

1:16:47 KF: Yes.

1:16:47 SC: What does it mean to have a view of the other?

1:16:49 KF: Well, that’s the deflationary aspect. For me, it’s beautiful. It’s a [1:16:53] ____ desert landscape, there’s a fundamental truth to that. I haven’t really thought about this, so clearly you have…

1:16:58 SC: I have [1:17:01] ____…

1:17:03 KF: It does strike me, when we use the free energy principle to model morphogenesis and cellular organization, it becomes immediately obvious that you need RNA and DNA as a generative model. That’s certainly the case when you’re talking about multicellular organization. You cannot get a free energy minimum from coupled free-energy-minimizing Markov blankets without some shared generative model that is most naturally written down in terms of some genetic code. So it may well be…

1:17:37 SC: From an information perspective, that makes perfect sense. That’s what DNA is good at doing, storing the information.

1:17:43 KF: So, perhaps, you could argue, and perhaps you have already, or people have already, that, yes, first of all, you need a Markov blanket, otherwise there is no existence… Of the sort we are talking about. But if it is the case that, just by dint of having a Markov blanket, there is a way of writing down the dynamics or the mechanics that makes it look as if there is a generative model, then there would also have to be something in the internal states that plays the role of a generative model.

1:18:15 KF: And it seems quite natural that the things that endure over generations, or show that sort of attracting set with a sort of itinerancy that kind of looks like reproduction, are what we call RNA or DNA. It may well be that the other side of the coin of having a Markov blanket, namely, that I must have an implicit generative model, or that it looks as if my gradient flows are a variational gradient, a free energy gradient, and that free energy is a functional of a generative model, therefore there must be a biophysical encoding of it. That could be a statement that you need something like RNA or DNA to go hand-in-hand with the Markov blanket.

1:19:04 SC: Well, I do appreciate your indulging me there, but that may be good then to sort of wrap things up. We can bring it back close to where we started with going back to the clinic out of the research environment. How does this perspective, this point of view of free energy minimization, help us understand not just the brain when it’s working, but the brain when it’s not working? I recall schizophrenia is something that you might be thinking about, but maybe other things as well.

1:19:32 KF: Yes, I mean my personal training and interest is in schizophrenia but I have to work with… That sounds awful to say. I have the pleasure of working with lots of psychologists and psychiatrists.

[laughter]

1:19:43 SC: We’ll edit that out, don’t worry.

1:19:44 KF: Who are interested in all sorts of neuropsychiatric conditions. So, I think it’s a lovely question because it allows me to make some simple points. So if Kant and Helmholtz and everybody since that time are right, and effectively psychology is inference, consciousness is inference and unconsciousness is inference, and psychiatry is psychopathology, that tells you, by definition, that psychiatry is all about broken inference. False inference. And I mean that in a very literal sense: basically inferring something is there when it is not, like a delusion or hallucination, or inferring something is not there when it is, like an agnosia, or a denial that something exists, or that part of my body exists. So just extending that notion, that nearly every psychiatric and probably neurological syndrome can be cast as an instance of false inference, suddenly gives you… Or puts pressure on you to now derive a calculus of how the brain works that is framed in terms of beliefs and inference. So just a few examples. Well, we’ve covered, say, delusions and hallucinations in schizophrenia. Those are the…

1:21:18 SC: Classics…

1:21:20 KF: Poster children of false inference, but it can also manifest in a slightly more subtle way. Take Parkinson’s disease.

1:21:28 SC: Okay.

1:21:28 KF: Let’s come back to our ideomotor picture of why and how we move. I infer that my arm is going to be over there, or I infer that in the next 600 milliseconds I’m gonna stand up and start walking. Now if I make a false inference because I fail to attenuate the evidence that I am not moving, I would never realize that prior belief. I will never form an intention to move. Once I’m moving, I’m fine. But if I’m not moving, I cannot deny the sensory evidence that I’m stuck, immobile. And of course that’s a classic description of Parkinson’s disease. And actually, if you drill down on the belief updating at the neuronal level, it directly implicates the neurotransmitters and chemicals that are implicated in Parkinson’s disease. So if we just generalize this notion of false inference, where inference really underwrites your free-will selection of what to do next, in terms of predictions about what’s gonna… How you’re gonna feel yourself move and talk and think and feel, including your gut feelings, then you’ve got a way now of writing down a mechanics of psychiatry.

1:22:41 KF: So in a way that all other normative schemes do not, because it’s actually articulated in terms of probability distributions. ’Cause if you remember, it’s all about functionals, it’s all about things like entropies and relative entropies, uncertainties, beliefs about something. Then you’ve now got a calculus which is fit for purpose to understand false inference, hallucinations and false perceptions, and the physics that gives rise to those, the physics that underwrites the belief updating that leads to this false inference. So, in the past 10 years, that’s hit me time and time again: why it is so useful, if not essential, to understand sentience in a philosophical sense, and the failures of sentience that we have in psychiatry. To understand those on a formal footing calls for something like the free energy principle.

1:23:40 SC: And has it led to any specific tactics for therapeutic interventions?

1:23:48 KF: Well, I hope soon. So, it certainly led to… And I may be misinterpreting this through the lens of the things that I’m asked to review. I only speak to people that commit or subscribe to the free energy principle, so I don’t see the alternatives. But from what I see, in terms of being asked to contribute to special issues, and in terms of being asked to review for the specialist psychiatric literature, there has been a slight paradigm shift in the past five years. It all centers on something called precision. We mentioned that very, very briefly before, when we were talking about the predictive coding implementation of approximate Bayesian inference via minimizing variational free energy. So I talked about precision-weighted prediction errors. What that basically means is that you could have a mismatch between what you predicted and what you sensed. Does it really matter?

1:24:51 SC: Right.

1:24:51 KF: You know? If it’s dark, and I get a mismatch between what I thought I was seeing at some visual angle and what I actually get, which is darkness, does that really matter? Because there’s no precise visual information around. So I actually now have to assess the darkness by inferring the precision, the signal-to-noise ratio, effectively.

1:25:11 SC: The error bars in some sense.

1:25:13 KF: Exactly. It is exactly the error bars. Interestingly, of course, that’s 99% of the challenge for statistical analysis. It’s not measuring the group mean, it’s measuring your uncertainty about it. But that’s interesting. People like [1:25:24] ____ sort of emphasize the importance for psychiatry of what we’ve just said. Getting the error bars right is at the heart of good inference. If you break the capacity to get your error bars spot-on, you’re gonna get all sorts of type one and type two errors, false inferences, inferring things are there when they’re not, and that they’re not there when they are.

1:25:50 KF: So, that’s where the precision comes in. It’s literally that: the more precise, the tighter the error bars; the less precise, the more dispersed or uncertain you are about something. Or in particular, the precision of the sensory evidence at hand, for example, or the precision of my beliefs, the confidence with which I hold the prior beliefs that are used to explain these data, which themselves may or may not be precise. But you have to estimate this precision. So it looks as if, and this is the mini paradigm shift I was talking about, it looks as if, in psychiatry, nearly all the phenomenology and the psychopathology, and possibly a lot of the neurochemistry and psychopharmacology, can be explained by a failure to encode the precision, the error bars. And that makes a lot of sense, because nearly every treatment, or certainly every pharmacological treatment in psychiatry, targets those neurotransmitters that have a modulatory effect.

1:26:50 KF: So, they don’t in themselves encode shifts in beliefs, or expectations, or averages. They encode changes in the sensitivity to sensory input, for example. So it looks as though those are, from the point of view of predictive coding, exactly the biophysical mechanisms that would encode your beliefs about the precision. So, what does that tell you? Well, it tells you first of all where you should target your therapy. It’s likely to be in broken standard-error estimators in the brain. So, where are they? And then you know, once you know about the neurochemistry and the functional anatomy, and the projection systems, and the domains of beliefs that are broken… If you’ve got anorexia nervosa, it’s beliefs about your body. If you’ve got visual hallucinations, it might be a Lewy body disease, an organic psychosyndrome, but the same underlying mechanisms should be in place, with different neurotransmitters and possibly different neurodegenerative processes. So it does give you a mechanistic focus, so you can start to move away from what has, if you like, haunted psychiatry for centuries, which is a purely nosological, descriptive approach, to a slightly more mechanistic understanding. Which it’s started to do. It’s pre-paradigmatic. [chuckle]

1:28:04 SC: There you go.

1:28:04 KF: I’m gonna use it all the time now. So, when you say, is it currently affecting treatments? No, it is pre-paradigmatic.

1:28:16 SC: Got it.
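The precision-weighted belief updating described above can be sketched as a standard Gaussian update, where precision is inverse variance. This is a textbook form chosen for illustration, not code from Friston’s work, and the numbers are invented; the point is that an identical prediction error moves the belief a lot when estimated sensory precision is high, and almost not at all when it is low, which is the "broken error bars" failure mode being discussed.

```python
def update(mu_prior, pi_prior, obs, pi_sensory):
    """Precision-weighted update of a Gaussian belief.

    The prediction error (obs - mu_prior) shifts the belief in proportion
    to how precise (tight-error-bar) the sensory evidence is believed to be,
    relative to the precision of the prior.
    """
    error = obs - mu_prior                       # prediction error
    gain = pi_sensory / (pi_prior + pi_sensory)  # precision weighting
    return mu_prior + gain * error

mu = 0.0   # prior belief
obs = 1.0  # sensed value: a prediction error of 1.0 in both cases below

print(update(mu, pi_prior=4.0, obs=obs, pi_sensory=4.0))   # 0.5: evidence trusted
print(update(mu, pi_prior=4.0, obs=obs, pi_sensory=0.04))  # ~0.01: evidence discounted
```

Mis-estimating `pi_sensory`, rather than the means themselves, is the kind of failure the passage above associates with modulatory neurotransmitter systems: the same sensory data produce wildly over- or under-confident belief updates.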

1:28:16 KF: But, I guess if you came back in 5 to 10 years, then I think you’d see evidence of where that turn led, in terms of possibly reinvigorating the pharmaceutical industry. I say that very practically, as something you won’t know, but it worries a lot of people in my game. So, pharma have basically given up on psychiatry and developing psychiatric drugs. There’s no money to be made ’cause there’s no progress. There is no mechanistic underpinning. So, about three years ago, they basically all just pulled out.

1:28:48 SC: Wow, I had no idea.

1:28:50 KF: Yeah, so there is no active research at all, and this is expensive research, it is billions, not millions, in drugs for schizophrenia, depression, which is a great shame. And you can see why: they don’t know what to target, because no one has said, “Well, it’s gotta be this system, or that system, or that system.”

1:29:09 SC: So, we all agree it’s important, they just don’t see the direction forward.

1:29:12 KF: Yes, absolutely, yeah. And of course, they have to keep the money coming in to fund the research to make the next drug.

1:29:18 SC: But, maybe this will propose a direction forward in some area…

1:29:21 KF: That would be the final hope.

1:29:22 SC: Cross your fingers, that’s the final hope. Alright, I think this is a wonderful lesson to end on. We should all aspire to have our error bars brought into as close alignment with reality as we possibly can. Karl Friston, thanks so much for being on the podcast.

1:29:34 KF: Thank you.

[music]