Is the human brain just a rag-bag of different tricks and stratagems, slowly accumulated over evolutionary time? For many years, I thought the answer to this question was most probably ‘yes’. Sure, brains were fantastic organs for adaptive success. But the idea that there might be just a few core principles whose operation lay at the heart of much neural processing was not one that had made it on to my personal hit-list. Seminal work on Artificial Neural Networks had shown great promise. But it led not to a new and unifying vision of the brain so much as a plethora of cool engineering solutions to specific problems and puzzles. Meantime, the sciences of the mind were looking increasingly outwards, making huge strides in understanding how bodily form, action, and the canny use of environmental structures were co-operating with neural processes. This was the revolution summarily dubbed ‘embodied cognition’.

My personal grail, though, was always something rather more systematic: a principled science of the embodied mind. I think we may now be glimpsing the shape of that science. It will be a science built around an emerging vision of the brain as a guessing engine – a multi-layer probabilistic prediction machine. This is an idea that, in one form or another, has been around for a long time. But exciting new developments are taking this vision to some brand-new places. In this short post, I highlight a few of those places. First though, what’s the basic vision of the predictive brain?

A prediction machine of the relevant stripe is a multi-layer neural network that uses rich downwards (and sideways) connectivity to try to perform a superficially simple, yet hugely empowering, task. That task is the ongoing prediction of its own evolving flows of sensory stimulation. When you see that steaming coffee-cup on the desk in front of you, your perceptual experience reflects the multi-level neural guess that best reduces visual prediction errors. To visually perceive the scene in front of you, your brain attempts to predict the scene in front of you, allowing the ensuing error signals to refine its guessing until a kind of equilibrium is achieved.
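To make the idea of guessing until equilibrium concrete, here is a minimal toy sketch in Python. It is not a model of any real neural circuit. The function name `settle`, the single hidden `guess`, the linear mapping `weight * guess`, and the learning `rate` are all illustrative assumptions that do not come from the post itself.

```python
def settle(sensory_input, weight=2.0, guess=0.0, rate=0.1, steps=200):
    """Refine a guess about a hidden cause until prediction error is tiny.

    Assumption (illustrative only): the 'world model' is the linear map
    prediction = weight * guess, with a single number as the hidden cause.
    """
    for _ in range(steps):
        prediction = weight * guess          # top-down prediction of the input
        error = sensory_input - prediction   # bottom-up prediction error
        guess += rate * weight * error       # revise the guess to shrink the error
    remaining_error = sensory_input - weight * guess
    return guess, remaining_error


if __name__ == "__main__":
    guess, remaining_error = settle(sensory_input=3.0)
    print(f"settled guess: {guess:.4f}, remaining error: {remaining_error:.2e}")
```

With an input of 3.0 and a weight of 2.0, the guess settles close to 1.5, the value at which prediction and input agree. A multi-layer version would stack several such loops, each level predicting the activity of the level below.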

Such an architecture makes full use of the huge amounts of downwards and recurrent connectivity that characterize advanced biological brains. This is important since the bulk of our actual neural connectivity is recurrent, involving loops in which information flows downwards and sideways. So much so that the AI pioneer Patrick Winston wrote, in a 2012 paper, that “Everything is all mixed up, with information flowing bottom to top and top to bottom and sideways too. It is a strange architecture about which we are nearly clueless”.

One key role of all that looping connectivity, it now seems, is to try to predict the streams of sensory stimulation before they arrive. Systems like that are most strongly impacted by sensed deviations from their predicted sensory states. It is these deviations from predicted states (known as prediction errors) that now bear much of the information-processing burden, informing us of what is salient and newsworthy within the dense sensory barrage.
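The way prediction errors pick out what is newsworthy can be sketched with another toy example. Again, this is only an illustration under stated assumptions: the function `salient_events`, the running `expectation`, the `threshold`, and the update `rate` are hypothetical choices, not anything specified in the work discussed here.

```python
def salient_events(stream, threshold=1.0, rate=0.5):
    """Return (index, error) pairs where the input deviates from expectation.

    Assumption (illustrative only): the system predicts each sample with a
    running expectation and treats errors above `threshold` as salient.
    """
    expectation = stream[0]
    events = []
    for t, sample in enumerate(stream[1:], start=1):
        error = sample - expectation      # deviation from the predicted state
        if abs(error) > threshold:
            events.append((t, error))     # only surprises get flagged
        expectation += rate * error       # expectation drifts toward the input
    return events


if __name__ == "__main__":
    readings = [5.0, 5.0, 5.1, 5.0, 9.0, 9.0, 9.0, 9.0]
    for index, error in salient_events(readings):
        print(f"surprise at sample {index}: error {error:+.3f}")
```

In this run the small wobble around 5.0 passes unremarked, the jump to 9.0 is flagged at samples 4 and 5, and the signal then falls silent again once the expectation has caught up with the new level.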

Systems like this are already deep in the business of understanding. To perceive a hockey game using multi-level prediction machinery is to be able to predict distinctive sensory patterns as the play unfolds. And the more experience has taught you about the game and the teams, the better those predictions will be. What we quite literally see, as we watch a game, is here constantly informed and structured by what we know and what we are thus already busy (consciously and non-consciously) expecting.

This, as has recently been pointed out in a New York Times piece by Lisa Feldman Barrett, has real social and political implications. You might really seem to see your beloved but recently deceased pet start to enter the room, when the curtain moves in just the right way. The police officer might likewise really seem to see the outline of that gun in the hands of the unarmed, cellphone-wielding suspect. In such cases, the full swathe of good sensory evidence should soon turn the tables – but that might be too late for the unwitting suspect.

On the brighter side, a system that has learnt to predict and expect its own evolving flows of sensory activity in this way is one that is already positioned to imagine its world. For the self-same prediction machinery can also be run ‘offline’, generating the kinds of neuronal activity that would be expected (predicted) in some imaginary situation. Sometimes, however, the delicate balances between top-down prediction and the use of incoming sensory evidence are disturbed, and our grip on the world loosens in remarkable ways. Thinking about perception as tied intimately to multi-level prediction is thus also delivering new ways to think about the emergence of delusions, hallucinations, and psychoses, as well as the effects of various drugs, and the distinctive profiles of non-neurotypical (for example, autistic) agents.

The most tantalizing (but least developed) aspect of the emerging framework concerns the origins of conscious experience itself. To creep up on this, suppose we ask: what might it take to build a sentient robot? By that I mean: what might it take to build a robot that begins to have some sense of itself as a material being, with its own concerns, encountering a structured and meaningful world?

A growing body of work by Professor Anil Seth (University of Sussex) and others may – and I say this with all due caution and trepidation – be suggesting a clue. That work involves the stream of interoceptive information specifying the physiological state of the body – the state of the gut and viscera, blood sugar levels, temperature, and much, much more (Bud Craig’s recent book How Do You Feel offers a wonderfully rich account of this).

What happens when a multi-level prediction engine crunches all that interoceptive information together with information specifying structure in the external world? Our multi-layered predictive grip on the external world is then superimposed upon another multi-layered predictive grip – a grip on the changing physiological state of our own body. And predictions along each of these dimensions will constantly interact with predictions along the other. To take a very simple case, the sight of water, when we are thirsty, should incline us to predict drinking in ways that the sight of water otherwise need not. Our predictive grip upon the external world thus becomes inflected, at every level, by an accompanying grip upon ‘how things are (physiologically) with us’. Might this be part of what enables a robot, animal, or machine to start to experience a low-grade sense of being-in-the-world? Such a system has, in some intuitive sense, a simple grip not just on the world, but on the world ‘as it matters, right here, right now, for the embodied being that is you’. Agents like that experience a structured and – dare I say it – meaningful world, a world where each perceptual moment presents salient affordances for action, permeated by a subtle sense of our own present and unfolding bodily states. A recipe for Sentient Robotics 101.beta perhaps?

There is much that I’ve left out from this post – most importantly, the crucial role of self-estimated sensory uncertainty (‘precision’), and the role of action and environmental structuring in altering the predictive tasks confronting the brain: changing what we need to predict, and when, in order to get things done. That’s where these stories score major points by dovetailing very neatly with large bodies of work in embodied cognition.
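For readers who want a feel for what ‘precision’ does, here is one standard way to picture it, offered as a sketch rather than as the framework's official machinery. It assumes that both the prior expectation and the sensory sample are Gaussian, with precision meaning inverse variance. The function name `precision_weighted_estimate` and its parameters are hypothetical.

```python
def precision_weighted_estimate(prior_mean, prior_precision,
                                sample, sensory_precision):
    """Combine a prior expectation with a sensory sample.

    Assumption (illustrative only): both sources are Gaussian, so the best
    estimate is the average of the two, weighted by their precisions
    (inverse variances).
    """
    total_precision = prior_precision + sensory_precision
    mean = (prior_precision * prior_mean
            + sensory_precision * sample) / total_precision
    return mean, total_precision


if __name__ == "__main__":
    # Trusted senses: the estimate sits close to the sample.
    print(precision_weighted_estimate(0.0, 1.0, 10.0, 9.0))
    # Trusted expectations: the same sample barely shifts the estimate.
    print(precision_weighted_estimate(0.0, 9.0, 10.0, 1.0))
```

When the senses are estimated to be reliable, the incoming evidence dominates. When they are estimated to be noisy, the prior expectation wins out, which is one simple way to picture how a strong expectation can outweigh the evidence actually arriving at the senses.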

Nor have I mentioned the many outstanding problems and puzzles. For example, it is not known whether multi-level prediction machinery characterizes all, or even most, aspects of the neural economy. Most importantly of all, perhaps, it is not yet clear how best to factor human motivation into the overall story. Are human motivations (e.g. for play, novelty, and pleasure) best understood as disguised predictions – deep-seated expectations that there will be play, novelty, and pleasure? That is a challenging vision, but one that could offer a deeply unifying perspective indeed.

Feature Image: “I love water,” by Derek Gavey. CC-BY-2.0 via Flickr.