I want to tell you about a problem that I have because it highlights a deep problem for the field of psychology. The problem is that every time I sit down to try to write a manuscript I end up eating Ben and Jerry's instead. I sit down and then a voice comes into my head and it says, "How about Ben and Jerry's? You deserve it. You've been working hard for almost ten minutes now." Before I know it, I'm on the way out the door.

This is a problem for psychology not, regrettably, because I was writing anything terribly important, but rather because it highlights a deep tension in a dual process theory of the mind. From one perspective my desire for Ben and Jerry's is the product of automatic or intuitive responses—literally gut feelings in this case—and then it's a controlled, effortful, deliberative process that tries to focus on the paper and put thoughts of Ben and Jerry's out of mind. On the other hand, it would truly be bizarre to say that when I went to Ben and Jerry's it was an automatic response. I mean, I have to go through a process of goal-oriented planning. I've got to get my shoes on, I've got to get out the door. There's a mismatch between the willpower perspective and the goal orientation perspective.

No, the exciting part for me is that I feel like we're finally able to at least frame the questions in the right way, such that an answer is in the offing. The questions are being framed these days, I think, by some really foundational work that went on in computer science when people tried to design artificial intelligence systems that could learn and decide. They drew a division between two broad classes of solution. One of them, which has been worked out best and is very familiar to psychologists, is a kind of stimulus response learning based on reinforcement history.

I want you to imagine a rat that's in a Skinner box. It stumbles on a lever once, it notices that a food pellet comes out, and so it forms an association between the environmental context it's in (being in the Skinner box), the action of pushing that lever, and value. It says: whenever I'm in this box, the valuable thing for me to do is push the lever. The stimulus is "I'm in the box", the motor response is "I'm going to push the lever", and that's a valuable thing.

You might think that the association the rat forms would be between pushing the lever and getting food, the outcome of its action, but remarkably, at least some of the time, that turns out not to be the case.

You run a procedure called the Devaluation Procedure. You put the rat in the box, it forms the association between pushing the lever and value, and then you take it out of the box and give it free access to food pellets, unrelated to the lever, until it wouldn't touch a food pellet with a ten-foot pole; it's completely stuffed. Then you put it back in the box and, under the right conditions, it waddles over to the lever and pushes it, the food comes out, and it just lets the food sit there. So you know that the rat wasn't pushing the lever because it had a goal in mind and associated pushing the lever with a particular outcome. Rather, it's a stimulus response association: I'm in the box, pushing the lever is good.
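
The devaluation result can be illustrated with a toy model-free learner. This is a minimal Python sketch under stated assumptions, not a model from the talk: the learning rate `ALPHA`, the reward of 1.0 per pellet, and the state and action labels are all invented. The point is structural: the cached value is attached to a (stimulus, action) pair and never records which outcome followed, so sating the rat outside the box leaves it untouched.

```python
# Toy model-free (habit) learner. Assumption for illustration only: values are
# cached per (stimulus, action) pair and never record which outcome followed.
ALPHA = 0.1              # hypothetical learning rate
REWARD_PER_PELLET = 1.0  # hypothetical value of a pellet to a hungry rat

q = {("in_box", "press_lever"): 0.0}

# Training: every press in the box delivers a pellet.
for _ in range(200):
    key = ("in_box", "press_lever")
    q[key] += ALPHA * (REWARD_PER_PELLET - q[key])  # delta-rule update

# Devaluation happens outside the box: the rat is sated on free pellets,
# so a pellet is now worth nothing to it.
pellet_value_now = 0.0

# Back in the box, the habit system reads only its cache. Nothing in q refers
# to pellets, so satiation cannot change it until new trials are experienced.
habit_value = q[("in_box", "press_lever")]
print(f"cached value of pressing: {habit_value:.3f}")
print(f"current value of a pellet: {pellet_value_now}")
```

The cached value stays near 1.0 even though the pellet is now worth 0.0, which is the signature of the stimulus response association described above.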

What the computer scientists were able to do was to formalize mathematically the kinds of representations that support that kind of learning, and then in a way that I won't have time to describe right now they showed how you could string together series of stimulus response associations to make local decisions that have long-run consequences that are good. Okay?
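
Stringing stimulus response values together so that local choices serve long-run ends is what temporal-difference (TD) learning formalizes; in the standard notation, alpha is the learning rate and gamma the discount factor. Below is a minimal TD(0) sketch on a hypothetical four-step chain where reward arrives only at the end. The chain, the parameter values, and the episode count are assumptions for illustration, not anything specified in the talk.

```python
# Minimal TD(0) sketch on a hypothetical chain of states 0 -> 1 -> 2 -> 3 -> end,
# with a reward of 1.0 delivered only when leaving the last state.
ALPHA = 0.1  # learning rate (hypothetical value)
GAMMA = 0.9  # discount factor (hypothetical value)

def td0_chain(n_states=4, episodes=500, reward=1.0):
    v = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            if s == n_states - 1:
                target = reward              # final step: reward, no successor
            else:
                target = GAMMA * v[s + 1]    # no reward yet; bootstrap from successor
            v[s] += ALPHA * (target - v[s])  # move toward target by the TD error
    return v

values = td0_chain()
print([round(x, 3) for x in values])
```

After training, value falls off by a factor of `GAMMA` per step back from the reward, so the first state in the chain carries value even though it is never rewarded directly. That is the sense in which purely local associations can serve long-run consequences.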

It turns out that once the computer scientists formalized this stuff, they had equations that specified, you know, there'd be a parameter here, there's an alpha, there's a gamma. Then you go look in the brain while people are making decisions in these types of tasks and ask, "Are there voxels in the brain? Are there regions of the brain whose response profile tracks those precise mathematical parameters?" There are, again and again and again; it's happened hundreds of times now, mostly in the basal ganglia, which is a brain region we know, for instance, is impaired in Parkinson's. If you think about what goes wrong in an individual with Parkinson's, they're not able to produce motor actions. So this is the part of the brain that's responding to stimuli and producing motor actions, and when it's disrupted you can, literally, be put in a kind of frozen state.

We understand that system pretty well, but it's obvious that tons of human behavior is exactly the opposite of that: it is goal-oriented. And, in fact, the rats do this, too. You can have other conditions in which you put the fat rat back in the box and, if you run the experiment appropriately, it doesn't touch the lever, because now it's operating in a kind of goal-oriented planning mode. It has a particular outcome in mind that it wants to achieve (in this case the outcome is not food; maybe it's just sitting and digesting), and then it selects the actions that are appropriate to get toward that goal. And computer scientists have been able to formalize algorithms that do this type of goal-oriented planning as well.
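
By contrast, goal-directed planning can be sketched as consulting a model of what each action leads to and valuing that outcome at decision time. This is a hypothetical toy with invented action names and outcome values; it only shows why such a planner, unlike a cached habit, stops pressing once pellets are worthless.

```python
# Toy goal-directed (model-based) chooser. Assumptions: a hand-written model of
# what each action leads to, and outcome values supplied at decision time.
TRANSITIONS = {
    "press_lever": "pellet",
    "sit_still": "digest",
}

def plan(outcome_values):
    """Pick the action whose predicted outcome is currently worth the most."""
    return max(TRANSITIONS, key=lambda action: outcome_values[TRANSITIONS[action]])

hungry = {"pellet": 1.0, "digest": 0.2}  # hypothetical values before feeding
sated = {"pellet": 0.0, "digest": 0.2}   # hypothetical values after devaluation

print(plan(hungry))  # press_lever
print(plan(sated))   # sit_still
```

Because the outcome's value is looked up fresh on every decision, satiation changes behavior immediately, with no new trials in the box.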

The problem with these algorithms is that if you make a task even moderately complex, they totally fail because of the computational intractability of planning. We brought up the case of chess earlier, right? Chess is a game where there are many, many opening moves, then many, many next moves, and then many moves after that. It's perfectly obvious what the goal is (you want to get to checkmate), but there are so many possible paths you could take that even with all the time in the world you wouldn't have enough time to evaluate each one of those paths independently.
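
To make the intractability concrete, here is some rough arithmetic. It assumes a commonly cited average of about 35 legal moves per chess position and a hypothetical evaluator that checks a billion positions per second; both numbers are illustrative, not measurements. A "ply" is one move by one player.

```python
BRANCHING = 35              # rough, commonly cited average legal moves per position
POSITIONS_PER_SECOND = 1e9  # hypothetical, very fast evaluator
SECONDS_PER_YEAR = 365 * 24 * 3600

def move_sequences(plies, branching=BRANCHING):
    """Number of distinct move sequences of the given length."""
    return branching ** plies

for plies in (2, 6, 10, 20):
    n = move_sequences(plies)
    years = n / POSITIONS_PER_SECOND / SECONDS_PER_YEAR
    print(f"{plies:>2} plies: {n:.2e} sequences, ~{years:.1e} years to enumerate")
```

At 20 plies (ten moves each) there are already about 7.6e30 sequences, which at that rate is on the order of 10^14 years, long before most games reach checkmate.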

This is a really deep problem for computer science and also for psychology. We know that humans use goals but how do you get goals off the ground? How do you get this planning process off the ground and make it computationally tractable?

There are a couple of solutions that people have focused on to try to do this. One of the solutions is to arrange your goals hierarchically. I'm going to use a metaphor here; I hope it's a helpful one. Imagine that we took a picture of this group and turned it into a jigsaw puzzle. One way to try to solve the jigsaw puzzle would be to randomly arrange the pieces one by one, as if we were taking random searches down that chess path, right? You'd have to go through billions of random arrangements before you ever alighted on the appropriate one. But if you organize the puzzle hierarchically, you can reduce that search space a bit.

You'd say, "I'm going to just focus on Josh. I'm going to just take the pieces that look like they plausibly belong to the Josh area, fit them together, and then the June area, and the Rob area. Then once I've gotten those units organized, I can shuffle high-level units." You don't have to try every combination anymore because you're working on little local problems and then moving big chunks of space around all together.
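
The jigsaw intuition can be put in numbers with a toy count. This sketch assumes a hypothetical 12-piece puzzle whose pieces can be sorted by appearance into 3 regions of 4 (as with the "Josh area"), and that each region can be checked on its own. It counts the candidate orderings examined in the worst case, treating the pieces as slots in a line.

```python
from math import factorial

def flat_search(n_pieces):
    """Try every ordering of every piece at once."""
    return factorial(n_pieces)

def hierarchical_search(n_groups, pieces_per_group):
    """Solve each region separately, then arrange the finished chunks."""
    within = n_groups * factorial(pieces_per_group)  # each region on its own
    between = factorial(n_groups)                    # then order the chunks
    return within + between

print(flat_search(12))            # 479001600
print(hierarchical_search(3, 4))  # 78
```

The saving comes from turning a product of choices into a sum: once regions are solved independently, their orderings no longer multiply against each other.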

That's a good start, but it's not going to get you the whole way. Take even a simple task like making a sandwich. I can say that in order to make a sandwich I'm going to have to have some kind of sub-goal; there's going to have to be a first step. But there's kind of an infinitude of next steps that I could take. One next step would be to get the bread out of the refrigerator, but another would be to start the manuscript that I have to start, or to pick up my wife at the train station, or any of the infinite number of sub-goals that a person could possibly entertain.

We're going to need some kind of a cognitive system that, in the state of having a goal, selects the appropriate cognitive action of selecting a sub-goal. An insight that occurred about a decade ago in psychology was that you could actually maybe use the basal ganglia—that old rat-like stimulus response learner—to solve that problem, too.

Here's how it works. When we talk about the rat, we talk about the rat being in the perceptual state of a Skinner box and having learned the motor action of pressing the lever. But by analogy you could say that the rat would be in an internal perceptual state, a kind of conceptual state, of having the goal of, say, making a sandwich (it's a very smart rat), and then what the basal ganglia has to learn is a particular cognitive action rather than a motor action: namely, which cognitive action is appropriate in that conceptual state.

You could learn that when you're in the conceptual state of wanting to make a sandwich, the next cognitive action to take is to select the sub-goal of getting bread and load it into the goal unit. Then if you're in the cognitive state of having "get bread" in working memory, in your "goal slot," the next action that you're going to take is walking over to the refrigerator. This is a way of taking the simple machinery of stimulus response learning that we understand and getting it to perform the individual computations necessary for a much more sophisticated type of goal-oriented planning and action selection.
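
The sandwich example can be sketched as a lookup from the current contents of the goal slot to the next action, where some actions are cognitive (writing a new sub-goal into working memory) and others are motor. This is a hypothetical toy: the policy table is written by hand, standing in for one the basal ganglia would have learned, and the `load:` prefix is an invented convention for marking gating actions.

```python
# Hypothetical, hand-written stand-in for a learned policy that maps the
# contents of the goal slot to the next action. "load:" marks a cognitive
# (gating) action that writes a new sub-goal into working memory; anything
# else is treated as a motor action.
POLICY = {
    "make sandwich": "load:get bread",
    "get bread": "walk to refrigerator",
}

def step_through(goal, max_steps=10):
    trace = []
    slot = goal  # working-memory goal slot
    for _ in range(max_steps):
        action = POLICY.get(slot)
        if action is None:
            break  # no learned response for this state
        trace.append(action)
        if action.startswith("load:"):
            slot = action[len("load:"):]  # gate the sub-goal into the slot
        else:
            break  # motor action: hand off to the body
    return trace

print(step_through("make sandwich"))
# ['load:get bread', 'walk to refrigerator']
```

The same kind of lookup handles both steps; the only difference is whether its output is routed back into working memory or out to the muscles.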

One of the remarkable things is that when you go back to the Parkinson's patients who are not able to produce motor actions and ask them what it's like to be sort of frozen in this motor sense, they'll talk about feeling cognitively frozen, too, as if their thoughts are moving with incredible slowness or they can't bring to mind the thought that they want to bring to mind. It's quite different from other types of motor impairments. ALS deprives someone of the capacity to produce a motor response, but their thoughts are moving just as fast as they ever were. And so a bunch of research has now shown that this region, the basal ganglia, interacts with working memory to facilitate the movement of information in and out of working memory.

There are a few things that I find exciting about this. One of the markers of progress in psychology is when you can exorcise just one of the ghosts from the machine: when you can take one of those points in science where you had to say, "And then a miracle happens." We knew we had to make a sandwich, so the obvious sub-goal was "get bread", but that's a little bit of "and then a miracle happens." How did you actually get from Point A to Point B? This starts to show us how you can do that.

The second thing that I like about it is that it teaches me something about why I keep ending up at Ben and Jerry's. The idea is that my basal ganglia has learned that when it loads the sub-goal "get ice cream" into working memory, good things happen. When I'm trying to work on my paper, what's happening is that this basal ganglia that really loves ice cream keeps saying, "Oh, you know what would make me happy? What if you had the goal of going and getting some ice cream?" What that suggests (this is the third thing that I love about this area of research) is a new way of thinking about what automatic and controlled processes are and how they relate to each other.
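
The Ben and Jerry's story can be read as a learner that values candidate sub-goals by their reinforcement history. A minimal sketch, with made-up candidates, reward numbers, and learning rate: because loading "get ice cream" has been followed by bigger immediate rewards than drafting the introduction, it ends up with the higher cached value in the state "write paper". The sketch deliberately ignores that writing pays off later, which is exactly what such a learner would miss.

```python
import random

# Hypothetical numbers throughout: sub-goals that might be loaded while in the
# state "write paper", and the immediate reward that followed each in the past.
ALPHA = 0.2
CANDIDATES = ["draft introduction", "get ice cream"]
IMMEDIATE_REWARD = {"draft introduction": 0.3, "get ice cream": 1.0}

def learn_subgoal_values(trials=300, seed=0):
    rng = random.Random(seed)
    q = {("write paper", c): 0.0 for c in CANDIDATES}
    for _ in range(trials):
        choice = rng.choice(CANDIDATES)  # uniform exploration while learning
        key = ("write paper", choice)
        q[key] += ALPHA * (IMMEDIATE_REWARD[choice] - q[key])
    return q

q = learn_subgoal_values()
best = max(CANDIDATES, key=lambda c: q[("write paper", c)])
print(best)  # get ice cream
```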

For a long time we've talked about automatic and controlled processes as if they were systems, as if we were going to go into the brain, perhaps, and actually find completely dissociable mechanisms: one of them does the dumb stimulus response learning thing, and another does the smart goal-oriented planning thing. What this suggests is that the controlled processes are really just a kind of adjunct: an add-on, an optional feature, like an app, that you can run on the lower-level stimulus response system. Specifically, what they do is take a system that probably evolved to mediate between perceptual states and motor actions, turn it inwards, and allow it to operate on conceptual states and cognitive actions.

This feels deeply true to my intuitions about how controlled cognition works. Let's take a classic example of a controlled cognitive process, say, doing a calculus problem. Well, maybe let's make it simpler, let's just make it long division. If I think through long division, the individual cognitive operations that I perform each seem to bottom out in a kind of a habit or an intuition. Like, eventually I'm just going to have to say that five minus three is two. It's not as if when I get to five minus three is two I'm still doing something controlled. When I get to five minus three is two I'm in a conceptual state—five minus three—and then almost habitually two just pops out as the answer to that question.
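
Long division as a controlled procedure whose individual steps bottom out in retrieved facts can be sketched like this. It is a hypothetical illustration: the times table stands in for drilled facts, and the procedure is restricted to single-digit divisors to keep it short. The outer loop is the deliberate, step-by-step part; each quotient digit comes from looking up memorized products rather than from fresh reasoning.

```python
# Hypothetical stand-in for memorized facts: the single-digit times table.
TIMES_TABLE = {(a, b): a * b for a in range(10) for b in range(10)}

def next_quotient_digit(chunk, divisor):
    """Largest digit q with divisor * q <= chunk, found by table lookup only."""
    best = 0
    for q in range(10):
        if TIMES_TABLE[(divisor, q)] <= chunk:
            best = q
    return best

def long_division(dividend, divisor):
    """Schoolbook long division of a non-negative integer by a single digit."""
    if not 1 <= divisor <= 9:
        raise ValueError("this sketch only handles single-digit divisors")
    digits = []
    remainder = 0
    for ch in str(dividend):
        chunk = remainder * 10 + int(ch)                # "bring down" the next digit
        q = next_quotient_digit(chunk, divisor)         # retrieved fact, not reasoning
        remainder = chunk - TIMES_TABLE[(divisor, q)]   # subtract the product
        digits.append(str(q))
    return int("".join(digits)), remainder

print(long_division(975, 4))  # (243, 3)
```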

If you think back to when you were in kindergarten, you actually had to learn that cognitive habit the same way that the rat has to learn about pressing the lever and getting a reward. That is, the teacher ran laborious drills on your times tables and on the shortcuts of division and the specific operations that you would have to perform at each step in the process of doing long division, right? These are the things that excite me the most about this area of research and reinforcement learning and the ways that it connects with neurobiology.

I said at the beginning that I regard this research as a kind of a promissory note—that where we are right now is that we're starting to be able to frame the questions in the right way but that a lot of the answers are still elusive. It might seem as if maybe I'm overselling the research because I've been presenting some of these ideas as if we had all the solutions worked out, and I want to start by saying, well, we don't. People have a hunch that it's going to work something like this but getting all the nuts and bolts to fit in place is still to be done. But there's also a much, much deeper question which I think has largely been ignored and is one of the areas that I'm excited to move into over the next couple of years.

What's been ignored is the learning problem of getting the right cognitive stimulus response habits. What do I mean by that? Take planning something very complex: John planned this lovely weekend for us, and there are a lot of pieces that had to come together, from the film crew to the specific set of speakers to getting these tablecloths and the tables. When we're putting together something as complex as that, is it really plausible to say that John himself learned each of the appropriate cognitive stimulus response habits, such that all of these things that have never interacted in this way before would fall into place perfectly? John in his lifetime just hasn't had enough experience… well, I shouldn't say this about you in particular. I suppose that even some of the younger of us around the table could have put together something like this, but we wouldn't have benefited from the experience that John has had through his lifetime putting together many such events.

What's going to be the source? Where would the knowledge come from that we can then flexibly assemble into novel plans and novel procedures? A hint towards the solution to this problem comes from observing that humans as a species stand apart from all other species in our ability to engage in controlled cognition, to organize our lives around very distant goals, to inhibit more automatic responses and hold and use flexibly information in working memory. We should ask ourselves what is it that enables humans to do that? What are the other things that are unique and special about humans that might explain why we're able to engage in controlled cognition?

The one other thing that stands out in my mind as being incredibly unique about humans is culture and social interaction. One obvious source comes back to the point that I made about how we learn to do long division in school. One place that we could learn the appropriate sets of cognitive stimulus response actions (the ways to manipulate information held internally in working memory) would be through the scaffolding of other people teaching us and showing us that if you want to get a Ph.D. you're going to have to publish a certain number of papers, and in order to publish the papers you're going to have to run the experiments, and you're going to have to be able to run a t-test, so you'd better go to statistics class, right? We can't learn all of that stuff through trial and error: "The first time I tried to get a Ph.D. I didn't go to statistics class, so let's try something new this time." Right? That knowledge comes to us through cultural channels.

In the literature right now there's a debate between two rival theories for what makes humans unique. One theory calls itself the "cognitive niche" and it basically says what makes us unique is that we can think very, very carefully and hard about things in a controlled way. Another hypothesis calls itself the "cultural niche", and it says, no, what makes us unique is that we get the answers to problems for free, culturally. Other people have worked them out through trial and error and they tell us.

What I find really exciting is the idea that it's not just that both of those things are true but that they're codependent. That in principle you could not make the mathematics of controlled cognition work, you couldn't solve the computational intractability without the support of cultural input, and that cultural knowledge wouldn't be much good if you couldn't flexibly reassemble it in the way that hierarchical representations allow you to.

That's a promissory note, we haven't done the research yet. I haven't, certainly, and I certainly don't think the field has either, but that's what I see as being really exciting right now. And in a way I think if we were able to use some of these ideas to address the problem of what it is that makes humans so different than other organisms it would be rather more exciting than just having figured out why it is that I keep eating ice cream instead of writing papers.

KAHNEMAN: How does your treatment relate to the Newell and Simon idea that basically you solve problems by working backwards from where you want to go? If I want to go there, I must get here first and ...

CUSHMAN: There's a lot of virtue to thinking about working backwards. Introspectively, all of us do this a lot. But you still have the same computational intractability problem. Suppose where I want to be is having a Ph.D. If there isn't some trick, some ghost in the machine that's going to direct my attention, out of all the things that could directly precede my having a Ph.D., to the one that is relevant (having written at least three papers), then I've got an enormous search problem on my hands. You see, my point is you could say, "Oh, I want a Ph.D. What's going to be important? Let me start with the A's. Having aardvarks; alpanaering…" You've got an enormous search problem; you've got to constrain that space somehow. There's got to be some part of the brain that is able to direct your attention to the correct response, given the cognitive state that you're in, whether you're moving forwards or backwards. Another way of thinking about it: could we go to the programmers of Deep Blue, who were trying to design a computer that plays chess very well, and say, "Oh, you guys, a couple of decades of work; just go backwards. Start with checkmate, work backwards"? But it's not that easy.

KAHNEMAN: You would know that people who get a Ph.D. have something in common. That is something in your experience, and you get that culturally. So that would very considerably narrow your search space. You know, people who get their Ph.D.s, if you look backwards, they all took prelims, they all wrote a thesis. There's actually a lot that you know from the fact that somebody got a Ph.D. that allows you to go backwards.

CUSHMAN: That's right. And there are two important conclusions to draw from that observation. One is that it's certainly not the case that having learned it culturally or in school would be the only answer to how we could narrow that search space and find an appropriate solution, and I wouldn't suggest that it is. But the second point is this: suppose that by hook or by crook one did know that the appropriate step before getting the Ph.D. was having three papers. What brain mechanisms would one then use to codify that knowledge and allow it to be used when next planning to get a Ph.D.? The Ph.D. case is tough, since one only does that once, hopefully, but making a sandwich is something you're going to do every day; you want to cache that knowledge somewhere.
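
The idea of caching "what comes right before X" can be illustrated with backward chaining over a stored table of predecessors. In this hypothetical sketch the table (`PREREQUISITE`) encodes steps borrowed from the talk's Ph.D. example; once such a cache exists, working backwards is a short walk rather than a search over every conceivable predecessor. Where and how the brain stores such a table is the open question, so the dictionary is only a stand-in.

```python
# Hypothetical cache of "what must come right before X", acquired culturally
# or from experience. Without it, every known fact is a candidate predecessor.
PREREQUISITE = {
    "have Ph.D.": "have three papers",
    "have three papers": "run experiments",
    "run experiments": "know t-tests",
    "know t-tests": "attend statistics class",
}

def plan_backwards(goal, cache):
    """Chain backwards from the goal through cached predecessors."""
    steps = [goal]
    while steps[-1] in cache:
        steps.append(cache[steps[-1]])
    return list(reversed(steps))  # earliest step first

for step in plan_backwards("have Ph.D.", PREREQUISITE):
    print(step)
```

The walk assumes the table has no cycles; a real cache would need a guard against looping.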

Gary Marcus has this wonderful title for a book of his: Kluge. The basic idea is that the brain has a bunch of pieces sitting around and you just kluge together one of the pieces you've got. It looks like at least one of the kluged solutions for where you would cache that knowledge is in the basal ganglia, through this kind of analogy between perceptual-motor linkages and conceptual-cognitive linkages. It's almost certainly not the only way that people cache that type of information, but it's one of the ways that we've begun to understand and that I find exciting.

KAHNEMAN: There is direct evidence that the basal ganglia are involved in that?

CUSHMAN: Absolutely. For instance, my colleague Michael Frank has this very beautiful work where he takes Parkinson's patients, manipulates whether they're on or off L-dopa, and then looks at the impact that has on their ability to use working memory representations; or he looks at people with different genetic variants that impact dopamine function in the basal ganglia and, again, shows systematic effects on working memory. Where I'm pushing a little bit beyond the state of the art in the field (I say this as a warning, not as self-congratulation) is in suggesting that one of the most critical functions of those working memory representations may be to solve the problem of hierarchical goal planning. He's used N-back tasks, you know, very standard measures of working memory, but ones that don't necessarily involve hierarchical representation. But folks like Matt Botvinick and David Badre have been starting to take those models and say, "Ah, these look like just what we need in order to understand hierarchically embedded goals." Who knows, it may be a flop, but I think it's got a lot of promise.

GRUBER: I love this stuff and I particularly love the cultural angle but it strikes me that there's some empirical stuff that we might not know yet, or maybe you know if we know yet, which is the extent to which your social inputs and other people's social rewards can feed into these reinforcement learning models, right?

CUSHMAN: Yes.

GRUBER: There's one question about whether I can watch other people getting rewards and whether I can structure from that my own set of plans, my own kind of perceptual-motor schemes. Because ultimately, if that social input is going to get in there, it strikes me that from your argument it has to enter at the kind of dumb rat level as opposed to the goal level. Can you get reward prediction error from other people's rewards? Are people looking at this stuff yet?

CUSHMAN: Gosh, I'm embarrassed to say that I don't know right off the top of my head for reward prediction errors. The best work that I know on this topic is by Liz Phelps, and a lot of her work focuses on aversive or fear conditioning rather than reward (she has some work on reward, too, but fear has been the mainstay of her research). She has some work showing that the amygdala, which seems to play a somewhat analogous role in these sorts of conditioning processes in the fear domain, responds equivalently. In one experiment you're hooked up to a shock machine and a computer shows you different shapes; whenever you see a blue square you get a shock. Your amygdala starts to activate whenever you see the blue square, even absent the shock, because it's formed this predictive association.

She finds that you can show somebody a video of somebody else participating in the experiment, and then if you show them a blue square in the scanner, you get the same amygdala response, even though they've never experienced it themselves; they've only seen it through the video. She additionally finds that you can get a similar response just by telling them, "Hey, you know what about blue squares? They predict shock." But the interesting thing is that when you observe it directly, you get a bilateral amygdala response, whereas when you learn it verbally, you only get a left amygdala response. And of course a tantalizing possibility is that this is because language is left-lateralized.

KNOBE: I really liked your example about the rat: even when the rat doesn't want food at all, it will still press the lever. According to the theory that you've been developing, the same thing the rat exhibits in its behavior, we should be exhibiting cognitively. So even when we don't actually exhibit the behavior, when we're just trying to figure out what our sub-goal should be, we're going to show that exact error?

CUSHMAN: Yes.

KNOBE: Do people actually do that?

CUSHMAN: You've drawn me into inside baseball. But yes, we do, and the best example of it is actually in the moral domain. We consider it worse to harm somebody as a means to an end than as a side effect of our behavior. For instance, the Catholic Church has a very peculiar version of this doctrine in which, if you are pregnant and your fetus presents a threat to your life, it's impermissible to terminate the pregnancy in order to save yourself, because you're killing the child as a means to saving yourself. The death of the child is the sub-goal, right? It's like: what's threatening me? The child. Then the sub-goal is: kill the child in order to avoid the threat.

However, if you're pregnant and you develop uterine cancer, the only way that you can save yourself is to have a hysterectomy, but as a side effect of the hysterectomy of course the pregnancy will terminate—that's permissible. And notice the one critical difference between the two cases is that in the hysterectomy case there's no sub-goal. You didn't say, "Oh, my sub-goal, in order to achieve the goal of saving myself, has to be to kill the child."

People draw a distinction between these two things. I hope you can see the way that this connects with some of the ideas that I was describing before. If you had a system that assigned values to sub-goals, then that system when it looked at the sub-goal—kill a child—I mean, of all the things to assign a negative value to, that would be very high on the list, right? You'd get a big response out of not wanting to do that. But when it occupies the role of side effect, a system that assigns values to sub-goals would miss it.
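
The contrast between a valuer that scores only the sub-goals a plan routes through and one that scores every outcome can be sketched with abstract labels. Everything here is hypothetical, including the plan structures and the numbers; it only shows that two plans with identical outcomes can receive different values when side effects are invisible to the sub-goal valuer.

```python
# Abstract, hypothetical plans: the sub-goals each routes through, and the
# outcomes each produces. Values are invented for illustration.
PLANS = {
    "harm as a means": {
        "subgoals": ["cause harm", "avert threat"],
        "outcomes": ["harm", "threat averted"],
    },
    "harm as a side effect": {
        "subgoals": ["remove danger"],
        "outcomes": ["harm", "threat averted"],
    },
}
SUBGOAL_VALUE = {"cause harm": -10.0, "avert threat": 5.0, "remove danger": 5.0}
OUTCOME_VALUE = {"harm": -10.0, "threat averted": 5.0}

def value_by_subgoals(plan):
    """Only what the plan explicitly aims at gets valued."""
    return sum(SUBGOAL_VALUE[s] for s in PLANS[plan]["subgoals"])

def value_by_outcomes(plan):
    """Every consequence gets valued, aimed at or not."""
    return sum(OUTCOME_VALUE[o] for o in PLANS[plan]["outcomes"])

for name in PLANS:
    print(name, value_by_subgoals(name), value_by_outcomes(name))
```

The sub-goal valuer scores the two plans -5.0 and 5.0, while the outcome valuer gives both -5.0: the harm only registers for the former when it sits in the sub-goal role.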

What's interesting about that case is that, again, it comes back to the challenge of working out a kind of dual process view of the mind. Usually when we think about goal/sub-goal hierarchies, we think we're in the part of the mind that's fully controlled, that has promiscuous access to all knowledge in the brain. But a system that was fully controlled and had promiscuous access to all knowledge in the brain would focus on the fact that the baby is equally dead in both cases. Right? So we have to understand why a sub-goal representation is critically important and yet important in a blind way: important in a way that can't put together, in a forward-planning sense, all the consequences of an action that we take.

If you think that there's this sort of peculiar marriage between a relatively more dumb system that just does the stimulus response stuff and places values on actions, including sub-goals, and then the process of goal planning itself ... I feel like you might be able to have your cake and eat it, too.

MULLAINATHAN: Danny's question got me worried because I found your chess metaphor very appealing. Then the question about working backwards made me realize it's actually ... I'm worried we're over-applying the math implicit in that metaphor. So, picture the chess tree is actually a tree that's exploding out. And when you say we want to reach checkmate ...

CUSHMAN: There are many instantiations ...

MULLAINATHAN: On the other hand, there are many mathematical problems, like the puzzle example you gave, that exactly do not fit that. There is exactly one instantiation by which the puzzle was assembled. The idea that there are this many is kind of an illusion. There is this one thing, and of course then working backwards makes total sense, which I think is what you were getting at with your Ph.D. example.

CUSHMAN: Yes.

MULLAINATHAN: And is that just a superficial problem or is that a more basic distinction between these? I know in the actual programming literature these are two very different kinds of things and everything in between happens. But is that notion that there are tasks where there is only one thing you're trying to get to, so you can work backwards versus a task where so many things can lead there and the end goal is… is that distinction important? Is that something people thought about? Because the math would be quite different.

CUSHMAN: The wonderful thing about having started this discussion by saying, "I'm going to present questions that I find exciting rather than answers" is that I can now say I just don't know. I think it's a perfect question and I don't know the answer.

MULLAINATHAN: I love the long division thing and the idea that you had to do the rote learning thing ... but, and maybe this is just an illusion, it feels to me that for some of my almost automatic cognitive responses, it's not just that they weren't learned by me, as you were getting at. It's also that they were never rehearsed in this way. Even if I learned them from Danny [Kahneman], who told me, it's not that he told me again and again. I didn't rehearse the thing in my mind again, and again, and again until I got it. It was almost like I got that module, sometimes from someone else, sometimes just by figuring it out and saying, "Oh, this is the right thing to do." But once I had it, it was almost like I could pop it into a system.

CUSHMAN: Yes.

MULLAINATHAN: And it felt like it pops into a system and maybe what we'll discover is it doesn't pop in in the same way as something that was learned but at least by my own intuition it feels as automatic as five minus three is two. Do you see what I'm getting at?

CUSHMAN: I do. One of the areas of research where people have really investigated that pop-in effect is what gets called fictive rewards. Read Montague is one of the leaders in this area of research. The idea is in the simplest version of making a person into a rat you just give them a bunch of levers they can pull and they get rewards. If they pull Lever A they get the reward from Lever A, and over time they learn which levers are good.

But a somewhat more complicated version, which reflects the way that humans sometimes learn about things, is that when you pull Lever A you get the reward from Lever A, but then it's revealed to you what reward you would have obtained had you pulled Lever B or Lever C.

Behaviorally, you observe that people use that information, quite sensibly; they should. It turns out that when you look in the brain at the neural systems that seem to be responding to those moments of revelation, it's the very same mechanisms that learn from direct experience.

What the prefrontal cortex seems to supply is a fictive reward that the basal ganglia then treats as if it had been a veridical reward, so that the next time you have to choose one of these three levers the basal ganglia itself can evaluate their relative values. I really don't want to give the impression that that is the complete answer as to how humans behave in gambling tasks, it's certainly not. But the beautiful thing about the basal ganglia is we've learned an awful lot about how it works, so at least with respect to that system we know that there's a way that you can take a verbal representation and somehow create the type of input to the system that ordinarily a reward occupies, even though no reward was experienced.
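
The fictive-reward idea can be sketched as the same delta-rule update applied both to the experienced reward and to the revealed rewards of the unchosen levers. This is a minimal illustration, not a model from the research described: the payoffs, the learning rate, and the choice to always pull Lever A are assumptions made so that the effect of counterfactual information is easy to see.

```python
# Hypothetical three-lever task. The learner always pulls Lever A so that the
# only difference between runs is whether unchosen payoffs are revealed.
ALPHA = 0.3
PAYOFF = {"A": 0.2, "B": 0.9, "C": 0.5}

def update(values, chosen, counterfactual):
    # Experienced reward for the lever actually pulled.
    values[chosen] += ALPHA * (PAYOFF[chosen] - values[chosen])
    if counterfactual:
        # Revealed ("fictive") rewards for unchosen levers, fed into the
        # very same update rule as if they had been experienced.
        for lever in values:
            if lever != chosen:
                values[lever] += ALPHA * (PAYOFF[lever] - values[lever])

def simulate(counterfactual, trials=20):
    values = {lever: 0.0 for lever in PAYOFF}
    for _ in range(trials):
        update(values, "A", counterfactual)
    return values

for flag in (False, True):
    v = simulate(flag)
    print(flag, max(v, key=v.get), {k: round(x, 2) for k, x in v.items()})
```

Without revealed payoffs the learner never discovers that Lever B pays best, because it never pulls B; with them, B ends up with the highest value without ever being pulled.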

MULLAINATHAN: It's like you had said: exorcising the ghost from the system. To me, that particular ghost feels like it would be very interesting. Because when you started talking about conditioning, I thought, "Wow, this is really interesting, we're going back to this ..." Now we have this ghost creeping back, and that feels totally different and interesting. And I'm not saying that we haven't made progress. It feels like understanding that ghost would be a particularly high return for understanding your findings.

CUSHMAN: Some of the most exciting stuff happening now on that problem (I'm so embarrassed, I've forgotten the researchers who are responsible for this work) is from a team that has used optogenetic techniques to selectively activate neurons within the basal ganglia that respond to reward. Now they can be the ghost. That is, they can direct the response of these neurons and then observe the subsequent impact on behavior, where the rats prefer Lever C because, at just the right moment, the researchers had stimulated the Lever C neurons. But that's a long way from answering how one's own prefrontal cortex does it.



Back to: HeadCon '13: WHAT'S NEW IN SOCIAL SCIENCE?