A friend recently told me about a self-help tactic that has become popular in the circles I move in: the idea of applying behaviorism to yourself (sometimes called “training your inner pigeon”). The idea is you give yourself rewards when you do things you want to do more of, and your brain works its magic and reinforces the activity.

When I first heard about this, my thought was “No way that is ever going to work”. I have always been under the impression that conditioning is kind of like tickling. You can’t tickle yourself. You’d be expecting it.

Let’s start by distinguishing a couple of possibilities:

1) This process doesn’t work at all

2) This process works by making you want the reward. Suppose you promise yourself a candy bar each time you do homework. You are hungry and want the candy bar, but you would feel bad if you ate it without doing homework. Therefore, you grudgingly do homework to get at the candy bar.

3) This process works by changing your urges and desires. After eating a candy bar each time you do homework, your brain associates homework with a nice, delicious feeling, and you enjoy doing homework more from now on.

Let’s start with 3, the most encouraging possibility. This gains a little support from the Little Albert experiment. Here, a baby who had no particular fear of rats was exposed several times to rats plus loud, terrifying noises. Eventually the baby came to fear rats, even without the noise, presumably because the fear of the noise had generalized onto the rat through association. It’s easy to see how this could mean something like the happiness of candy-bar-eating generalizing to homework. Nevertheless, I believe this argument proves too much.

Every evening, I sit down at the table, get a plate and some silverware, and eat dinner. It’s usually something I really like, and it usually includes dessert, which I like even more. If eating good food isn’t rewarding, I don’t know what is, and sure enough I rarely skip dinnertime.

However, if for some reason I don’t have dinner – maybe I’ve promised my friends I’ll go out for a late dinner with them and so I can’t stuff myself first – I do not feel the slightest urge to sit down at the dinner table with a plate and sort of move my silverware around in the air making little eating motions, and when I tried it (empiricism!) I did not find it at all pleasant.

Take a second to think about how weird that is (the result, not me trying the experiment). Sitting at the table and moving my silverware, in conditions exactly like these, has been quickly associated with reward every single time I’ve done it in the past, for decades, ever since I learned to feed myself. But I don’t feel even a little bit of urge to do this. None at all. You may generate additional examples at your leisure, but the point is that just being consistently associated with a positive reinforcer in a low time-delay way does not make a neutral activity (let alone an actively unpleasant activity) become desirable.

What happened with Little Albert, then? First of all, his was a case of classical conditioning, not operant conditioning. Second of all, Albert had no understanding of or control over what was going on. Each time he heard the noise, he was very surprised – he was receiving a new fact from the Universe. But it wasn’t information he understood; he had no idea what the connection between the rat and the noise was and whether it would recur. He just knew that there was some mysterious rat -> noise connection.

Compare this to me eating dinner. The connection between sitting down and eating dinner is not at all a new fact fed me by the Universe; it’s something I plan myself. And it is not mysterious whether any given sitting and silverware-waving will reward me; I know it will reward me if and only if I am planning to eat dinner. Therefore the brain does not think of silverware-waving as an activity that might, who knows, lead to reward in the future.

(one might object that my inner pigeon – or lizard brain, to mix animal metaphors – doesn’t share my complex explicit knowledge of the reward structure of dinner-eating. But the little I know of the brain’s reinforcement mechanism suggests that reinforcement learning is based on surprise – technically the difference between predicted and observed values of some complicated Bayesian equation encoded in dopaminergic neurons or something – and that this system is actually quite good at predicting expected reward from an action, within certain limits)
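
To make the surprise idea concrete, here is a toy sketch in Python of a Rescorla-Wagner-style update, where the learning on each trial is proportional to the gap between the reward received and the reward predicted. This is an illustration under simplifying assumptions, not a model of actual dopamine neurons: the function name `simulate_value_learning`, the learning rate of 0.2, and the reward size of 1.0 are all made up for the example.

```python
def simulate_value_learning(rewards, learning_rate=0.2, initial_value=0.0):
    """Toy prediction-error learner (hypothetical helper, Rescorla-Wagner style).

    Returns the final predicted value and the list of per-trial prediction errors.
    """
    value = initial_value
    errors = []
    for reward in rewards:
        error = reward - value          # surprise: observed minus predicted reward
        errors.append(error)
        value += learning_rate * error  # learning is proportional to surprise
    return value, errors


# Case 1: the reward is a surprise (prediction starts at zero).
surprised_value, surprised_errors = simulate_value_learning([1.0] * 30)

# Case 2: the reward is already fully expected (assumed to stand in for
# a reward you planned yourself, like dinner).
planned_value, planned_errors = simulate_value_learning([1.0] * 30, initial_value=1.0)

print("first and last error when surprised:", surprised_errors[0], surprised_errors[-1])
print("largest error when fully predicted:", max(abs(e) for e in planned_errors))
```

In the first run the error starts at 1.0 and shrinks toward zero as the prediction catches up. In the second, where the reward is fully expected from the start, the error is zero on every trial and there is nothing left to learn.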

So (3), the hypothesis that the reward will cause me to start enjoying homework, seems wrong. What about (2) – “I don’t like homework much, but at least I get some candy out of it”?

Here there’s a ceiling on how much the candy can reinforce your homework-doing behavior, and that ceiling is how much you like candy.

Suppose you have a big box of candy in the fridge. If you haven’t eaten it all already, that suggests your desire for candy isn’t even enough to reinforce the action of going to the fridge, getting a candy bar, and eating it, let alone the much more complicated task of doing homework. Yes, maybe there are good reasons why you don’t eat the candy – for example, you’re afraid of getting fat. But these issues don’t go away when you use the candy as a reward for homework completion. However little you want the candy bar you were barely even willing to take out of the fridge, that’s how much it’s motivating your homework.

Maybe you say “I will allow myself exactly one candy bar a day, but only if I finish my homework”. Even if you can stick to this rule, here the candy bar becomes an extrinsic reward motivating the homework. We all know what happens with extrinsic rewards – overjustification effect! You gradually start interpreting the task at hand as an annoying impediment to getting the reward, lose your intrinsic motivation, and as soon as the reward is removed, you’re even less willing to do the task than before.

So both (2) and (3) are pretty unlikely. That leaves us with (1) – don’t even bother.

Luckily, my friend helpfully clarified that this wasn’t what her class taught at all (I think maybe they originally tried this, but considerations like the ones I mentioned convinced them to change?). Their new policy is that you should reinforce yourself with a “victory gesture” – for example, pumping your fist and shouting “YEAH!” and visualizing an image corresponding to your success and trying to feel really good about yourself.

So for example, as soon as you sit down to start your homework, you make the victory gesture and imagine yourself graduating summa cum laude from school, and then you feel really good and have reinforced the behavior of sitting down to do your homework. And maybe you do it again when you finish, because of the peak-end rule.

She claims a few benefits of this method. First, it’s very fast, so you can reinforce things right as they happen instead of with time delay which gives your brain enough time to lose the connection. Second, it’s intrinsic, so it’s not going to sap your natural motivation the same way the candy bar might.

I understand the claim that rewards delivered very immediately after a stimulus can work better for conditioning – I was referred to a couple of papers proving this, though I don’t remember them. But I notice I am confused. When we have good examples of real conditioning, immediate reward isn’t especially important. For example, people often use the language of behaviorism to talk about addiction, say alcoholism. But the chemical rewards of getting drunk don’t manifest until a little while after you’ve had your first beer – certainly not within a split second – and certainly alcoholism can reinforce even longer term behaviors, like leaving home and going to the bar. Pornography is another good example of effective behaviorism, but going to a porn site gives only delayed rewards – first you have to find a video you like, then you have to wait for it to buffer, then you have to sit through the boring part where the nice lady and the plumber are discussing the best ways to fix her faulty pipes, and so on. It seems that when we have a real effect that definitely works, immediacy is not required (indeed, if it were, humans would have a lot of trouble learning anything but the most basic reflexes).

But okay. Ignore that. It would be really really really really bad mind design to allow your own consciously generate-able emotions to feed back into the reinforcement mechanism.

Start with one obvious point. I said the candy bar couldn’t be much of a reinforcer if you otherwise left it in the jar without eating it. The same seems broadly true of a victory gesture. I don’t feel the slightest urge to perform a victory gesture, and having tried it empirically I don’t feel the slightest urge to repeat it. This bodes poorly for its ability to be a strong reinforcer.

And over several billion years of evolution, the brain has every incentive to get rid of that behavior if indeed it was ever possible. Imagine a world in which our own thoughts and feelings can be strongly reinforcing. You’re a caveman, encountering a saber-toothed tiger. You have two choices. You can either feel fear, which is an unpleasant emotion. Or you can feel happiness, which is a pleasant emotion. First you try feeling fear, but that’s unpleasant! You don’t like fear! The feeling of fear is punished and your brain learns to stop feeling it. Then you try happiness! You like happiness! The decision to feel happiness is positively reinforced. Yes, you decide, saber-toothed tigers are wonderful things and you are overjoyed there is one in front of you getting into a pouncing position and licking its lips and…well, this caveman isn’t going to live very long.

From the little I know about the reward system, it seems to operate on a basis of predicting pleasure level, then upregulating actions that result in world-states that seem more pleasurable than predicted and downregulating actions that result in world-states that seem less pleasurable than predicted. I don’t think you can prevent the “I’m going to do my victory gesture!” part of you and the “I’m going to predict my pleasure at time t+1” part of you from talking to each other, I don’t think internal pleasure is as reinforcing as external world-state results, and I don’t think the pleasure of making a victory gesture is strong enough to do much anyway.

…there were a lot of “I thinks” in that paragraph. Do we have any evidence here?

The literature on this is hiding under the obscure term “self-consequation”, and unfortunately it is all from Scientific Prehistory, i.e. the 1970s and 1980s, before journal articles were uploaded to the Internet. I am able to find this full study, which does pretty much exactly the experiment listed at the beginning of this post – feed people candy in return for studying – and finds that it helps only if other people are there keeping them honest. But I am also able to find this abstract, which appears to be from a study showing the opposite – some kind of benefit – but is totally unavailable on the Internet. Both studies seem to refer to a long literature supporting their result and (sigh) neither seems aware of the other’s existence. However, I am more skeptical of the second, both because I can’t see it and because I worry that experimental protocols aren’t real self-reinforcement. That is, if an experimenter gives you their bag of candy and tells you to reinforce yourself by eating some when you do something good, that’s still different from using your own bag of candy and coming up with the idea on your own, even if the experimenter is out of the room when you’re working.

I will still try the technique, because it seems low cost and potentially high value. Really high value, actually. So high value that I would have expected the first person to get it right to take over the world. This is turning into another argument against it, isn’t it?

But yeah, as I was saying, I still intend to try the technique, even though it won’t be a very well-controlled experiment. And I’m glad I heard the idea for reminding me how little I know about behaviorism.