Cross-posted from my blog.

Epistemic status: Probably discussed to death in multiple places, but people still make this mistake all the time. I am not well versed in UDT, but it seems along the same lines. Or maybe I am reinventing some aspects of Game Theory.

We know that physics does not support the idea of metaphysical free will. By metaphysical free will I mean the magical ability of agents to change the world by just making a decision to do so. To the best of our knowledge, we are all (probabilistic) automatons who think themselves as agents with free choices. A model compatible with the known laws of physics is that what we think of as modeling, predicting and making choices is actually learning which one of the possible worlds we live in. Think of it as being a passenger in a car and seeing new landscapes all the time. The main difference is that the car is invisible to us and we constantly update the map of the expected landscape based on what we see. We have a sophisticated updating and predicting algorithm inside, and it often produces accurate guesses. We experience those as choices made. As if we were the ones in the driver's seat, not just the passengers.

Realizing that decisions are nothing but updates, that making a decision is a subjective experience of discovering which of the possible worlds is the actual one immediately adds clarity to a number of decision theory problems. For example, if you accept that you have no way to change the world, only to learn which of the possible worlds you live in, then the Newcomb's problem with a perfect predictor becomes trivial: there is no possible world where a two-boxer wins. There are only two possible worlds, one where you are a one-boxer who wins, and one where you are a two-boxer who loses. Making a decision to either one-box or two-box is a subjective experience of learning what kind of a person are you, i.e. what world you live in.

This description, while fitting the observations perfectly, is extremely uncomfortable emotionally. After all, what's the point of making decisions if you are just a passenger spinning a fake steering wheel not attached to any actual wheels? The answer is the usual compatibilism one: we are compelled to behave as if we were making decisions by our built-in algorithm. The classic quote from Ambrose Bierce applies:

"There's no free will," says the philosopher; "To hang is most unjust."

"There is no free will," assents the officer; "We hang because we must."

So, while uncomfortable emotionally, this model lets us make better decisions (the irony is not lost on me, but since "making a decision" is nothing but an emotionally comfortable version of "learning what possible world is actual", there is no contradiction).

An aside on quantum mechanics. It follows from the unitary evolution of the quantum state, coupled with the Born rule for observation, that the world is only predictable probabilistically at the quantum level, which, in our model of learning about the world we live in, puts limits on how accurate the world model can be. Otherwise the quantum nature of the universe (or multiverse) has no bearing on the perception of free will.

Let's go through the examples some of which are listed as the numbered dilemmas in a recent paper by Eliezer Yudkowsky and Nate Soares, Functional decision theory: A new theory of instrumental rationality. From here on out we will refer to this paper as EYNS.

Psychological Twin Prisoner’s Dilemma

An agent and her twin must both choose to either “cooperate” or “defect.” If both cooperate, they each receive $1,000,000. If both defect, they each receive $1,000. If one cooperates and the other defects, the defector gets $1,001,000 and the cooperator gets nothing. The agent and the twin know that they reason the same way, using the same considerations to come to their conclusions. However, their decisions are causally independent, made in separate rooms without communication. Should the agent cooperate with her twin?

First we enumerate all the possible worlds, which in this case are just two, once we ignore the meaningless verbal fluff like "their decisions are causally independent, made in separate rooms without communication." This sentence adds zero information, because the "agent and the twin know that they reason the same way", so there is no way for them to make different decisions. These worlds are

Cooperate world: $1,000,000 Defect world: $1,000

There is no possible world, factually or counterfactually, where one twin cooperates and the other defects, no more than there are possible worlds where 1 = 2. Well, we can imagine worlds where math is broken, but they do not usefully map onto observations. The twins would probably be smart enough to cooperate, at least after reading this post. Or maybe they are not smart enough and will defect. Or maybe they hate each other and would rather defect than cooperate, because it gives them more utility than money. If this was a real situation we would wait and see which possible world they live in, the one where they cooperate, or the one where they defect. At the same time, subjectively to the twins in the setup it would feel like they are making decisions and changing their future.

The absent-minded Driver problem:

An absent-minded driver starts driving at START in Figure 1. At X he can either EXIT and get to A (for a payoff of 0) or CONTINUE to Y. At Y he can either EXIT and get to B (payoff 4), or CONTINUE to C (payoff 1). The essential assumption is that he cannot distinguish between intersections X and Y, and cannot remember whether he has already gone through one of them.

There are three possible worlds here, A, B and C, with utilities 0, 4 and 1 correspondingly, and by observing the driver "making a decision" we learn which world they live in. If the driver is a classic CDT agent, they would turn and end up at A, despite it being the lowest-utility action. Sucks to be them, but that's their world.

The Smoking Lesion Problem

An agent is debating whether or not to smoke. She knows that smoking is correlated with an invariably fatal variety of lung cancer, but the correlation is (in this imaginary world) entirely due to a common cause: an arterial lesion that causes those afflicted with it to love smoking and also (99% of the time) causes them to develop lung cancer. There is no direct causal link between smoking and lung cancer. Agents without this lesion contract lung cancer only 1% of the time, and an agent can neither directly observe, nor control whether she suffers from the lesion. The agent gains utility equivalent to $1,000 by smoking (regardless of whether she dies soon), and gains utility equivalent to $1,000,000 if she doesn’t die of cancer. Should she smoke, or refrain?

The problem does not specify this explicitly, but it seems reasonable to assume that the agents without the lesion do not enjoy smoking and get 0 utility from it.

There are 8 possible worlds here, with different utilities and probabilities:

An agent who "decides" to smoke has higher expected utility than the one who decides not to, and this "decision" lets us learn which of the 4 possible worlds could be actual, and eventually when she gets the test results we learn which one is the actual world.

Note that the analysis would be exactly the same if there was a “direct causal link between desire for smoking and lung cancer”, without any “arterial lesion”. In the problem as stated there is no way to distinguish between the two, since there are no other observable consequences of the lesion. There is 99% correlation between the desire to smoke and and cancer, and that’s the only thing that matters. Whether there is a “common cause” or cancer causes the desire to smoke, or desire to smoke causes cancer is irrelevant in this setup. It may become relevant if there were a way to affect this correlation, say, by curing the lesion, but it is not in the problem as stated. Some decision theorists tend to get confused over this because they think of this magical thing they call "causality," the qualia of your decisions being yours and free, causing the world to change upon your metaphysical command. They draw fancy causal graphs like this one:

instead of listing and evaluating possible worlds.

Parfit’s Hitchhiker Problem

An agent is dying in the desert. A driver comes along who offers to give the agent a ride into the city, but only if the agent will agree to visit an ATM once they arrive and give the driver $1,000.

The driver will have no way to enforce this after they arrive, but she does have an extraordinary ability to detect lies with 99% accuracy. Being left to die causes the agent to lose the equivalent of $1,000,000. In the case where the agent gets to the city, should she proceed to visit the ATM and pay the driver?

We note a missing piece in the problem statement: what are the odds of the agent lying about not paying and the driver detecting the lie and giving a ride, anyway? It can be, for example, 0% (the driver does not bother to use her lie detector in this case) or the same 99% accuracy as in the case where the agent lies about paying. We assume the first case for this problem, as this is what makes more sense intuitively.

As usual, we draw possible worlds, partitioned by the "decision" made by the hitchhiker and note the utility of each possible world. We do not know which world would be the actual one for the hitchhiker until we observe it ("we" in this case might denote the agent themselves, even though they feel like they are making a decision).

So, while the highest utility world is where the agent does not pay and the driver believes they would, the odds of this possible world being actual are very low, and the agent who will end up paying after the trip has higher expected utility before the trip. This is pretty confusing, because the intuitive CDT approach would be to promise to pay, yet refuse after. This is effectively thwarted by the driver's lie detector. Note that if the lie detector was perfect, then there would be just two possible worlds:

pay and survive, do not pay and die.

Once the possible worlds are written down, it becomes clear that the problem is essentially isomorphic to Newcomb's.

Another problem that is isomorphic to it is

The Transparent Newcomb Problem

Events transpire as they do in Newcomb’s problem, except that this time both boxes are transparent — so the agent can see exactly what decision the predictor made before making her own decision. The predictor placed $1,000,000 in box B iff she predicted that the agent would leave behind box A (which contains $1,000) upon seeing that both boxes are full. In the case where the agent faces two full boxes, should she leave the $1,000 behind?

Once you are used to enumerating possible worlds, whether the boxes are transparent or not, does not matter. The decision whether to take one box or two already made before the boxes are presented, transparent or not. The analysis of the conceivable worlds is identical to the original Newcomb’s problem. To clarify, if you are in the world where you see two full boxes, wouldn’t it make sense to two-box? Well, yes, it would, but if this is what you "decide" to do (and all decisions are made in advance, as far as the predictor is concerned, even if the agent is not aware of this), you will never (or very rarely, if the predictor is almost, but not fully infallible) find yourself in this world. Conversely, if you one-box even if you see two full boxes, that situation is always, or almost always happens.

If you think you pre-committed to one-boxing but then are capable of two boxing, congratulations! You are in the rare world where you have successfully fooled the predictor!

From this analysis it becomes clear that the word “transparent” is yet another superfluous stipulation, as it contains no new information. Two-boxers will two-box, one-boxers will one-box, transparency or not.

At this point it is worth pointing out the difference between world counting and EDT, CDT and FDT. The latter three tend to get mired in reasoning about their own reasoning, instead of reasoning about the problem they are trying to decide. In contrast, we mindlessly evaluate probability-weighted utilities, unconcerned with the pitfalls of causality, retro-causality, counterfactuals, counter-possibilities, subjunctive dependence and other hypothetical epicycles. There are only recursion-free possible worlds of different probabilities and utilities, and a single actual world observed after everything is said and done. While reasoning about reasoning is clearly extremely important in the field of AI research, the dilemmas presented in EYNS do not require anything as involved. Simple counting does the trick better.

The next problem is rather confusing in its original presentation.

The Cosmic Ray Problem

An agent must choose whether to take $1 or $100. With vanishingly small probability, a cosmic ray will cause her to do the opposite of what she would have done otherwise. If she learns that she has been affected by a cosmic ray in this way, she will need to go to the hospital and pay $1,000 for a check-up. Should she take the $1, or the $100?

A bit of clarification is in order before we proceed. What does “do the opposite of what she would have done otherwise” mean, operationally?. Here let us interpret it in the following way:

Deciding and attempting to do X, but ending up doing the opposite of X and realizing it after the fact.

Something like “OK, let me take $100… Oops, how come I took $1 instead? I must have been struck by a cosmic ray, gotta do the $1000 check-up!”

Another point is that here again there are two probabilities in play, the odds of taking $1 while intending to take $100 and the odds of taking $100 while intending to take $1. We assume these are the same, and denote the (small) probability of a cosmic ray strike as p.

The analysis of the dilemma is boringly similar to the previous ones:

Thus attempting to take $100 has a higher payoff as long as the “vanishingly small” probability of the cosmic ray strike is under 50%. Again, this is just a calculation of expected utilities, though an agent believing in metaphysical free will may take it as a recommendation to act a certain way.

The following setup and analysis is slightly more tricky, but not by much.

The XOR Blackmail

An agent has been alerted to a rumor that her house has a terrible termite infestation that would cost her $1,000,000 in damages. She doesn’t know whether this rumor is true. A greedy predictor with a strong reputation for honesty learns whether or not it’s true, and drafts a letter:

I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.

The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. 13 Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?

The problem is called “blackmail” because those susceptible to paying the ransom receive the letter when their house doesn’t have termites, while those who are not susceptible do not. The predictor has no influence on the infestation, only on who receives the letter. So, by pre-committing to not paying, one avoids the blackmail and if they receive the letter, it is basically an advanced notification of the infestation, nothing more. EYNS states “the rational move is to refuse to pay” assuming the agent receives the letter. This tentatively assumes that the agent has a choice in the matter once the letter is received. This turns the problem on its head and gives the agent a counterintuitive option of having to decide whether to pay after the letter has been received, as opposed to analyzing the problem in advance (and precommitting to not paying, thus preventing the letter from being sent, if you are the sort of person who believes in choice).

The possible worlds analysis of the problem is as follows. Let’s assume that the probability of having termites is p, the greedy predictor is perfect, and the letter is sent to everyone “eligible”, i.e. to everyone with an infestation who would not pay, and to everyone without the infestation who would pay upon receiving the letter. We further assume that there are no paranoid agents, those who would pay “just in case” even when not receiving the letter. In general, this case would have to be considered as a separate world.

Now the analysis is quite routine:

Thus not paying is, not surprisingly, always better than paying, by the “blackmail amount” 1,000(1-p).

One thing to note is that the case of where the would-pay agent has termites, but does not receive a letter is easy to overlook, since it does not include receiving a letter from the predictor. However, this is a possible world contributing to the overall utility, if it is not explicitly stated in the problem.

Other dilemmas that yield to a straightforward analysis by world enumeration are Death in Damascus, regular and with a random coin, the Mechanical Blackmail and the Psychopath Button.

One final point that I would like to address is that treating the apparent decision making as a self- and world-discovery process, not as an attempt to change the world, helps one analyze adversarial setups that stump the decision theories that assume free will.

Immunity from Adversarial Predictors

EYNS states in Section 9:

“There is no perfect decision theory for all possible scenarios, but there may be a general-purpose decision theory that matches or outperforms all rivals in fair dilemmas, if a satisfactory notion of “fairness” can be formalized.” and later “There are some immediate technical obstacles to precisely articulating this notion of fairness. Imagine I have a copy of Fiona, and I punish anyone who takes the same action as the copy. Fiona will always lose at this game, whereas Carl and Eve might win. Intuitively, this problem is unfair to Fiona, and we should compare her performance to Carl’s not on the “act differently from Fiona” game, but on the analogous “act differently from Carl” game. It remains unclear how to transform a problem that’s unfair to one decision theory into an analogous one that is unfair to a different one (if an analog exists) in a reasonably principled and general way.”

I note here that simply enumerating possible worlds evades this problem as far as I can tell.

Let’s consider a simple “unfair” problem: If the agent is predicted to use a certain decision theory DT1, she gets nothing, and if she is predicted to use some other approach (DT2), she gets $100. There are two possible worlds here, one where the agent uses DT1, and the other where she uses DT2:

So a principled agent who always uses DT1 is penalized. Suppose another time the agent might face the opposite situation, where she is punished for following DT2 instead of DT1. What is the poor agent to do, being stuck between Scylla and Charybdis? There are 4 possible worlds in this case:

Agent uses DT1 always Agent uses DT2 always Agent uses DT1 when rewarded for using DT1 and DT2 when rewarded for using DT2 Agent uses DT1 when punished for using DT1 and DT2 when punished for using DT2

The world number 3 is where a the agent wins, regardless of how adversarial or "unfair" the predictor is trying to be to her. Enumerating possible worlds lets us crystallize the type of an agent that would always get maximum possible payoff, no matter what. Such an agent would subjectively feel that they are excellent at making decisions, whereas they simply live in the world where they happen to win.