The Truth of the Conjunction Fallacy: Why Linda Can be More Probable to be Both a Bank Teller and a Feminist (Than Just Being a Bank Teller)

Majid Hasan, Apr 15, 2017

Abstract

The law of conjunction tells us that a conjunction of propositions can be no more probable than any one of its individual propositions. The law is, however, frequently violated by intuitive judgements, in which a conjunction of propositions is deemed more probable than one of its constituents. In this essay, we argue that the mismatch between our intuition and the laws of probability is not merely a logical error resulting from careless thinking. It arises from a restrictive conception of truth and probability based on bivalent truths, which allows a claim to be true only to degree zero (wholly false) or one (wholly true) and disallows intermediate values. Our seemingly illogical intuitions may in fact be more in line with a less restrictive conception of truth and probability, one that accommodates intermediate degrees of truth.

Fuzzy Truths

The first principle of logic tells us that a claim can be either true or false. Truth is binary. A claim cannot be more or less true. The earth either is flat or it is not. So the claim that the earth is flat is either true or false; it cannot be both true and false, say half true and half false.

Except that it can be, as long as the curvature of the earth can be quantified. In this case, the claim that the earth is flat can be true to various degrees, as measured by the extent to which the curvature of the earth deviates from that of a flat surface. The claim can be more or less true, depending on the actual curvature of the earth. As the curvature of the earth deviates away from that of a flat surface, the claim would be true to a lower degree and false to a higher degree. This is what is referred to as ‘fuzzy logic’ and ‘fuzzy truths’.

Fuzzy truths do not restrict us to a binary conception of true and false and, in general, allow claims to be true to some degree and false to some degree. The advantage of doing so is obvious. In a binary conception of truth, the claim that the earth is flat is false, and so is the claim that the earth is spherical, because the earth is closer to an ellipsoid, and it is not even a perfect ellipsoid, because it has bumps and mountains. In a bivalent conception of truth, both claims would be equally false, because this conception does not accommodate degrees of falsehood. Multivalent logic, in contrast, allows us to quantify the degree of truth, and to rank claims by the extent to which they are true or false. Thus, fuzzy logic allows us to incorporate more information about a claim.

The Linda Problem

We already believe and practice the logic of fuzzy truths, but not consistently. This inconsistency often leads to conflicts and paradoxes. Take, for example, the Linda problem, documented by Tversky and Kahneman (1983), which has become a well-known example of the conjunction fallacy. Participants are given the following description:

“Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.”

They are then asked to judge whether Linda is more probable to be (A) a bank teller, or (B) a bank teller and a feminist. The majority of participants rate proposition B as more probable. This violates the conjunction principle: proposition B is the conjunction of proposition A (Linda is a bank teller) and another proposition (Linda is a feminist), so it cannot be more probable than proposition A.

Now consider the following situation: we have 10 women who satisfy the description provided for Linda. 5 of them are bank tellers and 5 are not; 3 of the 5 bank tellers are also feminists, and 2 of the 5 who are not bank tellers are feminists. Hence, the frequentist probability of Linda being a bank teller is 50%, and of Linda being a bank teller and a feminist is 30%. So the proposition that Linda is a bank teller is more probable.

Now, ask yourself: is the proposition that Linda is a bank teller and a feminist true (false) to the same degree as the proposition that Linda is a bank teller, each time we observe a woman who is both (neither) a bank teller and (nor) a feminist? Is it not the case that proposition B can be said to match (fail to match) two observations, one that Linda is a feminist and another that Linda is a bank teller, each time we observe such a woman, while proposition A, that Linda is a bank teller, only matches (fails to match) one observation? In this sense, the proposition that Linda is a bank teller and a feminist can be both true and false to a higher degree, simply because it predicts more attributes of each observation. In this case, one cannot simultaneously subscribe to a frequentist probability approach and reject the claim that both propositions are true or false to the same degree whenever they are true or false!

Using each attribute of Linda as a distinct observation, we can define the degree of truth of each proposition as the frequentist probability that the proposition assigns to each observation (this is merely to invoke the persuasive power of irony, and not to argue that this is the best way to define the degree of truth). To put it more formally, let us assume that proposition A assigns a 99.9% probability to Linda being a bank teller, and a 0.1% probability to Linda not being a bank teller; and that proposition B assigns a 99.9% probability to Linda being both a bank teller and a feminist, and a 0.1% probability to Linda not being both. Table 1 summarises the probabilities that each proposition would assign to the different observations. There are four possible types of observation: a bank teller who is a feminist, a bank teller who is not, a feminist who is not a bank teller, and a woman who is neither. Proposition A says nothing about feminism, so it assigns a probability of about 50% to each of the two types in which Linda is a bank teller, and a probability of about 0% to the other two. Proposition B assigns a much higher probability (roughly 100%) to one type of observation, and much smaller probabilities (roughly 0%) to the rest.

In the context of bivalent truth, the degree of truth can only be zero or one, so the expected degree of a proposition’s truth is equal to its probability of truth. This can be seen in the last row of Table 1, where the average degrees of bivalent truth for the two propositions turn out to be 0.5 and 0.3, equal to their frequentist probabilities. So let’s now compute the average degree of fuzzy truth for the two propositions in the case of the 10 women mentioned above, where 5 women are bank tellers, 3 of whom are feminists, and the other 5 are not bank tellers, 2 of whom are feminists. Proposition A is true to a degree of about 0.5 for each of the 5 bank tellers and about 0 for the rest, giving an average of about 0.25; proposition B is true to a degree of about 1 for the 3 feminist bank tellers and about 0 for the rest, giving an average of about 0.3. The average degree of truth of proposition B is thus higher than that of proposition A, implying that Linda is more likely to be a bank teller and a feminist than just a bank teller!
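The comparison between the two averages can be checked with a few lines of Python. This is only a sketch under stated assumptions: the text fixes the 99.9% and 0.1% figures, but the way each proposition spreads its probability over the types of observation it does not single out (an even split, in `degree_a` and `degree_b` below) is our assumption, and the function names are ours.

```python
from fractions import Fraction

# The 10 women from the example, as (is_bank_teller, is_feminist) pairs.
women = (
    [(True, True)] * 3      # bank tellers who are feminists
    + [(True, False)] * 2   # bank tellers who are not feminists
    + [(False, True)] * 2   # feminists who are not bank tellers
    + [(False, False)] * 3  # neither
)

HIGH = Fraction(999, 1000)  # probability a proposition gives to its own claim
LOW = 1 - HIGH              # probability it gives to the claim being wrong


def degree_a(teller, feminist):
    """Degree of truth of A ("bank teller") for one observed woman.

    Assumption: A is silent about feminism, so its probability is split
    evenly between the feminist and non-feminist versions of each case.
    """
    return (HIGH if teller else LOW) / 2


def degree_b(teller, feminist):
    """Degree of truth of B ("bank teller and feminist") for one woman.

    Assumption: the remaining 0.1% is split evenly over the other three types.
    """
    return HIGH if (teller and feminist) else LOW / 3


def average(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


# Bivalent truth: each proposition is simply true (1) or false (0) per woman.
freq_a = average(Fraction(int(t)) for t, f in women)
freq_b = average(Fraction(int(t and f)) for t, f in women)

# Fuzzy truth: average the degrees instead of the 0/1 indicators.
avg_a = average(degree_a(t, f) for t, f in women)
avg_b = average(degree_b(t, f) for t, f in women)

print(float(freq_a), float(freq_b))  # 0.5 0.3
print(float(avg_a), float(avg_b))    # 0.25 and approximately 0.2999
```

Under these assumptions the bivalent averages reproduce the frequentist ordering (A above B), while the fuzzy averages reverse it (B above A). A different rule for spreading the unassigned probability would change the exact numbers.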

So what is the matter? Why does the average degree of truth commit the conjunction fallacy? And why does it not yield the same answer as the frequentist probability? What is happening is that the average degree of truth incorporates both the degree of truth (by treating each attribute as a distinct observation) and the frequency of truth (in the form of the number of observed women). It is thus a measure of the probability of truth conditional on the degree of truth, while the conventional frequentist probability considers only the frequency of truth, not its degree, and can be interpreted as the probability of truth unconditional on the degree of truth.

It is only with bivalent truths that the average degree of truth coincides with the frequentist probability, because a bivalent notion of truth does not distinguish between degrees of truth, and no information is lost by not conditioning on the degree of truth. In general, however, the average degree of truth and the frequentist probability of truth do not necessarily measure the same thing, and may refer to two distinct dimensions of truth. Thus, with more general fuzzy truths, whose degree of truth does not have to be 0 or 1, the average degree of truth can deviate from the frequentist probability of truth, leading to possible confusion if we are not careful about which notion of probability is being used.

How True Does It Have to be to be True?

The effect of the degree of truth on our judgement can be amplified even further if we consider propositions to be true only when their degree of truth exceeds a certain threshold. That is, a proposition whose prediction is off by more than 30%, or that explains fewer than 70% of the attributes of an observation, may be deemed untrue. In this case, propositions that are always true (in the frequentist sense), but true only to a mild degree, would be rejected in favour of even very infrequently true propositions that are true to a sufficiently high degree.

This is perhaps also why narratives or stories appear more true than mere statistical facts. Narratives postulate truths that are more likely to be true to a higher degree, even if less frequently true, as they provide a more ‘comprehensive picture’ of an observation (say, an individual). Statistical facts, almost by construction, are more frequently true, but often true to a lower degree, as they only correspond to one or two attributes of the individual. For example, a narrative about how those who get more pleasure from their work feel more committed to it, are able to work harder and for longer hours, and are consequently more likely to succeed, may seem more true than a positive statistical relation between hours worked and the level of success. This is because, while a positive relation between hours worked and degree of success may be valid for a larger number of individuals, such a relation does not say (predict) anything else about the behaviour of these hard-working and successful individuals. In contrast, a narrative relates different aspects of an individual’s behaviour (the joy derived from work, the degree of commitment to the work, and the hours spent working), making it possible for the narrative to be true to a higher degree. And as long as there are a few individuals who personify the entire narrative, the narrative may prove to be more true than the statistical fact.

Consider the Linda example again, and assume that we set the threshold for the degree of truth at 0.7: a statement is declared true, in the bivalent sense, only if its degree of truth exceeds 0.7. We start with the fuzzy notion of truth, compute the degree of truth of both propositions, and then translate them to the bivalent scale by comparing each degree of truth to the chosen threshold. Table 2 shows the results: only proposition B manages to cross the minimum threshold; proposition A is now never deemed true, and its frequentist probability of truth is zero, as its degree of truth never crosses the threshold. In other words, when we raise the bar for what counts as true, even when the proposition is true, it is still false! Thus, statistical regularities or partial truths, which are true only to a mild degree, can get completely discarded even if they are frequently true, while narratives or complete truths can be deemed more true, and true more often, even if they are true much less frequently. (This is merely to argue why this can happen, and not to argue that such a treatment of truth is optimal; in fact, this treatment conflates the degree of truth with the frequency of truth, and commits the same fallacy that the essay is arguing against.)
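The thresholding step can be sketched as follows. The degrees of truth used here are the rounded values given in the text (about 0.5, roughly 1, roughly 0); this is not a reproduction of Table 2, and the helper name `thresholded_frequency` is ours.

```python
THRESHOLD = 0.7  # minimum degree of truth for a statement to count as true

# One row per type of observation among the 10 women:
# (number of women, degree of truth under A, degree of truth under B).
# Degrees are the rounded values from the text, used here as assumptions.
groups = [
    (3, 0.5, 1.0),  # bank teller and feminist
    (2, 0.5, 0.0),  # bank teller, not a feminist
    (2, 0.0, 0.0),  # feminist, not a bank teller
    (3, 0.0, 0.0),  # neither
]


def thresholded_frequency(groups, column, threshold):
    """Fraction of women for whom the degree of truth exceeds the threshold."""
    total = sum(group[0] for group in groups)
    passed = sum(group[0] for group in groups if group[column] > threshold)
    return passed / total


freq_a = thresholded_frequency(groups, 1, THRESHOLD)
freq_b = thresholded_frequency(groups, 2, THRESHOLD)

print(freq_a, freq_b)  # 0.0 0.3
```

Proposition A never exceeds the threshold, so its thresholded frequency falls from 0.5 to zero, while proposition B keeps its frequency of 0.3.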

Framing Effects

Another example of the confusion arising from an informal treatment of the degree of truth can be seen in some ‘framing effects’. Our preferences are known to be sensitive to framing, meaning that logically identical choices elicit different responses when framed differently. For example, Tversky and Kahneman (1981) report the results of an experiment in which participants are asked to choose a program to combat a disease that is expected to kill 600 people. First the problem is framed in terms of ‘lives saved’:

If Program A is adopted, 200 people will be saved; [72 percent]

If Program B is adopted, there is 1/3 probability that 600 people will be saved, and 2/3 probability that no people will be saved; [28 percent]

and a majority (72%) prefers the safer option. Next the problem is framed in terms of ‘lives lost’:

If Program C is adopted 400 people will die; [22 percent]

If Program D is adopted there is 1/3 probability that nobody will die, and 2/3 probability that 600 people will die; [78 percent]

and a majority prefers the riskier option. They conclude that the responses are inconsistent because the two framings are logically identical: saving 200 people means that 400 will die; saving 600 people means no one will die; and failing to save anyone means that everyone will die. Thus, the choices of the majority are sensitive to the framing of the same underlying problem.
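The claimed equivalence of the two framings can be checked with a short sketch. It rests on one assumption, marked in the code: everyone who does not die counts as saved, so a number of deaths converts directly into a number of lives saved. The dictionary layout and names are ours.

```python
from fractions import Fraction

AT_RISK = 600  # people expected to be killed by the disease

# Each program is a list of (probability, number of people saved) outcomes.
programs = {
    "A": [(Fraction(1), 200)],
    "B": [(Fraction(1, 3), 600), (Fraction(2, 3), 0)],
    # C and D are stated in deaths. Assumption: not dead means saved,
    # so saved = AT_RISK - deaths.
    "C": [(Fraction(1), AT_RISK - 400)],
    "D": [(Fraction(1, 3), AT_RISK - 0), (Fraction(2, 3), AT_RISK - 600)],
}

# Under that assumption, A and C describe the same outcomes, as do B and D.
same_sure_option = programs["A"] == programs["C"]
same_risky_option = programs["B"] == programs["D"]

# Expected number of lives saved under each program.
expected_saved = {
    name: sum(p * saved for p, saved in outcomes)
    for name, outcomes in programs.items()
}

print(same_sure_option, same_risky_option)  # True True
print({name: int(value) for name, value in expected_saved.items()})
```

All four programs have the same expected number of lives saved, and the pairs A/C and B/D are outcome for outcome identical, but only because of the conversion `AT_RISK - deaths`.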

However, the two framings are not necessarily equivalent, not if one does not subscribe to a bivalent conception of truth. Suppose one does not presuppose a bivalent notion of life, where one can only be either saved or dead, and instead considers the degree of life to be a continuous variable that ranges between zero and one. Being dead may then correspond to having a degree of life below a lower threshold, and being saved to having a degree of life that exceeds an upper threshold. This view of life can also incorporate the more restrictive bivalent view by setting the thresholds for being dead and being saved to be the same, say at 0.5. Those with a degree of life below 0.5 are considered dead, and those above 0.5 are considered saved. But why should the thresholds for being saved and dead coincide? Are all conditions of life equal? Is conscious life the same as unconscious life? Is there no difference between being paralysed and not being paralysed? Did Jesus not come to us to crystallise the distinction between being merely undead and being saved? In general, therefore, the thresholds for being dead and saved do not have to coincide. Not all who are not dead are saved; not all who are not saved are dead; all who are neither dead nor saved may still exist; and some are saved even after they are dead!

As long as the two thresholds (of being dead and of being alive) do not coincide, there will be a state in which one can be declared neither dead nor alive (in a bivalent notion of life), and the two framings will not be equivalent. To say that 200 will live implies that we know with full confidence that these 200 will have a degree of life that exceeds the upper threshold of being alive; the remaining 400, however, may not all be below the lower threshold of ‘death’, and may simply not be in a good enough condition to be called alive. To say that 400 will die means that these 400 are below the lower threshold of death; the remaining 200, while above that threshold, may still fail to cross the upper threshold of life. They may be stuck between a state of a warm life and a cold death: undead but not yet saved.
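The two-threshold argument can be illustrated with a small sketch. Everything numerical here is invented for illustration: the degrees of life assigned to the 600 people, the thresholds 0.2 and 0.8, and the helper names are our assumptions, not data from the experiment.

```python
def classify(degree, dead_below, saved_above):
    """Label a degree of life in [0, 1] using two thresholds (hypothetical helper)."""
    if degree < dead_below:
        return "dead"
    if degree > saved_above:
        return "saved"
    return "neither"


def count_labels(degrees, dead_below, saved_above):
    """Count how many people fall into each of the three states."""
    counts = {"dead": 0, "saved": 0, "neither": 0}
    for degree in degrees:
        counts[classify(degree, dead_below, saved_above)] += 1
    return counts


# Invented outcome for the 600 people: 200 clearly well,
# 150 in an intermediate condition, 250 clearly dead.
degrees = [0.9] * 200 + [0.4] * 150 + [0.1] * 250

# Bivalent view: one shared threshold, so "not saved" and "dead" coincide.
bivalent = count_labels(degrees, dead_below=0.5, saved_above=0.5)

# Fuzzy view: separate thresholds leave room for a middle state.
fuzzy = count_labels(degrees, dead_below=0.2, saved_above=0.8)

print(bivalent)  # {'dead': 400, 'saved': 200, 'neither': 0}
print(fuzzy)     # {'dead': 250, 'saved': 200, 'neither': 150}
```

With a single threshold, ‘200 will be saved’ and ‘400 will die’ describe the same outcome. With two thresholds, the first statement still holds for this invented outcome but the second does not, so the two framings come apart.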

Kahneman writes in his book, Thinking, Fast and Slow (2011), that “Respondents confronted with their conflicting answers are typically puzzled. Even after rereading the problems, they still wish to be risk averse in the ‘lives saved’ version; they wish to be risk seeking in the ‘lives lost’ version; and they also wish to obey invariance and give consistent answers in the two versions”. Don’t you worry, O mankind! Your wishes have been heard; your choices have been shown to be consistent!

Conclusion

Existing formal treatments of probability and truth do not fully incorporate the degree of truth. For example, the traditional conjunction (disjunction) principles provide the frequentist probability of a conjunction (disjunction) of propositions being true, unconditional on the degree of truth. With bivalent truths, this frequentist probability is exactly equal to the average degree of a proposition’s truth. With more general fuzzy truths, however, this is not the case. As a result, our common as well as formal usage of probability often mixes and confuses two notions of probability: the frequentist probability and the average degree of truth. In another essay, we will argue in more detail why the frequentist probability and the degree of truth should be treated as two independent dimensions of truth, develop a framework to illustrate how not to merge these two dimensions, and show how this more flexible treatment of truth is much more aligned with our intuitions.