Bayes' Theorem Derivation

There is a bag with four wooden balls in it. To 'sample' from the bag we jumble up the contents, reach in, and take out one of the balls. After observing the ball we put it back into the bag (sampling with replacement) so that we can continue taking samples ad infinitum, with the probabilities remaining unchanged each time.

The balls have two distinguishing attributes: their colour and the presence or absence of a spot. The attributes are arranged like so:

|        | Colour | Has Spot |
|--------|--------|----------|
| Ball 1 | Red    | N        |
| Ball 2 | Red    | N        |
| Ball 3 | Red    | Y        |
| Ball 4 | Blue   | Y        |

Three balls are red, one is blue.

Two balls have a spot, two don't.

At this point we can calculate some simple probabilities. There are four balls and no way to tell them apart or cheat, so the probability of drawing any one ball is the same as for any other ball, and is always 1/4 = 25% = 0.25. We can write this as:

$${P}(ball1) = {P}(ball2) = {P}(ball3) = {P}(ball4) = \frac{1}{N} = \frac{1}{4} = 0.25 $$

Where N is the number of balls.
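Because we sample with replacement, every draw starts from the same full bag, so each ball keeps its 1/4 chance on every draw. A quick simulation can check this empirically. This is a minimal sketch using Python's standard `random` module; the function name `simulate_draws`, the seed and the number of draws are illustrative choices, not part of the original derivation.

```python
import random
from collections import Counter

def simulate_draws(n_draws, seed=0):
    """Draw one of balls 1-4 uniformly at random, n_draws times, with
    replacement, and return the observed frequency of each ball."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    balls = [1, 2, 3, 4]
    counts = Counter(rng.choice(balls) for _ in range(n_draws))
    return {ball: counts[ball] / n_draws for ball in balls}

freqs = simulate_draws(100_000)
for ball, freq in freqs.items():
    print(f"ball {ball}: {freq:.3f}")  # each should be close to 0.25
```

With 100,000 draws the observed frequencies typically land within a few thousandths of 0.25; more draws bring them closer still.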

Half of the balls have a spot, so the probability of drawing a ball with a spot is 50%; we write this as:

$${P}(spot) = \frac{N_{spot}}{N} = \frac{2}{4} = 0.5$$

Where `N_{spot}` is the number of balls with a spot.

We do the same again with ball colour:

$${P}(red) = \frac{N_{red}}{N} = \frac{3}{4} = 0.75 \tag{1}$$

$${P}(blue) = \frac{N_{blue}}{N} = \frac{1}{4} = 0.25 $$
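The counting rule used for `P(spot)`, `P(red)` and `P(blue)` can be sketched in a few lines of Python. This is a minimal sketch, assuming the balls are stored as a list of hypothetical `Ball(colour, spot)` records; `Fraction` from the standard library keeps the results exact instead of rounding them to floats.

```python
from collections import namedtuple
from fractions import Fraction

# The four balls from the table (hypothetical record type for illustration)
Ball = namedtuple("Ball", ["colour", "spot"])
BALLS = [Ball("red", False), Ball("red", False), Ball("red", True), Ball("blue", True)]

def prob(condition):
    """P(condition): count the balls that satisfy it and divide by N."""
    n_match = sum(1 for ball in BALLS if condition(ball))
    return Fraction(n_match, len(BALLS))

p_spot = prob(lambda b: b.spot)
p_red = prob(lambda b: b.colour == "red")
p_blue = prob(lambda b: b.colour == "blue")
print(p_spot, p_red, p_blue)  # 1/2 3/4 1/4
```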

Since a ball must be red or blue, and cannot be both red and blue, we can also state:

$$ {P}(red) + {P}(blue) = 1 $$

Or equivalently:

$$ {P}(red) = 1 - {P}(blue)$$

$$ {P}(blue) = 1 - {P}(red)$$
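The complement rule can be checked directly against the counts. This sketch uses a plain list of the four ball colours, an illustrative representation rather than anything from the original:

```python
from fractions import Fraction

colours = ["red", "red", "red", "blue"]  # colours of balls 1-4
p_red = Fraction(colours.count("red"), len(colours))
p_blue = Fraction(colours.count("blue"), len(colours))

# Red and blue are mutually exclusive and exhaustive, so they sum to 1
assert p_red + p_blue == 1
print(1 - p_blue, 1 - p_red)  # 3/4 1/4
```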

We can also ask: what is the probability of drawing a red ball with a spot? Again, we just count the number of balls that meet our test condition and divide by the total number of balls:

$$ {P}(red, spot) = \frac{N_{red \ and\ spot}}{N} = \frac{1}{4} = 0.25 \tag{2}$$
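A joint probability uses the same counting, except that a ball must satisfy both conditions to be counted. A minimal sketch, reusing a hypothetical `Ball(colour, spot)` record for the four balls:

```python
from collections import namedtuple
from fractions import Fraction

Ball = namedtuple("Ball", ["colour", "spot"])
BALLS = [Ball("red", False), Ball("red", False), Ball("red", True), Ball("blue", True)]

# Count balls meeting BOTH conditions, then divide by the total N
n_red_and_spot = sum(1 for b in BALLS if b.colour == "red" and b.spot)
p_red_spot = Fraction(n_red_and_spot, len(BALLS))
print(n_red_and_spot, p_red_spot)  # 1 1/4
```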

And so on.

A conditional probability is just the probability of something given that we already know something else. For example, a friend draws a ball without showing it to us and tells us it is red; what is the probability of it having a spot?

We know there are three red balls, one of which has a spot, so:

$$ P(spot | red) = \frac{N_{red \ and\ spot}}{N_{red}} = \frac{1}{3} = 0.\overline{3} \tag{3} $$

Note that the denominator is now three, not four as before; we know that a red ball was drawn, and there are only three of them, i.e. we eliminated the blue ball from our set of possibilities.
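The "eliminate, then count" step can be sketched as a filter followed by a count. The helper name `conditional` and the `Ball(colour, spot)` record are hypothetical, introduced only for illustration:

```python
from collections import namedtuple
from fractions import Fraction

Ball = namedtuple("Ball", ["colour", "spot"])
BALLS = [Ball("red", False), Ball("red", False), Ball("red", True), Ball("blue", True)]

def conditional(event, given):
    """P(event | given): keep only the balls where `given` holds,
    then count how many of those also satisfy `event`."""
    remaining = [b for b in BALLS if given(b)]  # e.g. given red, the blue ball is dropped
    n_both = sum(1 for b in remaining if event(b))
    return Fraction(n_both, len(remaining))

p_spot_given_red = conditional(lambda b: b.spot, lambda b: b.colour == "red")
print(p_spot_given_red)  # 1/3
```

The denominator is `len(remaining)` rather than `len(BALLS)`, which is exactly the change from four to three described above.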

Equation 3 has `N_{red and spot}` and `N_{red}` on the right-hand side, and these two counts also appear in equations 2 and 1 respectively. In fact, if we divide equation 2 by equation 1, the common denominator `N` cancels and we get:

$$ \frac{P(red, spot)}{P(red)} = \frac{N_{red \ and \ spot}}{N_{red}} \tag{4}$$

I.e. we have two conditions and we count how many balls meet each condition:

- The number of balls that are red and have a spot: `N_{red and spot}`
- The number of balls that are red: `N_{red}`

Equation 4 is just saying that the ratio of the two probabilities is equal to the ratio of those two counts.

Recalling that `\frac{N_{red and spot}}{N_{red}}` appeared in equation 3, we can now write:

$$ \frac{P(red, spot)}{P(red)} = P(spot | red) \tag{5}$$
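Equations 4 and 5 can be checked numerically: the ratio of the two probabilities equals the ratio of the two counts, which is the conditional probability from equation 3. A minimal sketch, again assuming a hypothetical `Ball(colour, spot)` record:

```python
from collections import namedtuple
from fractions import Fraction

Ball = namedtuple("Ball", ["colour", "spot"])
BALLS = [Ball("red", False), Ball("red", False), Ball("red", True), Ball("blue", True)]
N = len(BALLS)

n_red = sum(1 for b in BALLS if b.colour == "red")
n_red_and_spot = sum(1 for b in BALLS if b.colour == "red" and b.spot)

p_red = Fraction(n_red, N)                # equation 1
p_red_spot = Fraction(n_red_and_spot, N)  # equation 2

# Equation 4: dividing (2) by (1) cancels N, leaving a ratio of counts
ratio_of_probs = p_red_spot / p_red
ratio_of_counts = Fraction(n_red_and_spot, n_red)
assert ratio_of_probs == ratio_of_counts

# Equation 5: that ratio is P(spot | red) from equation 3
print(ratio_of_probs)  # 1/3
```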

This captures the essence of Bayes' theorem, but it is not quite the rule in its commonly stated form; we need one more observation:

$$ P(red, spot) = P(spot, red) \tag{6}$$

Here we're just saying that the probability of drawing a ball that is both red and spotted is the same as the probability of drawing one that is both spotted and red. It is so obvious that you might wonder why it is worth mentioning. The reason is that if we repeat the whole derivation above with the 'red' and 'spot' terms swapped, we end up with something that looks like equation 5, but with the two terms exchanged:

$$ \frac{P(spot, red)}{P(spot)} = P(red | spot) \tag{7}$$
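Running the same counting with the roles swapped gives equation 7. This sketch (same hypothetical `Ball(colour, spot)` record) computes `P(red | spot)` from the joint and marginal probabilities:

```python
from collections import namedtuple
from fractions import Fraction

Ball = namedtuple("Ball", ["colour", "spot"])
BALLS = [Ball("red", False), Ball("red", False), Ball("red", True), Ball("blue", True)]
N = len(BALLS)

n_spot = sum(1 for b in BALLS if b.spot)
# Same balls as "red and spot", just tested in the other order (equation 6)
n_spot_and_red = sum(1 for b in BALLS if b.spot and b.colour == "red")

p_spot = Fraction(n_spot, N)
p_spot_red = Fraction(n_spot_and_red, N)

# Equation 7: P(red | spot) = P(spot, red) / P(spot)
p_red_given_spot = p_spot_red / p_spot

# Direct check: of the two spotted balls, one is red
assert p_red_given_spot == Fraction(1, 2)
print(p_red_given_spot)  # 1/2
```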

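Multiplying equation 5 through by `P(red)` gives `P(red, spot) = P(spot | red) P(red)`, and multiplying equation 7 through by `P(spot)` gives `P(spot, red) = P(red | spot) P(spot)`. By equation 6 the left-hand sides are equal, so: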

$$ P(spot | red) P(red) = P(red | spot) P(spot) \tag{Bayes' Theorem} $$

That is the easier-to-remember 'balanced' form. To get the commonly stated form, just divide both sides by `P(red)`:

$$ P(spot | red) = \frac{P(red | spot)P(spot)}{P(red)} \tag{Bayes' Theorem}$$
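Plugging the numbers from the bag into the commonly stated form recovers the 1/3 obtained by direct counting in equation 3. The helper `bayes` is a hypothetical name for this sketch, not a library function:

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Values derived earlier for the bag of balls
p_red = Fraction(3, 4)
p_spot = Fraction(1, 2)
p_red_given_spot = Fraction(1, 2)

p_spot_given_red = bayes(p_red_given_spot, p_spot, p_red)
print(p_spot_given_red)  # 1/3, matching equation 3

# The 'balanced' form holds too: both sides equal P(red, spot) = 1/4
assert p_spot_given_red * p_red == p_red_given_spot * p_spot == Fraction(1, 4)
```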

Colin,

August 17, 2016