One of the easiest ways to understand probabilities is to think of them in terms of Venn diagrams. You basically have a Universe with all the possible outcomes (of an experiment, for instance), and you are interested in some subset of them, namely some event. Say we are studying cancer, so we observe people and see whether they have cancer or not. If we take as our Universe all people participating in our study, then there are two possible outcomes for any particular individual: either they have cancer or they do not. We can then split our Universe into two events: the event “people with cancer” (designated as \(A\)), and “people with no cancer” (or \(\neg A\)). We could build a diagram like this:

So what is the probability that a randomly chosen person has cancer? It is just the number of elements in \(A\) divided by the number of elements in \(U\) (the Universe). We denote the number of elements of \(A\) as \(|A|\), read “the cardinality of \(A\)”, and define the probability of \(A\), \(P(A)\), as

\[P(A) = \frac{|A|}{|U|}\]

Since \(A\) can have at most the same number of elements as \(U\), the probability \(P(A)\) can be at most one.
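As a minimal sketch of this counting definition, the Python snippet below models the Universe as a set of person IDs. The study size (100 people) and the choice of which 10 have cancer are made-up numbers for illustration, not data from any real study, and the helper name `probability` is likewise just an assumption of this sketch.

```python
from fractions import Fraction

# Hypothetical study: 100 participants, identified by the integers 0..99.
U = set(range(100))
# Assumption for illustration only: participants 0..9 have cancer.
A = set(range(10))


def probability(event, universe):
    """P(event) = |event| / |universe|, kept as an exact fraction."""
    return Fraction(len(event), len(universe))


p_a = probability(A, U)
print(p_a)       # 1/10
print(p_a <= 1)  # True, because A is a subset of U
```

Using `Fraction` rather than floats keeps the ratio of cardinalities exact, which makes it easier to compare results later on.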

Good so far? Okay, let’s add another event. Let’s say there is a new screening test that is supposed to measure something. That test will be “positive” for some people, and “negative” for others. If we take the event \(B\) to mean “people for whom the test is positive”, we can create another diagram:

So what is the probability that the test will be “positive” for a randomly selected person? It is the number of elements of \(B\) (the cardinality of \(B\), or \(|B|\)) divided by the number of elements of \(U\). We call this \(P(B)\), the probability of event \(B\) occurring.

\[P(B) = \frac{|B|}{|U|}\]

Note that so far, we have treated the two events in isolation. What happens if we put them together?

We can compute the probability of both events occurring (\(AB\) is a shorthand for \(A \cap B\)) in the same way.

\[P(AB) = \frac{|AB|}{|U|}\]

But this is where it starts to get interesting. What can we read from the diagram above?

We are dealing with an entire Universe (all people), the event \(A\) (people with cancer), and the event \(B\) (people for whom the test is positive). There is also an overlap now, namely the event \(AB\) which we can read as “people with cancer and with a positive test result”. There is also the event \(B - AB\) or “people without cancer and with a positive test result”, and the event \(A - AB\) or “people with cancer and with a negative test result”.
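The regions described above map directly onto set operations. The sketch below uses Python sets with invented membership (which IDs fall in \(A\) and \(B\) is purely an assumption for illustration). It also names a fourth region, people with neither cancer nor a positive test, so that the four regions together cover the whole Universe.

```python
# Hypothetical study: 100 participants, identified by the integers 0..99.
U = set(range(100))
A = set(range(10))      # assumed: people with cancer
B = set(range(5, 25))   # assumed: people with a positive test

AB = A & B              # cancer and a positive test
B_only = B - AB         # no cancer, but a positive test
A_only = A - AB         # cancer, but a negative test
neither = U - (A | B)   # no cancer and a negative test

# Print each region's size and its probability |region| / |U|.
names = ["AB", "B - AB", "A - AB", "neither"]
regions = [AB, B_only, A_only, neither]
for name, region in zip(names, regions):
    print(name, len(region), len(region) / len(U))
```

Because the four regions do not overlap and together make up \(U\), their probabilities add up to one.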

Now, the question we’d like answered is “given that the test is positive for a randomly selected individual, what is the probability that said individual has cancer?”. In terms of our Venn diagram, that translates to “given that we are in region \(B\), what is the probability that we are in region \(AB\)?” or stated another way “if we make region \(B\) our new Universe, what is the probability of \(A\)?”. The notation for this is \(P(A|B)\) and it is read “the probability of A given B”.

So what is it? Well, it should be

\[P(A|B) = \frac{|AB|}{|B|}\]

And if we divide both the numerator and the denominator by \(|U|\)

\[P(A|B) = \frac{\frac{|AB|}{|U|}}{\frac{|B|}{|U|}}\]

we can rewrite it using the previously derived equations as

\[P(A|B) = \frac{P(AB)}{P(B)}\]

What we’ve effectively done is change the Universe from \(U\) (all people), to \(B\) (people for whom the test is positive), but we are still dealing with probabilities defined in \(U\).
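To check that the two ways of computing \(P(A|B)\) agree, here is a small sketch that counts inside the new Universe \(B\) and also takes the ratio of probabilities defined in \(U\). The sets are made-up illustration data, not results from any study.

```python
from fractions import Fraction

# Hypothetical study: 100 participants, identified by the integers 0..99.
U = set(range(100))
A = set(range(10))      # assumed: people with cancer
B = set(range(5, 25))   # assumed: people with a positive test
AB = A & B

# Counting directly inside the new Universe B: |AB| / |B|.
by_counts = Fraction(len(AB), len(B))

# Ratio of probabilities that are both defined in U: P(AB) / P(B).
p_ab = Fraction(len(AB), len(U))
p_b = Fraction(len(B), len(U))
by_probs = p_ab / p_b

print(by_counts, by_probs)  # 1/4 1/4
```

The \(|U|\) terms cancel, which is why the result does not depend on how large the original Universe is.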

Now let’s ask the converse question: “given that a randomly selected individual has cancer (event \(A\)), what is the probability that the test is positive for that individual (event \(B\))?”. By the same argument, with \(A\) as the new Universe, it is \(|AB|/|A|\), and dividing numerator and denominator by \(|U|\) gives

\[P(B|A) = \frac{P(AB)}{P(A)}\]

Now we have everything we need to derive Bayes' theorem. Solving each of those two equations for \(P(AB)\) and equating the results, we get

\[P(A|B)P(B) = P(B|A)P(A)\]

which is to say \(P(AB)\) is the same whether you’re looking at it from the point of view of \(A\) or \(B\), and finally, dividing both sides by \(P(B)\), we get

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]
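As a quick sanity check of the theorem, the sketch below computes \(P(A|B)\) twice: once by counting directly, and once from \(P(B|A)\), \(P(A)\) and \(P(B)\). The function name `bayes` and all the numbers are assumptions made for this illustration only.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero to condition on B")
    return p_b_given_a * p_a / p_b


# Hypothetical study: 100 participants, identified by the integers 0..99.
U = set(range(100))
A = set(range(10))      # assumed: people with cancer
B = set(range(5, 25))   # assumed: people with a positive test
AB = A & B

p_a = Fraction(len(A), len(U))            # P(A)
p_b = Fraction(len(B), len(U))            # P(B)
p_b_given_a = Fraction(len(AB), len(A))   # P(B|A), counted inside A

via_bayes = bayes(p_b_given_a, p_a, p_b)
direct = Fraction(len(AB), len(B))        # P(A|B), counted inside B

print(via_bayes, direct)  # 1/4 1/4
```

In this made-up data the test is positive for half of the people with cancer, yet only a quarter of the people with a positive test have cancer, because \(B\) is twice as large as \(A\).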