Well, if you guessed Alice, you are correct. Perhaps your reasoning was that the content contains the words love, great and wonderful, which are the words Alice uses.

Now let’s add probabilities to the data we have. Suppose Alice and Bob use the following words with the probabilities shown below. Now, can you guess who sent the message “Wonderful Love”?

[Figure: Probability of word usage by Alice and Bob]

Now what do you think?

If you guessed Bob, you are correct. If you know the mathematics behind it, good for you. If not, don’t worry; we will work through it in the next section. This is where we apply Bayes’ theorem.

Bayes’ Theorem

It tells us how often A happens given that B happens, written P(A|B), when we know how often B happens given that A happens, written P(B|A), and how likely A and B are on their own:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$

P(A|B) is “Probability of A given B”, the probability of A given that B happens

P(A) is the probability of A on its own

P(B|A) is “Probability of B given A”, the probability of B given that A happens

P(B) is the probability of B on its own

When P(Fire) means how often there is fire, and P(Smoke) means how often we see smoke, then:

P(Fire|Smoke) means how often there is fire when we see smoke.

P(Smoke|Fire) means how often we see smoke when there is fire.

So the formula tells us “forwards” when we know “backwards” (or vice versa).

Example: if dangerous fires are rare (1%) but smoke is fairly common (10%) due to factories, and 90% of dangerous fires make smoke, then:

$$P(\text{Fire} \mid \text{Smoke}) = \frac{P(\text{Fire})\,P(\text{Smoke} \mid \text{Fire})}{P(\text{Smoke})} = \frac{1\% \times 90\%}{10\%} = 9\%$$

In this case, when we see smoke, there is a dangerous fire only 9% of the time.
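The fire and smoke arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `bayes` and its parameter names are our own choices for illustration, not part of any library.

```python
def bayes(p_a: float, p_b_given_a: float, p_b: float) -> float:
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a * p_b_given_a / p_b


# Fire/smoke example: P(Fire) = 1%, P(Smoke|Fire) = 90%, P(Smoke) = 10%
p_fire_given_smoke = bayes(p_a=0.01, p_b_given_a=0.90, p_b=0.10)
print(f"P(Fire|Smoke) = {p_fire_given_smoke:.0%}")  # → P(Fire|Smoke) = 9%
```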

Now can you apply this to our Alice and Bob example?
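Here is one way to apply it in Python. The figure with the real word probabilities is not reproduced in this text, so the numbers below are hypothetical, picked only for illustration; we also assume equal priors (each sender is equally likely to write a message). The sketch multiplies the word probabilities together, which treats the words as independent given the sender (the “naive” assumption discussed in the next section), and then divides by P(message) so the results are proper probabilities.

```python
# Hypothetical word-usage probabilities P(word | sender); NOT the values from the post's figure.
word_probs = {
    "Alice": {"love": 0.1, "great": 0.8, "wonderful": 0.1},
    "Bob": {"love": 0.3, "great": 0.2, "wonderful": 0.5},
}
# Assumed prior P(sender): each sender is equally likely to write a message.
priors = {"Alice": 0.5, "Bob": 0.5}

message = ["wonderful", "love"]

# Numerator of Bayes' theorem: P(sender) * P(message | sender),
# multiplying per-word probabilities (words treated as independent given the sender).
scores = {}
for sender, probs in word_probs.items():
    score = priors[sender]
    for word in message:
        score *= probs[word]
    scores[sender] = score

# P(message) is the sum of the numerators; dividing gives P(sender | message).
evidence = sum(scores.values())
posteriors = {sender: s / evidence for sender, s in scores.items()}

for sender, p in posteriors.items():
    print(f"P({sender} | 'Wonderful Love') = {p:.4f}")
print("Most likely sender:", max(posteriors, key=posteriors.get))
```

With these made-up numbers, Alice scores 0.5 × 0.1 × 0.1 = 0.005 and Bob scores 0.5 × 0.5 × 0.3 = 0.075, so after normalizing, Bob gets 0.9375 and Alice 0.0625.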

Naive Bayes Classifier

A Naive Bayes classifier calculates the probability of every possible outcome given the input features (in our email example, the outcomes are the senders Alice and Bob, and the features are the words of the message). It then selects the outcome with the highest probability.

This classifier assumes that the features (here, the words) are independent of each other given the outcome, hence the word naive. Even with this simplifying assumption, it is a powerful algorithm, used for:

Real-time prediction

Text classification / spam filtering

Recommendation systems
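The independence assumption can be written out explicitly. Here w_1, …, w_n stand for the words of a message and x for a candidate outcome such as Alice or Bob (these symbol names are chosen for this illustration). The likelihood of the whole message factors into per-word terms, and because the denominator P(w_1, …, w_n) is the same for every x, it can be dropped when picking the most likely outcome:

$$P(w_1, \dots, w_n \mid x) = \prod_{j=1}^{n} P(w_j \mid x)$$

$$\hat{x} = \arg\max_{x} \frac{P(x) \prod_{j=1}^{n} P(w_j \mid x)}{P(w_1, \dots, w_n)} = \arg\max_{x} P(x) \prod_{j=1}^{n} P(w_j \mid x)$$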

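Mathematically, we can write it as follows.

Suppose we observe an event E (for example, the words in a message) and have candidate outcomes x1, x2, x3, etc. (for example, the possible senders).

We first calculate P(x1 | E), P(x2 | E), … [read as the probability of x1 given that event E happened], each using Bayes’ theorem, for example P(x1 | E) = P(x1) P(E | x1) / P(E). We then select the outcome x with the maximum probability value.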
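The same decision rule can be sketched as a tiny classifier that estimates P(x) and P(word | x) by counting words in labelled example messages. This is a minimal illustration under our own assumptions, not the implementation from the next part: the class name `TinyNaiveBayes`, the training messages, and the add-one (Laplace) smoothing controlled by `alpha` are all hypothetical choices. Because P(E) is the same for every candidate, the sketch compares only the numerators, and it sums logarithms instead of multiplying probabilities to avoid numeric underflow on long messages.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Minimal multinomial Naive Bayes over lists of words (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0):
        # Add-alpha (Laplace) smoothing: an assumption so unseen word/class pairs never get probability 0.
        self.alpha = alpha

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)      # used for the prior P(x)
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for words, label in zip(messages, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for words in messages for w in words}
        return self

    def log_scores(self, words):
        """Return log P(x) + sum_j log P(w_j | x) for every class x (log of the Bayes numerator)."""
        total_messages = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total_messages)  # log prior
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for w in words:
                if w not in self.vocab:
                    continue  # ignore words never seen during training
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, words):
        scores = self.log_scores(words)
        return max(scores, key=scores.get)  # outcome with the highest score


# Hypothetical training messages, made up for illustration.
messages = [
    ["love", "great", "great"],
    ["great", "wonderful", "great"],
    ["wonderful", "love", "wonderful"],
    ["love", "wonderful"],
]
labels = ["Alice", "Alice", "Bob", "Bob"]

model = TinyNaiveBayes().fit(messages, labels)
print(model.predict(["wonderful", "love"]))  # → Bob
print(model.predict(["great"]))  # → Alice
```

Working in log space does not change which outcome wins, since the logarithm is increasing; it only keeps the scores in a numerically safe range.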

I hope this explains what a Naive Bayes classifier is. In the next part, we will use sklearn in Python to implement a Naive Bayes classifier that labels email as either spam or ham. Leave a comment below if you need any help or have any suggestions.

Code and implement the email classification into spam and non-spam here (Part 2 of chapter 1).

Read about Support Vector Machine in chapter 2 here.