I may be in one of those transit buses. Since I moved to New Jersey, I am going through this mess every day.

Well, you wanted to enjoy Manhattan skyline. It has a price tag.

D, glad you are here. It’s been a while. In our last meeting, we were discussing the concepts of variance operation and its properties. I continue to read your lessons every week. As I paused and reflected on all the lessons, I noticed that there is a systematic approach to them. You started with the basics of sets and probability, introduced lessons on visualizing, summarizing and comparing data using various statistics, then extended those ideas into random variables and probability distributions. The readership seems to have grown considerably, and people are tweeting about our classroom. Have you reached 25000 pageviews yet?

Great place to start learning #DataAnalysis is at https://t.co/OTGRiCsOTG by @realDevineni and thanks to @ThomasEWoods for mentioning it — Kevin Stokes (@kevinstokes58) September 14, 2017

We are at 24350 pageviews now. We will certainly hit the 25k mark today 😉 I am thankful to all the readers for their time. Special thanks to all those who are spreading the word. Our classroom is a great resource for anyone starting data analysis.

So, whats on the show today?

As you correctly pointed out, we are now slowly getting into various types of probability distributions. I mentioned in lesson 31 that we would learn several discrete probability distributions that are based on Bernoulli trials. We start this adventure with Binomial distribution.

Great. Let me refresh my memory of probability distributions before we get started. We discussed the basics of probability distribution in lesson 23. Let’s assume X is a random variable, and P(X = x) is the probability that this random variable takes any value x (i.e., an outcome). Then, the distribution of these probabilities on a number line, i.e., the probability graph is called the probability distribution function f(x) for a random variable. We are now looking at various mathematical forms for this f(x).

Fantastic. Now imagine you have a Bernoulli sequence of yes or no.

Sure. It is a sequence of 0s and 1s with a probability p; 0 if the trial yields a no (failure, or event not happening) and 1 if the trial yields a yes (success, or event happening). Something like this: 00101001101

From this sequence, if you are interested in the number of successes (1s) in n trials, this number follows a Binomial distribution. If you assume X is a random variable that represents the number of successes in a Bernoulli sequence of n trials, then this X should follow a binomial distribution. The probability that this random variable X takes any value k, i.e., the probability of exactly k successes in n trials is:

The expected value of this random variable, E[X] = np, and the variance V[X] = np(1-p).

😯 Wow, that’s a fastball. Can we parse through the lingo?

Oops… Okay, let us take the example of your daily commute. Imagine buses and cars pass through the tunnel each morning. Can you guesstimate the probability of buses?

Yeah, I usually see more buses than cars in the morning. Let’s say the likelihood of seeing a bus is p=0.7.

Now let us imagine that buses and cars come in a Bernoulli sequence. Assign a 1 if it is a bus, and 0 if it is a car.

That is reasonable. The vehicle passage is usually random. If we take that as a Bernoulli sequence, there will be some 1s and some 0s with a 0.7 probability of occurrence. In the long run, you will have 70% buses and 30% cars in any order.

Correct. Now think about this. In the next four vehicles that pass through the tunnel, how many of them will be buses?

Since there is randomness in the sequence, in the next four vehicles, I can say, all of them may be buses, or none of them will be buses or any number in between.

Exactly. The number of buses in a sequence of 4 vehicles can be 0, 1, 2, 3 or 4. These are the random variables represented by X. In other words, if X is the number of buses in 4 vehicles coming at random, then X can take 0, 1, 2, 3 or 4 as the outcomes. The probability distribution of X is binomial.

I understand how we came up with X. Why is the probability distribution of X called binomial?

It originates from the idea of the binomial coefficient that you may have learned in an elementary math/combinations class. Let us continue with our logical deduction to see how the probability is derived, and you will see why.

Sure. We have X as 0, 1, 2, 3 and 4. We should calculate the probability P(X = 0), P(X = 1), P(X = 2), P(X = 3) and P(X = 4). This will give us the distribution of the probabilities.

Take an example, say 2. Let us compute P(X = 2). The probability of seeing exactly two buses in 4 vehicles. The probability of exactly k successes in n trials. If the buses and cars come in a Bernoulli sequence (1 for bus and 0 for a car) with a probability p, in how many ways can you see two buses out of 4 vehicles?

Ah, I see where we are going with this. Let me list out the possibilities. Two buses in four vehicles can occur in six ways. 0011, 0101, 1100, 1010, 1001, 0110. In each of these six possible sequences, there will be exactly two buses among four vehicles. I remember from my combinations class that this is four choose two. Four factorial divided by the product of two factorial and (four minus two) factorial. 4C2 = 4!/4!(4-2)!

For each possibility, the probability of that sequence can also be written down. Let me make a table like this:

You can see from the table that there are six possibilities. Any of the possibilities, 1 or 2 or 3 or 4 or 5 or 6 can occur. Hence, the probability of seeing two in four is the sum of these probabilities. Remember P(A or B) = P(A) + P(B). If you follow through this, you will get, 6*p*p*(1-p)*(1-p). = 6*p^2*(1-p)^(4-2). Can you see where the formula for binomial distribution comes from?

Absolutely. For each outcome of X, i.e., 0, 1, 2, 3 and 4, we should apply this logic/formula and compute the probability of the outcome. Let me finish it and make a plot.

Very nicely done. Let me jump in here and show you another plot with a different n and p. If p = 0.5 (equal probability) and n = 100; this is how the binomial distribution looks like.

Nice. It looks like an inverted bell centered around 50.

Yeah. You noticed that the distribution is centered around 50. It is the expected value of the distribution. Remember E[X] is the central tendency of the distribution. For binomial, you can derive it as np = 100 (0.5) = 50. In the same way, the variance, i.e. spread of the function around this center is np(1-p) = 100(0.5)(0.5) = 25. Or standard deviation is 5. You can see that the distribution is spread out within three standard deviations from the center. Can you now imagine how the distribution will look like for p = 0.3 or p = 0.7?

Following the same logic, those distributions will be centered on 100*0.3 = 30 and 100*0.7 = 70 with their variance. Now it all makes sense.

You see how easy it is when you go through the logic. We started with Bernoulli sequence. When we are interested in the random variable that is the number of successes in so many trials, it follows a binomial distribution. “Exactly k successes” is the language of Binomial distribution. Can you think of any other examples that can be modeled as a binomial distribution?

Probability that Derek Jeter, with a batting average of 0.3, gets three hits out of the three times he comes to bat 😆 This is fun. I am glad I learned some useful concepts out of the messy commute experience. By the way, Exactly one landfall in the next four hurricanes is also binomial. With Jose coming up, I wonder if we can compute the probability of damage for New York City based on the probability of landfall.

Don’t worry Joe. Our Mayor is graciously implementing his comprehensive $20 billion resiliency plan. NYC is safe now. Forget probability of damage. You need to worry about the probability of bankruptcy.

If you find this useful, please like, share and subscribe.

You can also follow me on Twitter @realDevineni for updates on new lessons.