Hey Joe, what are you up to these days?

Apart from visiting DC recently, life has been mellow over this summer. I am reading your lessons every week. I noticed there are several ways to visualize data and summarize it. Those were a nice set of data summary lessons.

Yes. Preliminaries in data analysis — visualize and summarize. I recently came across visuals with cute faces 🙂 I will present them at an appropriate time.

That is cool. On the way back from DC, we played the Chicago dice game. I remembered our conversation about probability while playing.

Interesting. How is the game played?

There will be eleven rounds numbered 2 – 12. In each round, we throw the pair of dice to score the number of the round. For example, if on the first try, I get a 1 and 1, I win a point because my first round score is 2. If I throw any other number other than 2, I don’t win anything. The player with the highest total after 11 rounds wins the game.

I see. So there are 11 outcomes (2 – 12), and you are trying to get the outcome. Do you know the probability distribution of these outcomes?

I believe you just used the question to present a new idea – “probability distribution“. Fine, let me do the Socratic thing here and ask “What is probability distribution“?

It is the distribution of the probability of the outcomes. In your Chicago dice example, you have a random outcome between 2 and 12; 2 if you roll a 1 and 1; 12 if you roll a 6 and 6. Each of these random outcomes has a probability of occurring. If you compute these probabilities and plot them; i.e. distribute the probabilities on a number line, we can see a probability distribution of these random variables.

Let me jump in here. There are 11 possible outcomes. I will tabulate the possibilities.

There are limited ways of achieving an outcome. The likelihood of each outcome will be the ratio of the total ways we can get the number and 36. An outcome 2 can only be achieved if we get a (1,1). Hence the probability of getting 2 in this game is 1/36.

Excellent, now try to plot these probabilities on a scale from 2 to 12.

Looking at the table, I can see the probability will increase as we go up from 2 to 7 and decrease from there till 12.

I like the way you named your axes. X and P(X = x). Your plot shows that there is a spike (which is the probability) for each possible outcome. The probability is 0 for all other outcomes. The spikes should add up to 1. This probability graph is called the probability distribution function f(x) for a discrete random variable.

The function can be integrated to obtain the cumulative distribution function. Say you want to know the probability of getting an outcome less than 4. You can use the cumulative function that is integrated over the outcomes 2 and 3. Just be watchful of the notations. Probability distribution function has a lowercase f, and cumulative distribution function has an uppercase F.

So if we know the function f(x), we can find out the probability of any possible event from it. These outcomes are discrete (2 to 12), and the function is also discrete for every outcome. What if the outcomes are continuous? How does the probability distribution function look if the random variable is continuous where the possibilities are infinite?

Okay, let us do a thought experiment. Imagine there are ten similar apples in a basket. What is the probability of taking any apple at random?

Since there are ten apples, the probability of taking one is 1/10.

What if there are n apples?

Then the probability of taking any one is 1/n. Why do you ask?

What happens to the probability if n is a very large number, i.e. if there are infinite possibilities?

Ah, I see. As n approaches infinity, the probability of seeing any one number approaches 0. So unlike discrete random variables which have a defined probability for each outcome, for continuous random variables P(X = x) = 0. How then, can we come up with a probability distribution function?

Recall how we did frequency plots. We partitioned the space into intervals or groups and recorded the number of observations that fall into each group. For continuous random variables, the proportion of observations in the group approaches the probability of being in the group. For a large n, we can imagine a large number of small intervals like this.

We can approximate this to a smooth curve and define the probability of a continuous variable in an interval a and b.

The extension from the frequency plot to the probability distribution function is clear. Since the function is continuous, if we want the cumulative function, we integrate it like this.

Great. You picked up many things today. Did you figure out the odds of getting a deal on your Chicago dice game — getting the same number as the round in your 11 tries?

If you find this useful, please like, share and subscribe.

You can also follow me on Twitter @realDevineni for updates on new lessons.