I’ve never played the lottery before. Going through the numbers, it was really interesting to see the rule changes over time show up in the data.

The basic rules of Powerball are as follows. A ticket costs $2. You choose 5 numbers, and then a Powerball number. You can pay a dollar extra for the multiplier option, which I’ll cover later.

This is the payout table:

| Match | Prize |
|---|---|
| All 5 + Powerball | Grand Prize |
| All 5 | $1,000,000.00 |
| 4 of 5 + Powerball | $50,000.00 |
| 4 of 5 | $100.00 |
| 3 of 5 + Powerball | $100.00 |
| 3 of 5 | $7.00 |
| 2 of 5 + Powerball | $7.00 |
| 1 of 5 + Powerball | $4.00 |
| Powerball | $4.00 |

Drawings are held twice a week, on Wednesdays and Saturdays. The dataset contained 683 draws.

Here is the distribution of the first 5 numbers. They range from 1 to 69, but 60-69 were only introduced in October of 2015.

The numbers seem fairly uniform if you ignore the newest numbers (60-69). Here are the top ten most common numbers called:

| Value | Times Called |
|---|---|
| 11 | 70 |
| 52 | 68 |
| 12 | 67 |
| 23 | 67 |
| 39 | 66 |
| 41 | 65 |
| 10 | 64 |
| 14 | 64 |
| 32 | 64 |
| 40 | 64 |
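For anyone who wants to build a table like this themselves, here is a minimal sketch of the tally. The `sample_draws` list is made-up placeholder data, not the real dataset, and `top_numbers` is just a name I picked for illustration.

```python
from collections import Counter

# Placeholder draws for illustration only; these are NOT real Powerball results.
sample_draws = [
    (11, 23, 39, 41, 52),
    (10, 11, 12, 14, 32),
    (11, 12, 40, 52, 60),
]

def top_numbers(draws, k=10):
    """Count how often each number appears across all draws and return the k most common."""
    counts = Counter(number for draw in draws for number in draw)
    return counts.most_common(k)

# Print a small "Value / Times Called" listing for the placeholder data.
for value, times_called in top_numbers(sample_draws, k=3):
    print(value, times_called)
```

Swapping the placeholder list for the real draws (one tuple of five numbers per draw) should give the same kind of top-ten listing.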

Now, let’s see if our most frequent number, 11, shows up significantly more often than chance would predict.

Using a significance level of 0.05, a null hypothesis that 11 comes up with probability 1/69, and a one-sided alternative that the probability is greater than 1/69, we get a significant p-value of 0.0066.

We can interpret this as a 0.0066 probability of seeing results at least as extreme as ours if the null hypothesis is true.

The obvious problem is that a null value of 1/69 has only been correct for the past year. If we change the null value to 1/59, we get a non-significant p-value of 0.0717. Pretty cool, nonetheless.
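Here is a rough sketch of how a one-sided test like this could be run with nothing but the standard library. It is an exact binomial tail, and it assumes each of the 683 draws contributes 5 independent balls (3,415 trials in total). That trial count is my assumption for illustration, and it glosses over the fact that the five balls in a single draw are picked without replacement.

```python
from math import exp, lgamma, log

def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    def log_pmf(i):
        # Log of C(n, i) * p^i * (1 - p)^(n - i); working in logs avoids overflow for large n.
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))

# Assumption: 683 draws x 5 balls, each ball treated as an independent trial.
n_balls = 683 * 5
times_called = 70  # how often 11 was called, from the top-ten table

p_value_69 = binom_upper_tail(times_called, n_balls, 1 / 69)
p_value_59 = binom_upper_tail(times_called, n_balls, 1 / 59)
print(f"null 1/69: p = {p_value_69:.4f}")
print(f"null 1/59: p = {p_value_59:.4f}")
```

The printed values won’t necessarily match the p-values quoted above, since a different test or trial count would shift them. The direction should hold either way: the larger null value of 1/59 leaves 11 looking much less unusual.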

Now on to the Powerball numbers! Currently, they range from 1 to 26. They used to range from 1 to 35, though, and for a brief time (5/12/2010-12/07/2011) numbers up to 39 show up in the dataset.

We can see the top ten frequency in table form below:

| Value | Times Called |
|---|---|
| 29 | 27 |
| 25 | 24 |
| 5 | 23 |
| 6 | 23 |
| 17 | 23 |
| 24 | 23 |
| 11 | 22 |
| 12 | 22 |
| 18 | 22 |
| 33 | 22 |

The most frequent number overall, 29, can no longer be played, so the highest-frequency number that can currently be played is 25. With a null value of 1/26 and an alternative of greater than 1/26, we get a p-value of 0.68. So nothing even close to being significant there.
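A quick back-of-the-envelope check shows why the p-value is so large. The sketch below uses a normal approximation with a continuity correction, and it assumes all 683 draws used the current 26-ball pool, which we know isn’t true for the older draws. Treat it as an illustration rather than the exact calculation.

```python
from math import sqrt
from statistics import NormalDist

n_draws = 683      # draws in the dataset
observed = 24      # times Powerball 25 was called
p_null = 1 / 26    # assumption: the 26-ball pool applied to every draw

# Expected count and standard deviation under the null hypothesis.
expected = n_draws * p_null
std_dev = sqrt(n_draws * p_null * (1 - p_null))

# One-sided "greater than" test: approximate P(X >= observed).
z = (observed - 0.5 - expected) / std_dev
p_value = 1 - NormalDist().cdf(z)

print(f"expected count: {expected:.2f}, observed: {observed}")
print(f"z = {z:.2f}, approximate p-value = {p_value:.2f}")
```

Under this assumption the observed count sits below the expected count of roughly 26, so a one-sided test for "more often than chance" has nothing to find.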

It would be possible to cut up the dataset by date, but given the high p-value I don’t think it would yield anything significant no matter how it’s diced up.

Lastly, the multiplier. For $1 more you can add the multiplier option, which does what it says: it multiplies your winnings. It was discontinued for two years (1/18/12 – 1/18/14) and then reinstated.

There’s a clear pattern to this. The 10x multiplier was only used once in the dataset, on 6/12/2010. The 10x is currently available, but it only applies to the bottom 7 prize levels, and only when the pool is $150 million or less. It hasn’t popped up in over 6 years.

To close, I think it goes without saying that all of this is mostly just for fun. There are some obvious problems with the statistical tests since the rules and probabilities have changed at least 3 times over the past 6 years. Furthermore, although possible, I strongly doubt the lotto drawings are biased for or against certain numbers.

So, that’s it. The data can be found here. The Powerball website itself is also an interesting relic from the past: the look is dated, but it’s still being maintained.

Thanks for reading!