Carl asked the following discrete-math-style probability question:

$N$ balls are randomly dropped into $k$ boxes $ (k \le N)$. What is the probability that all boxes are filled?

Want to know the answer? Here it is!

Of course, when there is only 1 box, the probability of filling the box with $N$ balls is just $1$. No mystery here.

(Remember, we are assuming that each ball needs to go into exactly one of the boxes — nothing more, nothing less)

So what if $k=2$ (i.e., we have 2 boxes) instead? Well, the probability of filling all the boxes seems intractable, so let’s calculate the probability of not filling all the boxes instead! Let $E_i$ be the event that the ith box is empty, then we have that:

$P(\text{not all boxes are filled}) = P(E_1 \cup E_2)$

Fortunately, due to the simplicity of the case, we also have that:

$P(E_1 \cup E_2) = P(E_1) + P(E_2)$

Now, how can $E_1$ happen? Exactly when all $N$ balls go into box 2 of course! So $P(E_1)=(\dfrac{1}{2})^N = \dfrac{1}{2^N}$.

Similarly, $P(E_2)=\dfrac{1}{2^N}$ as well, which means that: \begin{align*}P (\text{not all boxes are filled}) & = P(E_1) + P(E_2) \\ & = \dfrac{2}{2^N} \\ & = \dfrac{1}{2^{N-1}} \end{align*} So to wrap up, we have that:

$P$ (all boxes are filled) = $ 1 – P$(not all boxes are filled) = $1 – \dfrac{1}{2^{N-1}}$

In practice, what that means is that when $N=2$ (i.e., when we have 2 balls):

$P$ (all boxes are filled) = $1 – \dfrac{1}{2^{2-1}} = \dfrac{1}{2}$

With 3 balls, the same probability becomes $\dfrac{3}{4}$. 4 balls? $\dfrac{7}{8}$. 5 balls? $\dfrac{15}{16}$.

So as is to be expected, the chance of filling all the boxes increases exponentially towards $1$ as the number of balls increases.

Let’s say that we have $3$ boxes and that $E_i$ denotes the event that the ith box is empty, then, proceeding as before, we have that:

$P(\text{not all boxes are filled}) = P(E_1 \cup E_2 \cup E_3 )$

Except that there is no cheating now — meaning that we have to appeal to the so-called inclusion-exclusion principle:

\begin{multline*} P(E_1 \cup E_2 \cup E_3) = P(E_1) + P(E_2) + P(E_3)-P(E_1 \cap E_2) \\ – P(E_1 \cap E_3) – P(E_2 \cap E_3) + P( E_1 \cap E_2 \cap E_3) \end{multline*}

Now, $E_1$ happens precisely when all balls go into box 2 or box 3, so $P(E_1)=(\dfrac{2}{3})^N = \dfrac{2^N}{3^N}$.

All other probabilities can be calculated fairly easily, for example, $E_1 \cap E_2$ happens when all balls go to box 3, so the probability of it would be $(\frac{1}{3})^N = \dfrac{1}{3^N}$.

Putting everything together, we now have that:

\begin{align*} P(E_1 \cup E_2 \cup E_3) & = \frac{2^N}{3^N}+\frac{2^N}{3^N}+\frac{2^N}{3^N}-\frac{1}{3^N}-\frac{1}{3^N}-\frac{1}{3^N} \\ & = \frac {2^N – 1}{3^{N-1}} \end{align*}

And now to wrap up: \begin{align*} P (\text{all boxes are filled}) & = 1 \, – P (\text{not all boxes are filled}) \\ & = 1 \, – \dfrac {2^N – 1}{3^{N-1}} \end{align*}

All right. Let’s put that to test again! When $N=3$ (e.g., we have 3 balls):

$P(\text{all boxes are filled}) = 1 \,- \dfrac {2^3 – 1}{3^{3-1}} = \dfrac{2}{9}$

5 balls? $\dfrac{50}{81}$.

7 balls? $\dfrac{602}{729}$.

10 balls? $\dfrac{18660}{19683}$.

See the same trend? While one might have a hard time filling the boxes with only 3 balls, increasing the number of balls (by perhaps just a bit more) will greatly facilitate this process.

A dorky party idea?

For the generalization to the case where the number of boxes is $k$, the formula will look a bit more involved — with some binomial coefficient and $(-1)^{i+1}$ thrown in in between. Here, we can kickstart with the same reasoning process as usual:

$P(\text{not all boxes are filled}) = P(E_1 \cup \ldots \cup E_k) = \cdots$

(invoking the inclusion-exclusion principle…)

\begin{multline*}P(E_1)+\dots+P(E_k)-P(E_1 \cap E_2) -\ldots-P(E_{k-1} \cap E_k) \\ + P(E_1 \cap E_2 \cap E_3)+ \ldots+P(E_{k-2} \cap E_{k-1} \cap E_k) \\ + \ldots + (-1)^{k+1}P(E_1 \cap \ldots \cap E_k) \end{multline*}

(and after some lengthy number crunching…)

(and some intense parsing…)

$\displaystyle k(\frac{k-1}{k})^N-\binom{k}{2}(\frac{k-2}{k})^N+\binom{k}{3}(\frac{k-3}{k})^N-\ldots+(-1) ^k \binom{k}{k-1}(\frac{1}{k})^N$

(converting into sigma notation now…)

\[ \sum_{i=1}^{k-1} (-1)^{i+1}\binom{k}{i}(\frac{k-i}{k})^N \]

Lastly, putting everything together, we get that:

\begin{align*} P (\text{all boxes are filled}) & = 1-P(\text{not all boxes are filled}) \\ & = \displaystyle 1-\sum_{i=1}^{k-1} (-1)^{i+1}\binom{k}{i}(\frac{k-i}{k})^N \end{align*}

And that’s about it! Finito! 🙂