I first read the following wonderful little puzzle on Mathoverflow, where it was posted by David Futer. I don’t know who invented it or where it appeared first – if you have any information, please let me know. (Note: Toshihiro Shimizu points out the following link: AoPS)

There are 99 bags, each containing some number of oranges and some number of apples (any of these numbers can be zero). Show that we can always select 50 bags out of the 99, such that these 50 bags together will contain at least half of all the apples and at least half of all the oranges.

I like this puzzle because there are many different ways to solve it. Before reading on, do yourself a favor and try solving it yourself – it is not difficult, but very rewarding. If you find a proof that is different from those below, let me know and I’ll add it to the list.

——

Why look for different solutions, once we already have one? Partly because it’s fun, partly because each solution might give some new insight into the problem, partly because we can showcase different mathematical techniques, and perhaps most importantly, because each new proof lends itself to generalization slightly differently – so a new proof might lead to new results. The proofs below are sketchy in places, in order to keep them short – it can be instructive to try to fill in the gaps, or to try coming up with the proof after reading just the name of the technique.

Sometimes proving a more general statement is easier (at least psychologically – by removing distracting elements). In our case, it will be more “fruitful” to prove the following statement:

Given \(n\) bags that contain oranges and apples, we can select \( \lceil \frac{n+1}{2} \rceil \) bags such that they contain at least half of each type of fruit. (we only really care about the case when \(n\) is odd)

Here are all the proofs that I am aware of.

Proof 1. (sorting) (source: I vaguely recall seeing this proof on a math competition website but I can’t find it anymore. If you have any information, please let me know see AoPS link above).

Sort the \(n\) bags in decreasing order of the number of apples (break ties arbitrarily). Select the first bag, and from each following consecutive pair select the one with more oranges (break ties arbitrarily).

It is easy to see that the selected bags satisfy the required properties.

Proof 2. (induction) (source: idea from David Futer).

First we show that for \( n=3 \) the statement is true. Indeed, we can select the bag with the most apples and the one with the most oranges (if the two are the same, we also pick another bag).

Assume that the statement is true for \(n-2\). We then prove it for \(n\):

Consider a “good split” of the \(n-2\) bags into A and B, such that A has at least half of both fruits in A+B (such a split exists by the induction assumption). Now we try adding the other two bags, x and y.

If A has as many of both fruits as B+x or as B+y, we are done (we add the other bag to A).

Otherwise if B+x or B+y have as many of both fruits as A, we are done (we add both bags to B).

Otherwise if A has less apples than B+x and less apples than B+y we are done (add to A the one of x and y with more apples). Symmetric argument for oranges.

Otherwise if A has less apples than B+x and less oranges than B+y, it means that A has less of both fruits than B+x+y, which we treated already. Symmetric argument after swapping the terms “apples” and “oranges”. There are no other cases.

Proof 3. (contradiction or “canonical counter-example”)

Suppose there is no solution, then consider a set of \( \frac{n+1}{2} \) bags which has at least half of the apples (there is always such a set), and among such sets the one with the most oranges possible.

Let this set be A, and the remaining set B (A has at least half of apples and B has more than half of oranges).

Take the bag from A with the least oranges and move it to B. Now take the bag from B with the most oranges and move it to A. If we just moved the same bag back, it means that all bags originally in B have at most as many oranges as the bag with the least oranges in A. This contradicts the fact that B has more than half of all oranges.

Otherwise there are two cases: 1) the number of apples in A drops below half when we move a bag, which means that B becomes a solution, or 2) the number of apples in A remains at least half throughout, then after the swap we increased the number of oranges in A, which contradicts the statement that the chosen set had the highest such value.

Proof 4. (counting) (source: I read this proof written in probabilistic form by Erez Druk, after being asked by Dror Ozeri on a puzzle forum – as it is a closed forum, I cannot link to it)

I write this and the next proof in the n=99 case only, to avoid excessive notation. The number of 50-sets of bags is the same as the number of 49-sets \({99 \choose 49}\). Denote the number of 50-sets that contain at least half of all oranges by \(N_{50}\), and the number of 49-sets that contain at least half of all oranges by \(N_{49}\). We have \(N_{50}>N_{49}\), unless none of the bags contain any oranges, which is an easy case anyway (there is one-to-one mapping from 50-sets to 49-sets, such that each 50-set contains its mapped 49-set). Clearly, more than half of all 50-sets contain at least half of all oranges (otherwise the previous inequality would be violated: if an 50-set does not contain at least half of all oranges, then its complementing 49-set does).

The same argument holds for apples. As more than half of the 50-sets contain at least half of the oranges, and more than half of the 50-sets contain at least half of the apples, there is at least one 50-set containing at least half of both fruits.

Proof 5. (price function) (source: from David Futer)

Assign price \(x\) to an orange, and \(1-x\) to an apple. Select the 50 most expensive bags.

Denote:

\(f(x) = (\mbox{# of oranges in top 50}) / (\mbox{total # of oranges})\)

\(g(x) = (\mbox{# of apples in top 50}) / (\mbox{total # of apples})\)

We have \(g(0) \geq 1/2, f(1) \geq 1/2\), and \(f\) is increasing, \(g\) is decreasing. Clearly there is some \(x\) for which both \(f\) and \(g\) are at least 1/2, which denotes a solution (if for some \(x\) both functions would have values below 1/2, then the top 50 would not be the top 50, as its complement would have more apples and more oranges, therefore higher total price).

Proof 6. (geometry – Ham-Sandwich) (source: from David Futer, and a variant explained by Dror Ozeri on the same forum)

We represent the bags as sets in the plane, with the restriction that a straight line can cross at most two of them (this can always be achieved). We somehow encode the number of oranges and apples in each bag. There are more possible variants here: map each fruit to a red or blue point (depending on its type); or map all oranges of a bag to a single point with a weight that tells the number of oranges, and similarly for all apples of a bag; or map the whole bag to a single point for which we define an “apple-measure”, and an “orange-measure”. Now we apply the Ham sandwich theorem to obtain a balanced cut (there are different variants of this theorem, we choose the one suitable for our representation). If any “bags” are cut by the obtained line, we move them to the side which has less bags. This gives a valid solution.

Exercises:

Prove the following more general statement. Which proof technique is suitable?

Given \( n\) bags that contain \( f\) kinds of fruits, we can select \( \lceil \frac{n+f-1}{2} \rceil \) bags that contain at least half of all kinds of fruits.

Prove the following variant. Which proof technique is suitable? Given 100 bags that contain apples and oranges, we can select 34 bags that contain at least one third of the apples and one third of the oranges.

Find a general statement with \(n\) bags, \(f\) kinds of fruits and finding a subset of the bags that contains at least a \(p\)-fraction of the fruits of each kind, for arbitrary \( 0 \le p \le 1\).

Assume that we know the contents of each bag. Which proof leads to an efficient algorithm for finding a good subset of bags in the sense described above? In particular, which proof leads to a linear time \(( O(n) )\) algorithm?

Can you come up with new proofs for the puzzle? If so, I would love to hear them. Do any of the proofs admit new generalizations?

Thanks to David Futer, Lavinia Dinu, Manuel Arora, and Tobias Mömke for interesting discussions about this puzzle.