Your job is to estimate the number of black cabs that operate at a given time in London. Your team of five has two days to produce an estimate. You must base your answer on data recorded from traffic, and you can collect data for 2 hours on each of the 2 days. The only equipment you can use is pencil and paper.

How would you go about doing this?

(Note to regular readers: today’s post is an experiment to test an idea. I am considering changing the column to “Monday Math” to cover more topics. Puzzles will still appear, just not every week.)


Answer to Number of Black Cabs in London

Here is one method to estimate the number of cabs. On the first day, station your team at different locations to count black cabs, recording each unique license plate. The next day, return to the same locations at the same time and record the same data. The estimate of the number of black cabs will be:

(# cabs day 1)(# cabs day 2)/(# cabs common to both days)

What is the logic of this estimate? I have provided an explanation below.

The source of today’s puzzle is a clip from BBC’s Bang Goes the Theory. Johnny Ball has a wonderful explanation in the following YouTube video. He explains the statistical method using Ping-Pong balls and then applies it in the field to black cabs in London.

Johnny Ball estimates the number of black cabs in London

In the video, they capture 1,827 cabs the first day, then 2,133 the second day, of which 321 were seen on both days. The estimate for the total cabs is therefore (1827)(2133)/321 ≈ 12,140. Johnny Ball goes to someone from the cabbies’ union to find out the actual answer. It turns out the total number of black cabs is much larger, about 23,000. So at first the estimate seemed very wrong. However, the estimate was for the cabs in operation at any given time. The number of cabs working during a given shift ranges from 11,000 to 12,000. So the estimate was fairly spot on!
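To check the arithmetic, here is a minimal Python sketch of the formula applied to the counts from the video. The function name capture_recapture_estimate is my own label for illustration, not something from the show.

```python
def capture_recapture_estimate(day1_count, day2_count, seen_both_days):
    """Estimate the population as (# day 1)(# day 2)/(# common to both days)."""
    if seen_both_days <= 0:
        # With no cab seen on both days, the formula divides by zero,
        # so the method gives no estimate at all.
        raise ValueError("need at least one cab seen on both days")
    return day1_count * day2_count / seen_both_days


# Counts reported in the video: 1,827 on day 1, 2,133 on day 2, 321 on both.
estimate = capture_recapture_estimate(1827, 2133, 321)
print(round(estimate))  # 12140
```

Note how sensitive the result is to the number of cabs seen on both days: it sits in the denominator, so a small overlap makes the estimate swing widely.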

Here is my explanation of why this works. Obviously you cannot count all the cabs in London by hand; there are just too many. So the first day you go out and “capture” a sample of cabs and you “tag” them by recording the license plates. The next day you do the same thing, and some cabs will be seen on both days, that is, they are “re-tagged” or “re-captured.”

The proportion of cabs you see on both days provides the clue to the total number of cabs. If you end up seeing a lot of the same cabs, then the total is likely close to the number you saw on day 1 (in the extreme case, if you tag all of the cabs on day 1, then every cab you see the next day will already be tagged). If the number of cabs seen again is small, then it’s likely the total number of cabs is much bigger than the number in your sample.

If the sample is properly random, then we expect (# re-captured)/(# captured day 2) = (# captured day 1)/(# total). That is, the proportion re-captured on day 2 is the same as the proportion of the total captured on day 1. Re-arranging the formula, we get our estimate for the total as (# captured day 1)(# captured day 2)/(# re-captured).
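The proportion argument can be tried out on a made-up population where the true total is known. The sketch below is only an illustration under a strong assumption: every cab is equally likely to be spotted on each day, which real traffic will not satisfy exactly. The population of 12,000 and the sample sizes are hypothetical numbers chosen to resemble the video, and simulate_estimate is a name I made up.

```python
import random


def simulate_estimate(total_cabs, day1_sample, day2_sample, seed=0):
    """Run one simulated two-day count and return the resulting estimate.

    Assumes each day's sample is drawn uniformly at random, without
    replacement, from the same fixed population of cabs.
    """
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    cabs = range(total_cabs)             # each integer stands for a license plate
    day1 = set(rng.sample(cabs, day1_sample))
    day2 = set(rng.sample(cabs, day2_sample))
    seen_both = len(day1 & day2)         # plates recorded on both days
    if seen_both == 0:
        return None                      # no overlap, so no estimate
    return len(day1) * len(day2) / seen_both


# Hypothetical population of 12,000 cabs with samples of 1,800 and 2,100.
print(simulate_estimate(12000, 1800, 2100))
```

Changing the seed gives a different estimate each time, but with samples this large the results should cluster around the true total of 12,000.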

This method of estimation is known as a capture-recapture sampling method. The method is commonly used in ecology to estimate the population sizes of species and in epidemiology to estimate the prevalence of a disease.