As I was watching an episode of Only Connect the other day, I learned an (almost) interesting fact about London tube stations:

“St. John’s Wood is the only tube station not to share any letters with the word mackerel.”

Someone, for some reason, some time.

This was such a delightfully pointless fact that my curiosity was instantly piqued. Why this station? Why that word? Is it unique, or even unusual, for this to happen? Let’s, dear reader, find out for ourselves.

To get to the bottom of this, I first found a few useful sources of data:

370,099 words in the English language, arranged alphabetically: https://github.com/dwyl/english-words/blob/master/words_alpha.txt

The top 10,000 most frequent words in the English language, sorted by frequency of usage in written text: https://github.com/first20hours/google-10000-english/blob/master/google-10000-english.txt

A list of London underground and DLR station names: https://wiki.openstreetmap.org/wiki/List_of_London_Underground_stations

The next step was to go through all of these words, and find what letters they have in common with the tube station names. There are, after removing duplicates, approximately 122 million combinations to consider.
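As a minimal sketch of that search (an illustration, not the original script): reduce every name to the set of letters it contains, then for each word look for the stations that share none of its letters, and keep the word only when exactly one such station exists. The names `letterSet` and `findLoneStations`, and the three-station toy list, are made up for this example.

```javascript
// Reduce a name to the set of lowercase a-z letters it contains,
// so "St. John's Wood" becomes {s, t, j, o, h, n, w, d}.
function letterSet(name) {
  return new Set(name.toLowerCase().replace(/[^a-z]/g, ''));
}

// For every word, find the stations that share no letter with it and
// keep the word only when exactly one such station exists.
function findLoneStations(words, stations) {
  const stationSets = stations.map(letterSet);
  const hits = [];
  for (const word of words) {
    const wordLetters = [...letterSet(word)];
    let loneIndex = -1;
    let disjointCount = 0;
    // Stop early once a second disjoint station turns up.
    for (let i = 0; i < stationSets.length && disjointCount < 2; ++i) {
      if (!wordLetters.some(ch => stationSets[i].has(ch))) {
        loneIndex = i;
        ++disjointCount;
      }
    }
    if (disjointCount === 1) {
      hits.push({ word, station: stations[loneIndex] });
    }
  }
  return hits;
}

// Toy data only; the real run uses the full word and station lists.
const demoStations = ["St. John's Wood", 'Bank', 'Oval'];
console.log(findLoneStations(['mackerel', 'tin', 'zzz'], demoStations));
```

With this toy list, mackerel pairs with St. John’s Wood and tin with Oval, while zzz is rejected because it shares no letters with more than one station.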

I was a bit worried that doing this naively in Python would be too slow, and that writing in a compiled language on my Windows machine would be too painful, so I just did it naively in Node, which ended up running in about a second, a pleasant surprise.

One of the small optimisations which probably helped was pre-computing which letters occur in each tube station name, i.e. creating a matrix whose entries are true/false depending on whether a given letter is in/not in a given station name, e.g.

const letters = new Array(tubesUnique.length).fill(0).map(() => new Array(26).fill(false));
tubesUnique.forEach((t, i) => {
  for (let tind = 0; tind < t.length; ++tind) {
    const ccode = t.charCodeAt(tind) - 97; // So lowercase 'a' -> 0 etc.
    letters[i][ccode] = true;
  }
});
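With a table like this, checking a word against a station comes down to a few array lookups. The sketch below shows one way the table might be used. It assumes the names have already been lowercased and stripped of everything except a-z (how `tubesUnique` is prepared is not shown above), and the helper names `buildLetterTable`, `sharesLetter` and `disjointStations` are hypothetical.

```javascript
// Assumption: names are already lowercase with everything except a-z
// removed, e.g. "St. John's Wood" -> "stjohnswood".
function buildLetterTable(names) {
  return names.map(name => {
    const row = new Array(26).fill(false);
    for (let i = 0; i < name.length; ++i) {
      row[name.charCodeAt(i) - 97] = true; // 'a' -> 0 ... 'z' -> 25
    }
    return row;
  });
}

// True if the (lowercase a-z) word uses any letter marked in the row.
function sharesLetter(row, word) {
  for (let i = 0; i < word.length; ++i) {
    if (row[word.charCodeAt(i) - 97]) return true;
  }
  return false;
}

// Indices of the stations that have no letter in common with the word.
function disjointStations(table, word) {
  const result = [];
  table.forEach((row, i) => {
    if (!sharesLetter(row, word)) result.push(i);
  });
  return result;
}

const demoTable = buildLetterTable(['stjohnswood', 'bank', 'oval']);
console.log(disjointStations(demoTable, 'mackerel')); // [ 0 ]
```

A word counts as a hit when `disjointStations` returns exactly one index.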

The full script is on GitHub for those interested.

Results

So – is the combination of mackerel and St. John’s Wood special and unique?

Not at all.

There are 57,614 combinations of words and tube stations which fulfil the above condition for the large word list, and 931 for the smaller.

The number of ‘hits’ per station – i.e. the number of different words for which that station is the only one not to share any letters – is plotted below.

It is clear that St. John’s Wood is not special at all: it sits perhaps in the top quartile for number of hits. The most fertile name amongst tube stations is the unassuming Woodford.
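Counting hits per station is a simple tally. As a rough sketch, assume the search has produced a list of `{ word, station }` records, one for each word whose only letter-disjoint station is `station`. That record shape and the function name `hitsPerStation` are assumptions for this example. The demo records reuse a handful of pairs mentioned in this post, so the counts are nothing like the real totals.

```javascript
// `pairs` is assumed to be a list of { word, station } records, one per
// word for which `station` is the only station sharing no letters.
function hitsPerStation(pairs) {
  const counts = new Map();
  for (const { station } of pairs) {
    counts.set(station, (counts.get(station) || 0) + 1);
  }
  // Sort descending by count so the most fertile station comes first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// A tiny sample of pairs, far from the full result set.
const demoPairs = [
  { word: 'language', station: 'Woodford' },
  { word: 'mackerel', station: "St. John's Wood" },
  { word: 'cleanup', station: 'Woodford' },
  { word: 'thrown', station: 'Maida Vale' },
  { word: 'player', station: "St. John's Wood" },
  { word: 'bleach', station: 'Woodford' },
];
console.log(hitsPerStation(demoPairs));
```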

Words where Woodford is the only station with no letters in common

Most common word – language

Least common word – cleanup

Shortest word – alenu/ulnae (5), or, more commonly used, bleach (6)

Longest word – intellectualistically (21)
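Summaries like these can be pulled out of a station's word list with a couple of sorts. The sketch below assumes that "most common" and "least common" are judged by position in a frequency-ordered list such as the 10,000-word file, and that words missing from that list are simply ignored for those two entries. The function name `summariseWords` and the toy frequency order are invented for illustration.

```javascript
// `words`: all words that single out one station.
// `frequencyList`: words ordered from most to least frequent.
function summariseWords(words, frequencyList) {
  // Lower rank means more common.
  const rank = new Map(frequencyList.map((w, i) => [w, i]));
  const ranked = words
    .filter(w => rank.has(w))
    .sort((a, b) => rank.get(a) - rank.get(b));
  const byLength = [...words].sort((a, b) => a.length - b.length);
  return {
    mostCommon: ranked[0],
    leastCommon: ranked[ranked.length - 1],
    shortest: byLength[0],
    longest: byLength[byLength.length - 1],
  };
}

// Toy input: the frequency order here is made up, not taken from the real list.
const demoSummary = summariseWords(
  ['bleach', 'language', 'cleanup', 'ulnae'],
  ['the', 'language', 'bleach', 'cleanup']
);
console.log(demoSummary);
```

Ties in length are broken by input order, since `Array.prototype.sort` is stable in current versions of Node.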

At the other end of the scale, there are a few stations where only one word in the English language fits:

Whitechapel – borborygmus

Old Street – pacifying

Turnham Green – slipbody

St. James’s Park – hobblingly

Boston Manor – childlike

Moorgate – whilikins

South Acton – bewilderedly

These words aren’t exactly all in common use, so looking at the smaller subset of words gives a different set of stations:

Maida Vale – thrown

Chesham – burlington

Surrey Quays – clothing

Goodge Street – painful

Oval – Edinburgh

Highgate – compounds

Baker Street – immunology

Brixton – valued

Cockfosters – individually

Kenton – hydraulic

Then, for fun, let’s look at a few other words we can use with St. John’s Wood (there are 1,006 choices):

Most common word – player

Least common word – fireplace

Shortest word – clear (5)

Longest word – pluricarpellary (15)

Finally, let’s look at some aggregate statistics. Here’s how word length affects the number of tube stations with letters in common – we are interested in the ‘orange’ words:

The longest word for which this works is philosophicopsychological for the station Debden. Slightly shorter pairs include microspectrophotometric for Bank, and counterproductiveness for Balham.

The distribution of word lengths producing a single ‘hit’ is

and normalising for the distribution of word lengths in English,

So this trick works best for longish words around 10 letters long, where it becomes quite probable – over a fifth of words of that length work.
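The normalisation amounts to dividing, for each word length, the number of words that produce a single hit by the total number of words of that length. A minimal sketch, with a hypothetical function name `hitRateByLength` and toy inputs rather than the real lists:

```javascript
// For each word length, compute the fraction of words of that length
// which appear in `hitWords` (the words that single out one station).
function hitRateByLength(allWords, hitWords) {
  const hitSet = new Set(hitWords);
  const totals = new Map();
  const hits = new Map();
  for (const w of allWords) {
    totals.set(w.length, (totals.get(w.length) || 0) + 1);
    if (hitSet.has(w)) {
      hits.set(w.length, (hits.get(w.length) || 0) + 1);
    }
  }
  return [...totals.keys()]
    .sort((a, b) => a - b)
    .map(length => {
      const total = totals.get(length);
      const hitCount = hits.get(length) || 0;
      return { length, total, hits: hitCount, rate: hitCount / total };
    });
}

// Toy data: which words count as hits here is purely illustrative.
const demoRates = hitRateByLength(
  ['clear', 'bleach', 'player', 'thrown', 'zzz'],
  ['clear', 'bleach']
);
console.log(demoRates);
```

A rate above 0.2 for a given length would correspond to the "over a fifth" observation above.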

If someone ever trots out this fact, or perhaps asks it at a pub quiz, you can rest safe in the knowledge that it is not special in any way, and neither is any of this, or anything else we do to while away our Sunday afternoons.

Happy spelling!