Twitter map of London shows the linguistic diversity of a truly international city

92.5 per cent of tweets sent in English, but other major languages include Spanish, French, Turkish and Arabic


It's long been hailed as one of the great international cities, but now London's linguistic diversity has been mapped thanks to Twitter.

This colour-coded graphic pinpoints the location and language of tweets sent from the British capital and shows how linguistic groups are clustered in the city's various districts.

The revealing map is the work of University College London PhD student Ed Manley and James Cheshire, a lecturer at UCL's Centre for Advanced Spatial Analysis.

Global city: This map shows the language distribution of tweets across London over this summer

Using an open-source website language detector, the pair identified the predominant language used in 3.3 million GPS-enabled tweets sent over the summer.

Perhaps unsurprisingly, 92.5 per cent of the tweets were sent in English, but the researchers detected a total of 66 languages in the data, including tongues as esoteric as Haitian Creole, Basque and even Swahili.

The map shows how English tweets, shown in grey, predominate and provide crisp outlines to roads and train lines as people tweet while moving around the city, Professor Cheshire explains.

The city's beating heart: Most of London's linguistic diversity is situated in the city centre, but includes multilingual hot spots like the Olympic Park and a bizarre concentration of French speakers in Lewisham, bottom right

In the city's north more Turkish tweets, shown in blue, appear; Arabic tweets - appropriately in green - are most common around Edgware Road; and parts of the West End show pockets of Russian tweets in pink.

Professor Cheshire explains on his blog: 'The geography of the French tweets (red) is perhaps most surprising as they appear to exist in high density pockets around the centre and don’t stand out in South Kensington (an area with the Institut Francais, a French High School and the French Embassy).

'It may be that as a proportion of tweeters in this area they are small so they don’t stand out, or it could be that there are prolific tweeters (or bots) in the highly concentrated areas.'

Mr Manley told Metro the project revealed a few matches with where the researchers had expected to find particular languages, but 'a lot of the time it didn't actually match in the same volume as we expected'.

On his blog he points out that languages he had expected to feature prominently, such as Bengali and Somali, barely appear on the map.

'Either people only tweet in English, or usage of Twitter varies significantly among language groups in London,' he speculates.

To create the map Mr Manley fed a Twitter dataset containing the tweets through the Chromium Compact Language Detector - an open-source Python library adapted from the Google Chrome algorithm which detects a website's language.
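
For readers curious how such a pipeline might look, the sketch below tallies tweets by detected language. It is only an illustration, not the researchers' code: the `detect` callable is a hypothetical interface that returns a language code and a reliability flag, and `toy_detect` is a made-up stand-in that recognises a handful of greetings so the example runs without the real library installed.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# Hypothetical detector interface (an assumption for this sketch):
# takes tweet text, returns (language_code, is_reliable).
Detector = Callable[[str], Tuple[str, bool]]


def tally_languages(tweets: Iterable[str], detect: Detector) -> Tuple[Counter, int]:
    """Count tweets per detected language; set aside unreliable detections."""
    counts: Counter = Counter()
    discarded = 0
    for text in tweets:
        code, reliable = detect(text)
        if reliable:
            counts[code] += 1
        else:
            discarded += 1
    return counts, discarded


def toy_detect(text: str) -> Tuple[str, bool]:
    """Stand-in detector for illustration only: matches a few greeting words."""
    greetings = {"hello": "en", "bonjour": "fr", "merhaba": "tr"}
    for word in text.lower().split():
        if word in greetings:
            return greetings[word], True
    return "unknown", False


if __name__ == "__main__":
    sample = ["hello london", "bonjour", "???", "hello again"]
    counts, discarded = tally_languages(sample, toy_detect)
    print(counts, discarded)
```

In a real run the stand-in would be swapped for a wrapper around an actual language detector, with each tweet's GPS coordinates kept alongside its language code for plotting.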

The approach was not entirely reliable. Around 1.4 million tweets had to be discarded because the language could not be determined, and the detector reported a surprising number of tweets in Tagalog - a language of the Philippines.

A closer look revealed that many of these exotic tweets contained online English colloquialisms such as 'hahahahaha', 'ahhhhhhh' and 'lololololol'.
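
One way to catch such false positives before language detection is to flag tweets made up mostly of stretched-out interjections. The Python sketch below is a rough heuristic under assumed thresholds, not the method the researchers used: a word counts as filler if it is at least six letters long and built from no more than two distinct letters, and a tweet is flagged when at least half its words are filler. The function names and cut-off values are hypothetical.

```python
import re

# Runs of lowercase letters; digits, punctuation and emoji are ignored.
WORD_RE = re.compile(r"[a-z]+")


def is_elongated_filler(token: str, min_len: int = 6, max_distinct: int = 2) -> bool:
    """True for long tokens built from very few letters, e.g. 'hahahahaha'."""
    return len(token) >= min_len and len(set(token)) <= max_distinct


def mostly_filler(text: str, threshold: float = 0.5) -> bool:
    """True when at least `threshold` of the words in a tweet look like filler."""
    tokens = WORD_RE.findall(text.lower())
    if not tokens:
        return False
    filler = sum(is_elongated_filler(t) for t in tokens)
    return filler / len(tokens) >= threshold


if __name__ == "__main__":
    for tweet in ["hahahahaha lololololol", "Stuck on the Central line again ahhhhhhh"]:
        print(tweet, "->", mostly_filler(tweet))
```

A heuristic this simple will misfire on some genuine words, so flagged tweets would be better reviewed or re-run through the detector with the filler removed than thrown away outright.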