Many Europeans are multilingual by necessity – how does that affect which search provider they choose? Superior natural language understanding technology could be behind Google’s enormous success in Europe.

The European Parliament today called on the European Commission to consider breaking up Google to allow for competition in the web search market within Europe (and, well, beyond).[1] Google handles 91.6 % of Europe’s searches.[2]

Over the last two weeks, I’ve noticed that technology news media have attributed Google’s success to “Europeans loving Google”. I haven’t seen anyone try to explain why Google has become so popular, so I thought I’d share my own theory with you.

The European Union has 23 official languages. Across Europe, there are more than 60 living languages. By necessity and from a strong focus on language education, Europeans are multilingual. “Just over half of Europeans (54 %) can hold a conversation in at least one additional language, a quarter (25 %) can speak at least two additional languages, and one in ten (10 %) are conversant in at least three.” [3]

I believe the secret to Google’s enormous market share in Europe is that its technology understands a multitude of languages better than the competition does. This post looks at the problem from a Norwegian’s perspective, but I believe these experiences are representative of many other regions and language families.

Searching for “tøy” on Bing.no, Bing.com’s Norwegian-language cousin, returns results for the Thai actress Jarinporn “Toey” Joonkiat instead of shopping options and information about the Norwegian word for clothing. Google and Yandex don’t make this cultural mistake. The advertisements shown alongside Bing’s results do, however, relate to clothes. I’m not sure exactly what this says about Bing’s priorities when it comes to operating a search engine for the Norwegian market.

The above could be explained by the Scandinavian letter “ø” often being represented as, or normalized to, “oe” in ASCII. It’s reasonable for Bing to normalize input before processing the user’s query. However, even a user who understands all these technicalities will already have moved on to a competing search engine that understands what they are trying to say.
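To make the suspected collision concrete, here is a minimal Python sketch of that kind of ASCII folding. The folding table and the function name `fold_to_ascii` are my own illustration; I have no knowledge of how Bing actually normalizes queries.

```python
# Hypothetical folding table for illustration only; real search engines
# use their own (unpublished) normalization rules.
ASCII_FOLD = {"ø": "oe", "æ": "ae", "å": "aa"}


def fold_to_ascii(query: str) -> str:
    """Lowercase the query and replace Scandinavian letters with ASCII digraphs."""
    return "".join(ASCII_FOLD.get(ch, ch) for ch in query.lower())


# The Norwegian word for clothing and the actress's nickname end up
# with the same key once the original spelling is thrown away.
clothing_key = fold_to_ascii("tøy")
nickname_key = fold_to_ascii("Toey")
print(clothing_key, nickname_key, clothing_key == nickname_key)
```

If the index only ever sees the folded key, it cannot tell the two queries apart. Keeping the original spelling next to the folded form would let the ranking prefer exact matches.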

Did I mention that Norwegian has two different written languages? There’s only one spoken language, and every Norwegian should be able to understand both written languages without too much brain-processing overhead. Google applies some linguistic magic to normalize search queries given in either variant and returns the best resource, with only a small ranking adjustment to prefer the language variant the searcher used.
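I can only guess at how Google does this, but the behavior I observe can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the four-word Nynorsk-to-Bokmål map, the `quality` scores, and the size of `VARIANT_BONUS`.

```python
# Toy Nynorsk -> Bokmål word map; a real system would need a full lexicon.
NYNORSK_TO_BOKMAL = {"kva": "hva", "korleis": "hvordan", "ikkje": "ikke", "mjølk": "melk"}
VARIANT_BONUS = 0.05  # assumed size of the "small ranking adjustment"


def canonical_tokens(text):
    """Map every word to its Bokmål form so both variants share one key."""
    return {NYNORSK_TO_BOKMAL.get(word, word) for word in text.lower().split()}


def detect_variant(text):
    """Very crude guess: any known Nynorsk word marks the text as Nynorsk."""
    words = text.lower().split()
    return "nn" if any(word in NYNORSK_TO_BOKMAL for word in words) else "nb"


def rank(query, documents):
    """Return matching document titles, best first.

    documents: list of dicts with 'title', 'variant' ('nb' or 'nn'), 'quality'.
    """
    wanted = canonical_tokens(query)
    variant = detect_variant(query)
    scored = []
    for doc in documents:
        if wanted <= canonical_tokens(doc["title"]):  # matches in either variant
            bonus = VARIANT_BONUS if doc["variant"] == variant else 0.0
            scored.append((doc["quality"] + bonus, doc["title"]))
    return [title for _, title in sorted(scored, reverse=True)]


docs = [
    {"title": "Hva er melk", "variant": "nb", "quality": 0.90},
    {"title": "Kva er mjølk", "variant": "nn", "quality": 0.88},
]
print(rank("melk", docs))
print(rank("mjølk", docs))
```

Both queries find both pages. The bonus only decides the order when the two pages are of nearly equal quality.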

Bing takes the same query in the two language variants verbatim and returns entirely different results, with no regard for the meaning of the query or the searcher’s ability to understand both language variants.

Most Norwegians can read Swedish and Danish just fine, as the languages are closely related. Showing search results in these languages is an acceptable fallback when no better result on the topic is available in Norwegian.

The article on the Swedish capital Stockholm is an order of magnitude better in the Swedish-language Wikipedia than in the English or Norwegian editions. On the other hand, Swedes have more trouble understanding Norwegian than Danish.

A search engine that shows such a result runs the risk that the user is not comfortable reading the other continental Scandinavian languages. Another less technical but more politically complicated example is that of British and American English.

Weighting linguistic parameters is thus very tricky, and how well it works varies from person to person. Personalizing search results to the individual is understandably a big part of Google’s recipe for running a search engine.
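One way to picture this weighting is to scale each result’s relevance by how well the individual user reads its language. The following Python sketch is purely illustrative: the `Result` type, the `comprehension` weights, and the relevance numbers are all invented, and I am not claiming this is how any search engine works.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    language: str     # language code of the page
    relevance: float  # language-independent quality score (made-up numbers)


def personalized_rank(results, comprehension):
    """Order results by relevance scaled by the user's reading ability.

    comprehension maps a language code to a weight between 0 and 1;
    languages missing from the map are treated as unreadable and dropped.
    """
    scored = [(r.relevance * comprehension.get(r.language, 0.0), r) for r in results]
    readable = [(score, r) for score, r in scored if score > 0]
    readable.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in readable]


results = [
    Result("https://sv.wikipedia.org/wiki/Stockholm", "sv", 0.9),
    Result("https://no.wikipedia.org/wiki/Stockholm", "no", 0.5),
    Result("https://en.wikipedia.org/wiki/Stockholm", "en", 0.6),
]

norwegian_user = {"no": 1.0, "da": 0.9, "sv": 0.8, "en": 0.9}
english_only_user = {"en": 1.0}

print([r.language for r in personalized_rank(results, norwegian_user)])
print([r.language for r in personalized_rank(results, english_only_user)])
```

With these invented weights, the Norwegian user gets the stronger Swedish article first, while a user who only reads English never sees it. The hard part is learning the weights for each person, which is exactly where personalization comes in.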

The multilingual nature of Europe complicates things further. I search in both English and Norwegian every day. I even throw in some Danish and German searches on occasion.

Every search engine, including Google, seems to assume that searches are always performed in one language. Google does a much better job of handling multilingual search queries than its competitors, none of which even seem to be trying.

You can also clearly see this when comparing Google’s Search and Translate services with Bing Search and Bing Translator. Google can translate Danish to Norwegian (written Norwegian is based on 1800s Danish) pretty much flawlessly.

Bing Translator echoes the Danish text back with the odd word here and there randomly replaced. Google Translate can translate Norwegian–English–Norwegian–English–Norwegian and you will have an understandable text at every step. Bing Translator currently offers 44 languages compared to Google Translate’s 66 languages.

In my own experience, Google Translate can turn languages I can’t even identify into meaningful text, whereas Bing Translator leaves me guessing at even the overall meaning of a translated text.

The 47 % European market share [4] of Google’s Chrome web browser can likely be traced to its built-in page translation feature, which detects languages automatically and unlocks any page’s knowledge to the user.

Migrant workers have to deal with their native tongue and the language of the country they are in, and they are likely to master some level of English as well. A browser that can fill in knowledge gaps with automatic translations is a natural first choice.

A few European countries have search engines specific to the local language. These aren’t popular and only have a limited reach, as they are restricted to a single region and language.

Most of these are skinned versions of Google, though there are a few exceptions that try to maintain their own cache, a proposition that becomes more expensive and less technically feasible every day as the Internet grows. They are limited to one language and offer no equivalent of Google’s “Translate this page” feature, either in the browser or next to the search results.

It is inevitable in a market of multilingual users that the search service provider with the best human language understanding will be everyone’s preferred provider.

To truly reduce Google’s advantage in the European market, the EU doesn’t need to “unbundle search from other commercial services”.[5] I would suggest it need only untangle Google’s language processing smarts from the search service to level the playing field. Given the EU’s ability to land on the surface of a fast-moving comet[6], regulating Google should be a relatively small challenge.