Stop Guessing Languages Based on IP Address

There’s better ways to do this already built into HTTP.

Recently I took a trip to Europe. When I arrived in Berlin, I opened up my laptop and went to Google. Much to my surprise, despite being logged in, my results came back in German.

Other large-scale sites I went to suddenly gave me their German counterparts. I got redirected from .com to .de and had to look for a “Switch to English” link or try to run a translator on the site.

Since some of the largest Internet properties were doing this, I must have missed the memo. Every request that goes out from the browser specifies what language I’d like to have things in. For me it’s

Accept-Language: “en-US,en;q=0.5”

My browser is so persistent, it eagerly sends this off for things like JPGs and Javascript files. Despite this, it’s completely ignored and instead my Berlin IP address was used to assign me a language.

MDN attempts to describe why Accept-Language ought to be ignored. The reasons are readily fixable given that the consequences are knowingly risking the displaying of indecipherable content — and in the modern world of web-tracking, to someone you should know you’re switching languages on.

Instead, Accept-Language should be used and the browser should provide appropriate methods at relevant times for specifying it.

Currently there are ways to specify Accept-Language in the major browsers, but almost nobody does it, knows about it, and leaves it as the language of their browser’s interface. The net result leads to some site analytics data that looks like this:

Accept-Language values for a site in various cities in India. Under 1% of all Indian users had Hindi in their language list. (India’s English proficiency index, for reference)

That is a UX failure, not an engineering one. That’s a shame because Accept-Language is likely more powerful than you realize.

Multilingual Fidelity

Something you quickly discover while traveling outside the US is many have different levels of skill with multiple languages.

Wouldn’t it be nice to have a mechanism for someone to express they have say, fluency in English, are pretty good at Spanish, and know a bit of French?

Accept-Language has been able to do this since the days of GeoCities and Lycos. From HTTP/1.1:

Each language-range MAY be given an associated quality value which represents an estimate of the user’s preference for the languages specified by that range. The quality value defaults to “q=1”. For example, Accept-Language: da, en-gb;q=0.8, en;q=0.7 would mean: “I prefer Danish, but will accept British English and other types of English.”

Improving on the existing user-interface that expresses this fidelity through a wizard and survey interface is fairly straight forward. Here’s an ugly mockup:

A somewhat-friendly interface to fill out the Accept-Language value

Since knowledge-of and preference-to are conceptually distinct (you could know two languages equally but prefer one), one could elaborate this interface by asking the user to pick their preferred languages first and then you could fine tune the q-values based on that order. Another option would be to ask the user to choose a favorite if they claimed to be “natively/excellent” in multiple languages.

However, that may be over-engineering and asking too much of the user. The goal here is to get people to use the thing.

Finding an Appropriate Time to Ask

When it comes to nagging users there’s only two kinds of time: helpful and irritating.

A helpful time to ask the user about preferred languages might be when the “Translate this page?” feature comes up. Chrome does this in two, arguably insufficient (given the in-practice convergence) ways:

A translation prompt from xinhuanet.com

By selecting “never”, the language is added to the Accept-Language list with a q=0.6

Alternatively, after clicking on “Options”, there is now a link to the Language Settings dialog

One notable problem with this approach is that if the user knows, say, English but prefers Hindi, they will likely either never get this dialog or dismiss it because they don’t require translation. The fidelity of their language ability and their personal preference isn’t captured because that’s not the goal the flow expresses. It inquires about necessity and not preference.

In less sophisticated browsers, various heuristics such as seeing that different TLDs from the ones associated with the Accept-Language value are often being used is a readily programmable way for popping up an interface that asks “Your current language preferences are X. Would you like to add Y?”

There can also be a “quick-switch” solution modeled after the “text-encoding” interface which has been in browsers for literally decades. Imagine the following with a “Preferred Language” menu item adjacent to the drop-down option:

“Preferred Languages” would likely be more useful than this

This would allow people on not-their-own devices to specify their preferred language in a lower-fidelity but quick and universal way.

Imagine if you could show someone something on your phone and the person could in a few taps switch to a language they were more familiar with and then in the weaker shared language say, “This is up the road and to the left”.

Although there are plugins that have this interface (chrome, firefox), the real value is as a predictable shared interface. It’s best as a built-in feature as the common use-case is when interfacing devices that are not our own.

We are the industry and we can make this happen on the cheap. The hard work of getting the browsers to send out a standard header is already done for us. We just haven’t done the easy parts yet — what a shame.

Follow the Standard. There’s just one this time.

By ignoring this built-in expressive feature for choosing languages, everyone instead spends time designing and building a language selecting interface on essentially every website.

Placing flag icons (or better) in different locations each time that everyone has to fish around the page for or assigning a language based on IP is time-consuming work and unnecessary cleverness. In practice, it’s a guess-and-surprise anti-pattern (assigning based on weak correlation) due to a poorly instrumented smart-interface.

In a world where arguably too many words are spent over proper usages of HTTP verbs, every programmer home-brewing their own solution to a long-solved critical user-facing issue seems to be completely overlooked.

This Problem only Grows

In the near future, not only is international migration set to increase, in many regions getting online there’s a local, national, and international language (ex: Gujarati, Hindi and English for the 60 million people of Gujarat, India).

In other places, such as the New York metro area, at least 192 different languages are spoken at home. In Los Angeles, 54% of homes use languages other than English. These numbers will continue to rise. That’s why getting users to state their preferences is important.

By leveraging what’s already there, we’ll help move the web forward so the next billion web users will have content as accessible and meaningful to them as the first billion.

Thanks, let’s get to work.