Building technology that works across languages is important: without a keyboard tailored to your language, simple things like messaging friends or family can be a challenge. Often, keyboard apps don’t support the characters and scripts used for languages with a smaller speaking population. As an example, the Nigerian language "Ásụ̀sụ̀ Ị̀gbò" is impossible to type on an English keyboard. Plus, wouldn't it be frustrating to see nearly every word you type incorrectly autocorrected into another language?

Many of Gboard’s newly added languages have traditionally not been widely written in newspapers or books, so they’re rarely found online. But as people spend more time on their phones using messaging apps and social media, they’re typing in these languages more than ever. The ability to type easily in these languages lets people write to each other in the same language they would normally speak face-to-face.

How we add new languages to Gboard

In addition to designing a new keyboard layout, every time we add a new language to Gboard we create a new machine learning language model. This model tells Gboard when and how to autocorrect your typing and how to predict your next word. For a language like English, which has only 26 letters and large amounts of widely available written material, this is easy. For many of the world's languages, though, the process is much harder.
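To make the idea of next-word prediction concrete, here is a minimal sketch in Python of a toy bigram model, which simply counts which word tends to follow which. It is an illustration only and not Gboard's actual model; the function names and the sample sentences are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it.
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(model, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    followers = model.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(k)]


# Hypothetical training sentences; a real corpus would be far larger.
corpus = [
    "see you soon",
    "see you tomorrow",
    "thank you so much",
    "see you soon",
]
model = train_bigram_model(corpus)
print(predict_next(model, "you", k=1))
```

Even this toy version shows why data matters: with only a handful of sentences, the model can suggest a follow-up word only for words it has already seen.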

To train our machine learning language models, we need a text corpus: a large collection of texts written in a particular language. For many of these languages, finding text data is challenging. When we can’t find data online, we share a list of writing prompts with native speakers so we can create new text corpora from scratch. (You can read more about our crawling efforts for these languages in one of our recent research papers.)

Next, we focus on layout design. Designing a layout for a new language on Gboard requires careful research to fit in all the characters in a way that makes sense to native speakers. If there isn’t much information about the language available online, we analyze text corpora to figure out which characters to include and how frequently they’re used.
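As a rough illustration of what analyzing a corpus for characters can look like, the Python sketch below counts how often each letter or combining mark appears in a set of texts. It is a simplified assumption about the general technique, not a description of Gboard's tooling; the function name and sample texts are hypothetical.

```python
import unicodedata
from collections import Counter


def character_frequencies(texts):
    """Count letters and combining marks across a list of texts."""
    counts = Counter()
    for text in texts:
        # NFC normalization merges a base letter and its accent into one
        # code point where Unicode defines one, so "é" is counted once
        # no matter how it was typed.
        normalized = unicodedata.normalize("NFC", text.lower())
        for ch in normalized:
            # Unicode categories starting with "L" are letters and "M" are
            # combining marks; spaces, digits and punctuation are skipped.
            if unicodedata.category(ch).startswith(("L", "M")):
                counts[ch] += 1
    return counts


# Hypothetical sample; a real analysis would run over a full corpus.
sample = ["Ásụ̀sụ̀ Ị̀gbò", "asụsụ igbo"]
for ch, n in character_frequencies(sample).most_common(5):
    print(repr(ch), n)
```

A count like this gives a first ranking of which characters deserve a prominent key and which can sit behind a long press, though the final layout still depends on feedback from native speakers.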

Depending on the language, we may tailor aspects of the layout, such as the set of digits. For example, while English uses 0123456789, Hindi and other Indian languages written in Devanagari use ०१२३४५६७८९. Once we've built support for a language, we always invite a group of native speakers to test it and fill out a survey so we can understand their typing experience.
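The digit example above can be pictured as a simple one-to-one mapping between the two digit sets. The Python sketch below is only an illustration of that mapping, with hypothetical names; it does not reflect how Gboard stores its layouts.

```python
ASCII_DIGITS = "0123456789"
DEVANAGARI_DIGITS = "०१२३४५६७८९"

# Translation table that maps each ASCII digit to its Devanagari counterpart.
TO_DEVANAGARI = str.maketrans(ASCII_DIGITS, DEVANAGARI_DIGITS)


def localize_digits(text):
    """Replace ASCII digits in `text` with Devanagari digits."""
    return text.translate(TO_DEVANAGARI)


print(localize_digits("2018"))
```

Because both sets are ordinary decimal digits in Unicode, Python's built-in int() accepts either form, so a number typed with Devanagari digits can still be parsed.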

To see if your language is already supported in our latest Gboard release in the Play Store, check out the list of supported languages in our help center.