The constructed phrases are similar to reference translations or appear to be alternative ways of saying “wash your hands.” For example, in Bulgarian I predict “умий ръцете,” and Google Translate predicts “Измий си ръцете.” The two differ, but if I back-translate my prediction using Google Translate, I still get “wash your hands.” There is some uncertainty where I can’t compare to reference translations (e.g., Pijin [pis] from the Solomon Islands) or human-annotated spans, but I can still validate that the word for wash (wasim) and the word for hands (han) are used in other reference documents that are necessarily talking about washing or hands, respectively. About 15% of the translations could be validated using this method, and I hope to validate more as I gather reference dictionaries.
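As a rough illustration of that last check, here is a minimal sketch, not the actual validation code. It assumes the reference documents are already grouped by the concept they are known to discuss. The function name `validate_words`, the data layout, and the `tok_*` filler tokens are all made up for the example; only “wasim” and “han” come from the text above.

```python
from typing import Dict, List


def validate_words(
    predicted: Dict[str, str],
    reference_docs: Dict[str, List[str]],
) -> Dict[str, bool]:
    """Check each predicted word against reference documents for its concept.

    A word counts as validated if it appears as a whole token in at least
    one target-language document known to be about that concept.
    """
    results = {}
    for concept, word in predicted.items():
        docs = reference_docs.get(concept, [])
        results[concept] = any(
            word.lower() in doc.lower().split() for doc in docs
        )
    return results


# Predicted words for Pijin [pis], as mentioned in the text.
predicted = {"wash": "wasim", "hands": "han"}

# Hypothetical reference documents; the tok_* tokens are placeholders,
# not real Pijin text.
reference_docs = {
    "wash": ["tok_a wasim tok_b", "tok_c tok_d"],
    "hands": ["tok_e tok_f han"],
}

print(validate_words(predicted, reference_docs))
```

A check like this only confirms that the individual words show up in the right context. It says nothing about whether the full phrase is grammatical, which is why reference dictionaries and human review still matter.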

Note that I used at most about 7,000 sentences in each language to get the above translations, even for high-resource languages like Italian. I also did not rely on aligned sentences between the language pairs. Despite this very data-scarce, unsupervised scenario, I was still able to obtain phrases similar in quality to those from Google Translate for languages supported by both systems. This demonstrates the potential utility of this sort of “hybrid” approach (unsupervised alignment of word embeddings + rule-based matching) for translating short phrases into languages where very little data exists.
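To make the embedding half of that hybrid idea concrete, here is a toy sketch of word lookup in a shared embedding space. It assumes the alignment step has already mapped both languages into the same space, and it leaves out the rule-based matching entirely. The vectors, the `tok_x` distractor, and the function names are invented for illustration and are not taken from the real system.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_word(
    query: Sequence[float],
    target_vectors: Dict[str, Sequence[float]],
) -> str:
    """Return the target-language word whose vector is closest to the query."""
    return max(target_vectors, key=lambda w: cosine(query, target_vectors[w]))


def translate_words(
    words: List[str],
    source_vectors: Dict[str, Sequence[float]],
    target_vectors: Dict[str, Sequence[float]],
) -> List[str]:
    """Translate word by word via nearest neighbour in the shared space."""
    return [nearest_word(source_vectors[w], target_vectors) for w in words]


# Invented 3-dimensional vectors, assumed to be already aligned.
source_vectors = {"wash": [0.9, 0.1, 0.0], "hands": [0.1, 0.9, 0.1]}
target_vectors = {
    "wasim": [0.8, 0.2, 0.1],
    "han": [0.0, 1.0, 0.2],
    "tok_x": [0.1, 0.0, 0.9],  # unrelated distractor word
}

print(translate_words(["wash", "hands"], source_vectors, target_vectors))
```

Real embeddings have hundreds of dimensions, and nearest-neighbour lookup on its own is noisy when the vectors are trained on only a few thousand sentences, which is one reason to pair it with rule-based matching.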

Note: I’m definitely not saying that this is a “solution” to the problem of information spread about coronavirus and other health-related issues. There are still a lot of things to explore and formally evaluate here, and we are working on that. In many cases, this approach won’t be able to help construct important informational material in hundreds of languages. However, I think that we should all be trying to develop creative solutions to problems related to the current crisis. Maybe this is one piece of a very large puzzle.

You can view the complete list of validated translations plus human translations on this Ethnologue guide page. In addition, a more thorough description and analysis of the system in paper form is forthcoming. We welcome feedback from the public on the translations to help fine-tune the system and, most of all, to make sure that health information gets out to marginalized language communities around the world!

Develop your own AI skills

There are so many exciting AI problems out there that can make a huge impact in the world! If you want to solve problems like the one above with AI or if you think your business might need to start leveraging AI for other things (supply chain optimization, recommendation, customer service automation, etc.), don’t miss the AI Classroom training event this May. AI Classroom is an immersive, 3-day virtual training event for anyone with at least some programming experience and a foundational understanding of mathematics. The training provides a practical baseline for realistic AI development using Python and open-source frameworks like TensorFlow and PyTorch. After completing the course, participants will have the confidence to start developing and deploying their own AI solutions.