When API Is Not Enough

Human-Friendly Tools for Machine Intelligence

TL;DR

Machine Translation services are API-first, which puts them out of reach unless you either have an integration or write code. Yet engineering resources are scarce, and the lack of integrations with existing systems is the single biggest hurdle to using AI.

Not everyone makes best friends with APIs.

In June 2017, we at Intento started reducing integration complexity by introducing a single API middleware that connects to all Machine Translation vendors. Then, this summer, we released command-line tools for Machine Translation. Here is our next step: simple, beautiful, human-friendly web tools to play with Machine Translation at scale.

Does Machine Translation work for you?

This question triggers very diverse responses. For some Language Service Providers (LSPs), MT reduces human translation effort by up to 40% and halves the turnaround time, automating mundane work for human translators. For others, it does not work at all and only makes things worse: the same effort, the same time to market, and frustrated translators who have to edit crappy MT output.

MT engine performance varies vastly across language pairs, even in the General/News domain, as we demonstrate in our evaluation reports. For specific domains, such as Legal or Medical, the spread is even larger.

Moreover, the latest NMT engines have some human traits: they can be in a bad mood (think 5xx API errors) and have unpredictable quirks, such as failing on sentences that contain specific words, have a specific structure, or exceed a certain length.
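A common way to live with an engine's "bad mood" is to retry transient 5xx failures with exponential backoff. The sketch below is a minimal illustration under stated assumptions, not how Intento handles it: `TransientMTError` and `call_with_retries` are hypothetical names, and `request` stands for whatever function performs one MT call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientMTError(Exception):
    """Hypothetical error raised when an MT engine answers with a 5xx status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"MT engine returned HTTP {status}")
        self.status = status


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying transient 5xx failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientMTError:
            if attempt == max_attempts:
                raise  # still failing after the last attempt: surface the error
            # Wait 0.5 s, 1 s, 2 s, ... plus up to 50% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep=lambda s: None` makes the loop instant, which is handy in tests. Quirks such as silently dropped keywords do not raise errors at all, so backoff only covers the "bad mood" half of the problem.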

And quirks really are quirks: among the 20+ engines we work with, there is one that does not translate anything about Jedis, another that drops keywords in SEO texts, and yet another that thinks “hello world” is untranslatable.

Choosing the right MT

It is hugely important to evaluate all available MT engines when deciding which one to use for a specific project, and whether to use MT at all. In some cases, it even makes sense to combine multiple MT engines in one project and let the translator choose.

Different projects need different engines (from https://bit.ly/mt_jul2018)

Today, all of this requires significant effort, which is beyond the budget of a typical translation project:

Proper evaluation should be done at scale, but the web demo apps that MT vendors provide are too simplistic. Hence, evaluating engines at scale requires writing code against their APIs.

Connectors in CAT and TMS software are often outdated and lack support for the most recent technology. One workaround is to pre-translate the whole project and import it as a Translation Memory, but that also requires something beyond just an API.

Most CAT tools provide a single MT match, making it impossible to combine several MT engines in one project. This can be solved by importing one TMX file per MT engine, but that, too, requires convenient tools.

Crossing the Tools Chasm

The first step — Command-Line Interface

We learned about most of these issues by talking to LSP clients, and launched a CLI for our API. It provided simple but important features:

Get a list of MT engines that support a given language pair and file format.

Translate files of any size, unifying authentication and bulk request processing.

However, the CLI still requires some engineering skills and onboarding. Hence the next step.

Intento Web Tools

We have built a lightweight web application that uses the Intento API. It streamlines translating files with multiple MT engines, removing most of the friction in comparing them for a specific text and translating large files.

Currently, it works with the formats supported by most MT engines: plain text, HTML and XML. We plan to add support for more formats on our side later on.

Once the file is uploaded, set the language pair and additional options. The file format determines whether we pass a format option to the selected MT engine, as well as the file extension of the translation results. TMX and XLIFF support is coming soon.
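To make the role of the file format concrete, here is a minimal sketch of deriving both the engine's format option and the output file name from the uploaded file. The extension table, the `plan_translation` name and the `name.<lang>.ext` naming scheme are all hypothetical, not the web tool's actual behavior.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical extension-to-format table for the three supported formats.
FORMAT_BY_EXTENSION = {
    ".txt": "text",
    ".html": "html",
    ".htm": "html",
    ".xml": "xml",
}


def plan_translation(filename: str, target_lang: str) -> Tuple[str, Optional[str], str]:
    """Return (detected format, format option for the engine or None, output file name)."""
    path = Path(filename)
    fmt = FORMAT_BY_EXTENSION.get(path.suffix.lower())
    if fmt is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    # Plain text needs no format option; tagged formats ask the engine to preserve markup.
    format_option = None if fmt == "text" else fmt
    # The result keeps the source extension, with the target language added before it.
    output_name = f"{path.stem}.{target_lang}{path.suffix}"
    return fmt, format_option, output_name
```

For example, `plan_translation("readme.txt", "fr")` yields no format option and the output name `readme.fr.txt`.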

Also, we have a set of pre- and post-processing rules that fix punctuation errors produced by some MT engines.
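The rules themselves are not listed here, so the following is only an illustration of what pre- and post-processing around an MT call can look like. Every rule below (preserving surrounding whitespace, collapsing double spaces, removing a space before punctuation except in French, restoring a dropped sentence-final mark) is a hypothetical example, and `engine` is any function that translates one string.

```python
import re
from typing import Callable, Tuple

TERMINAL = ".!?"
# Space before punctuation that is followed by whitespace or the end of the text.
SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?])(?=\s|$)")


def preprocess(segment: str) -> Tuple[str, str, str]:
    """Split a segment into (leading whitespace, core text, trailing whitespace)."""
    core = segment.strip()
    lead = segment[: len(segment) - len(segment.lstrip())]
    trail = segment[len(segment.rstrip()):]
    return lead, core, trail


def postprocess(source: str, translation: str, target_lang: str) -> str:
    """Apply illustrative punctuation fixes to a raw MT output."""
    text = re.sub(r" {2,}", " ", translation.strip())  # collapse repeated spaces
    if target_lang != "fr":  # French typography keeps a space before ; : ! ?
        text = SPACE_BEFORE_PUNCT.sub(r"\1", text)
    # Restore sentence-final punctuation that the engine dropped.
    if source and source[-1] in TERMINAL and text and text[-1] not in TERMINAL:
        text += source[-1]
    return text


def translate_segment(segment: str, target_lang: str, engine: Callable[[str], str]) -> str:
    """Send only the core text to the engine and re-attach the original whitespace."""
    lead, core, trail = preprocess(segment)
    if not core:
        return segment  # nothing to translate
    return lead + postprocess(core, engine(core), target_lang) + trail
```

The lookahead in `SPACE_BEFORE_PUNCT` keeps tokens such as ".NET" intact, since the dot there is followed by a letter rather than whitespace.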

The language pair and text format determine which MT engines are available for this translation.

A little trick: some engines use “xml” rather than “html” for text with tags, so it is better to try both.
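The filtering just described, including the html/xml trick, can be sketched as a lookup over an engine catalog. The catalog, engine names and `available_engines` function are hypothetical; a real tool would fetch the supported pairs and formats from the API rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple

TAGGED_FORMATS = ("html", "xml")  # engines differ in which name they use for tagged text


@dataclass(frozen=True)
class Engine:
    name: str
    pairs: FrozenSet[Tuple[str, str]]
    formats: FrozenSet[str]


# Hypothetical catalog, hard-coded for illustration only.
CATALOG = [
    Engine("engine-a", frozenset({("en", "de"), ("en", "fr")}), frozenset({"text", "html"})),
    Engine("engine-b", frozenset({("en", "de")}), frozenset({"text", "xml"})),
    Engine("engine-c", frozenset({("en", "ja")}), frozenset({"text", "html", "xml"})),
]


def available_engines(src: str, tgt: str, fmt: str, catalog: Sequence[Engine]) -> Dict[str, str]:
    """Map each engine that supports src->tgt to the format option it should receive."""
    if fmt in TAGGED_FORMATS:
        # Try the requested tagged format first, then its alias.
        candidates: List[str] = [fmt] + [f for f in TAGGED_FORMATS if f != fmt]
    else:
        candidates = [fmt]
    result: Dict[str, str] = {}
    for engine in catalog:
        if (src, tgt) not in engine.pairs:
            continue
        for option in candidates:
            if option in engine.formats:
                result[engine.name] = option
                break
    return result
```

With this catalog, asking for English to German "html" returns `engine-a` with "html" and `engine-b` with "xml", which is exactly the case where trying both names pays off.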

For every engine, we support passing user credentials and a user-defined model ID (for engines that support custom models).

Once the models are selected and the translation is started, you see the translation job in progress. When it finishes, you can download the translation results.

Then, you can either explore the translations to see what works better, or convert them into TMX and import them into your CAT tool to let the translator decide.
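Converting MT output into TMX is mostly a matter of wrapping aligned segment pairs in the TMX 1.4 structure. The sketch below uses only Python's standard `xml.etree.ElementTree`; the `build_tmx` name, the creation-tool name and the custom `x-engine` property that records which engine produced each target are illustrative assumptions, not the web tool's actual export format.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: Iterable[Tuple[str, str]], src_lang: str, tgt_lang: str, engine: str) -> bytes:
    """Build a TMX 1.4 document with one translation unit per (source, target) pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "mt-tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        # Record which engine produced the target; custom prop types start with "x-".
        ET.SubElement(tu, "prop", type="x-engine").text = engine
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="utf-8", xml_declaration=True)
```

Calling it once per engine, for example `Path("engine-a.tmx").write_bytes(build_tmx(pairs, "en", "de", "engine-a"))`, gives the one-TMX-per-engine setup that lets a CAT tool show several MT candidates side by side.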

Some hidden gems

As the web application works over the Intento API, it provides the same benefits:

It is integrated with every Cloud MT engine, and with some on-premise as well (Systran PNMT, SDL ETS).

It provides up-to-date information on the MT engines that support a specific language pair, unifying their language codes and format names.
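Unifying language codes boils down to normalizing whatever the user types and mapping it to each vendor's own spelling and back. This is a minimal sketch assuming BCP 47 style canonical codes; the per-engine tables and engine names are hypothetical and do not describe any real vendor.

```python
from typing import Dict

# Hypothetical per-engine code tables: each vendor spells some languages differently.
VENDOR_CODES: Dict[str, Dict[str, str]] = {
    "engine-a": {"zh-Hans": "zh-CN", "zh-Hant": "zh-TW"},
    "engine-b": {"zh-Hans": "zh", "pt-BR": "pt"},
}


def normalize(code: str) -> str:
    """Canonicalize case and separators in BCP 47 style, e.g. 'ZH_hant' -> 'zh-Hant'."""
    parts = code.replace("_", "-").split("-")
    out = [parts[0].lower()]  # language subtag: lowercase
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            out.append(part.title())  # script subtag: title case
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            out.append(part.upper())  # region subtag: uppercase (digits unchanged)
        else:
            out.append(part.lower())
    return "-".join(out)


def to_vendor(engine: str, code: str) -> str:
    """Translate a canonical code into the engine's own spelling (identity if unmapped)."""
    canonical = normalize(code)
    return VENDOR_CODES.get(engine, {}).get(canonical, canonical)


def from_vendor(engine: str, vendor_code: str) -> str:
    """Map an engine-specific code back to the canonical one."""
    reverse = {v: k for k, v in VENDOR_CODES.get(engine, {}).items()}
    return reverse.get(vendor_code, normalize(vendor_code))
```

Keeping one canonical form internally means the engine list for a language pair can be computed once, no matter how each vendor spells "Traditional Chinese".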

It processes arbitrarily large text files by segmenting and batching them according to the limits and restrictions of every MT engine. This way, it completes bulk translations 10–20 times faster than naive API integrations that work segment by segment.
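The speed-up comes from sending many segments per request instead of one. Here is a minimal sketch of that idea, assuming each engine caps both the number of segments and the number of characters per request; `EngineLimits`, the naive sentence splitter and the greedy packing are illustrative choices, not Intento's implementation.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class EngineLimits:
    """Hypothetical per-request limits; real values differ from engine to engine."""
    max_segments: int
    max_chars: int


def segment(text: str) -> List[str]:
    """Naive sentence splitter: break after . ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def batch_segments(segments: List[str], limits: EngineLimits) -> Iterator[List[str]]:
    """Greedily pack consecutive segments into requests that respect both limits."""
    batch: List[str] = []
    size = 0
    for seg in segments:
        if len(seg) > limits.max_chars:
            raise ValueError("segment exceeds the per-request character limit")
        # Close the current request if adding this segment would break a limit.
        if batch and (len(batch) == limits.max_segments or size + len(seg) > limits.max_chars):
            yield batch
            batch, size = [], 0
        batch.append(seg)
        size += len(seg)
    if batch:
        yield batch
```

As a rough illustration of the arithmetic: 1,000 short segments with a hypothetical limit of 50 segments per request need 20 round trips instead of 1,000.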

It supports custom models for all MT engines that provide this feature.

How to start using it?