Google Translate became the co-author of an article I was writing. I did not invite it to join in the process; it changed the writing in ways I am no longer sure of, sometimes for the worse but perhaps also for the better.

I was making some edits to a piece while checking some source material, in another language, in another window. I used Google Translate to translate some text from Portuguese to English. The functionality appeared to remain active in the window I was editing, but I more or less ignored it. It asked if I would help to improve the translation. As I was writing in English and the original was in English, I took this for just another bit of buggy functionality, not an invitation to collaborate.

Someone commented that the name of a pub I mentioned was wrong. The pub was the Plough; it appeared in the published article as the Plow. That was not the only change. Some were minor and distinct, others quite dramatic and general. The vectors in the Neural Machine Translation system that Google Translate employs produce interesting and sometimes subtle changes.

I was writing directly into Medium, which does not reveal version history, so the exact nature of the interventions that Google Translate made remains unknown to me. I corrected the obvious errors and reversed some changes. I found sections of garbled text and removed them. However, I would like to think that some alterations improved the text.

What Google Translate did was translate Plough to Arado and then translate Arado back to Plow. Further into the article I noticed more errors: ‘he did’ changed to ‘I did’, speech marks changed, capitalisation changed —

artists but also the odd; academic, critic and curator.

became

artists but also the odd; Academic, critic and curator.

… the syntactic structure of some of the text changed. The following paragraph changed from

John Murphy’s piece consisted of four pieces of string, one in each corner of the small room that comprised the gallery space. John had some of his manuscript books laying around. These were books of loose musical staff that he had had bound into hard back books. I did own one of these for some time. I am not sure what happened to the string, I think longer pieces were installed at the home of Jack and Nell.

through

A peça de John Murphy consistia de quatro pedaços de corda, um em cada canto da pequena sala que compreendia o espaço da galeria. John tinha alguns de seus livros manuscritos por aí. Estes eram livros de pessoal musical solto que ele tinha ligado em livros duros de volta. Eu tive um destes por algum tempo. Não tenho certeza do que aconteceu com a corda, acho que peças mais longas foram instaladas na casa de Jack e Nell.

to

John Murphy’s play consisted of four pieces of string, one in each corner of the small room that comprised the gallery space. John had some of his manuscript books out there. These were loose musical staff books he had hooked up in hard back books. I had one of these for awhile. I’m not sure what happened to the rope, I think longer pieces were installed at Jack and Nell’s house.

What makes this all the more interesting is that one of the works I mentioned in the article was based on ever so slight changes to a piece of text: it appeared on a panel on one wall of a gallery, with an apparent duplication on a similar panel in another room, and a contrasting meaning in similar-looking text in the catalogue that accompanied the exhibition.

I showed a series of collages comprising passages of text relating to the cultural significance of the Gallery alongside a floorplan of the rooms.

The two Upstairs Galleries in this beautiful Nash building were symmetrical, so the collages appeared to repeat; the distinction was the meaning of the texts, which, while appearing the same, were in fact direct contradictions of each other. The exhibition catalogue also had a third version of the text, again apparently the same but contradicting the other two. By changing a verb or an adjective here and there, the catalogue further contradicted the texts hung on the walls.

How the translation engine works

Google Translate is now quite different from what it was a few months ago. On Tuesday, September 27, 2016 a new model was introduced and deployed. This used recurrent neural networks (RNNs), specifically Long Short-Term Memory (LSTM) RNNs. They have 8 layers, with residual connections between layers. They also employ parallelism, connecting the attention from the bottom layer of the decoder network to the top layer of the encoder network.

Google翻譯現在與幾個月前不同，2016年9月27日星期二，一個新的模型被引入和部署。這使用作為長短期記憶（LSTM）RNN的複發性神經網絡（RNN）。它們有8層，在層之間具有殘留連接。它們還採用將解碼器網絡底層的注意力連接到編碼器網絡的頂層的並行性。

or as my new co-author would put it

A new model was introduced and deployed Tuesday, September 27, 2016, unlike a few months ago. This uses a recurrent neural network (RNN) as a long-short term memory (LSTM) RNN. They have 8 layers with residual connections between the layers. They also employ the parallelism of attaching the attention of the decoder network floor to the top layer of the encoder network.

I like the way they display a certain modesty by omitting their name. The Google Translate team’s experiments suggest that the quality of their translation system is close to that of average human translators.

The animation ‘shows the progression of GNMT as it translates a Chinese sentence to English. First, the network encodes the Chinese words as a list of vectors, where each vector represents the meaning of all words read so far (“Encoder”). Once the entire sentence is read, the decoder begins, generating the English sentence one word at a time (“Decoder”). To generate the translated word at each step, the decoder pays attention to a weighted distribution over the encoded Chinese vectors most relevant to generate the English word (“Attention”; the blue link transparency represents how much the decoder pays attention to an encoded word).’
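The attention step described in that quote can be sketched in a few lines. This is a toy illustration of dot-product attention, not Google’s actual code: random vectors stand in for the encoder and decoder states, and a softmax over relevance scores gives the weighted distribution the decoder ‘pays attention’ to.

```python
import numpy as np

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))  # 5 encoded source words, 8-dim each
decoder_state = rng.normal(size=8)        # current decoder hidden state

scores = encoder_states @ decoder_state   # dot-product relevance of each word
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax: non-negative, sums to 1

# Weighted sum of the encoded vectors, fed to the decoder for the next word
context = weights @ encoder_states
print(weights.round(3))
```

The `weights` here play the role of the blue link transparency in the animation: the larger a weight, the more that source word contributes to the next translated word.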

Wired appeared to be a bit excited that Google Translate had invented its own type of artificial language, but in fact Google researchers say that the networks “learn a form of interlingua representation for the multilingual model between all involved language pairs”. They understood this from visualising “results”, not from finding newly invented words.

Google researchers had proposed a way of using machine translation to ‘learn a linear projection between vector spaces that represent a language. The method consists of two simple steps. First, we build monolingual models of languages using large amounts of text. Next, we use a small bilingual dictionary to learn a linear projection between the languages’. When training a network, words that are similar end up clustering near each other. Vector representations of words are known as word embeddings.

This reveals the structure of the language, and it is the similarity between the structures of languages that the translation process is based on. Having obtained the vector in the target language space, they deliver the most similar word vector in the output language space as the translation. For example, vector(France) − vector(Paris) is similar to vector(Italy) − vector(Rome).
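The France/Paris example can be made concrete with a little vector arithmetic. The vectors below are invented for illustration, not real embeddings; the point is only that if country − capital is a roughly constant offset, then France − Paris + Rome lands near Italy, found by cosine similarity.

```python
import numpy as np

# Hand-made toy vectors (not real embeddings)
emb = {
    "France":  np.array([0.9, 0.1, 0.8]),
    "Paris":   np.array([0.9, 0.1, 0.1]),
    "Italy":   np.array([0.1, 0.9, 0.8]),
    "Rome":    np.array([0.1, 0.9, 0.1]),
    "Germany": np.array([0.5, 0.5, 0.8]),
    "Berlin":  np.array([0.5, 0.5, 0.1]),
}

query = emb["France"] - emb["Paris"] + emb["Rome"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Nearest word to the query vector, excluding the words used to form it
best = max((w for w in emb if w not in {"France", "Paris", "Rome"}),
           key=lambda w: cosine(emb[w], query))
print(best)  # prints "Italy"
```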

Vector space models (VSMs) represent (embed) words in a continuous vector space where semantically similar words are mapped to nearby points (‘are embedded nearby each other’). These stem from the theory that words that appear in the same contexts share semantic meaning. This is how programs like Word2Vec use a two-layer neural net to reconstruct the linguistic contexts of words within a multidimensional vector space, giving vector representations of words. The vectors also capture relationships between concepts via linear operations.
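The underlying intuition, that words appearing in the same contexts share meaning, can be shown without any neural net at all. This sketch builds raw co-occurrence vectors from a tiny made-up corpus and checks that ‘cat’ and ‘dog’, which occur in identical contexts, end up closer together than ‘cat’ and ‘mat’. Word2Vec learns predictive embeddings rather than counting like this, but the distributional idea is the same.

```python
import numpy as np

corpus = ("the cat sat on the mat . the dog sat on the mat . "
          "the cat chased a mouse . the dog chased a ball .").split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Count neighbours within a one-word window on each side
counts = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            counts[idx[w], idx[corpus[j]]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(counts[idx["cat"]], counts[idx["dog"]]))  # high: same contexts
print(cosine(counts[idx["cat"]], counts[idx["mat"]]))  # lower
```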

A very simple example of how this works is to look at the structure of numbers in two languages. It is not surprising that the word vectors for the English numbers one to five and the corresponding Spanish words uno to cinco have similar geometric arrangements. It follows that if the translations of one and four from English to Spanish are known, the translations of the other numbers can be inferred. Instead of keeping tables of all the possible numbers in multiple languages, the model can look at the structure of the vector spaces of, say, English and Spanish and from this find the translation between Spanish and German.
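That inference step can be sketched directly. Here the number vectors are fabricated so that the Spanish space is simply a rotation of the English one, standing in for the ‘similar geometric arrangements’ above; a linear projection is fitted from just the two seed pairs one→uno and four→cuatro, and the translation of three is then recovered by nearest neighbour.

```python
import numpy as np

# Fabricated 2-d vectors: one..five lie on an arc, and the Spanish
# vectors are the same arc rotated by a fixed angle
angles = np.arange(1, 6) * 0.3
en = np.stack([np.cos(angles), np.sin(angles)], axis=1)
rot = 0.7
R = np.array([[np.cos(rot), -np.sin(rot)],
              [np.sin(rot),  np.cos(rot)]])
es = en @ R.T
en_words = ["one", "two", "three", "four", "five"]
es_words = ["uno", "dos", "tres", "cuatro", "cinco"]

# Seed dictionary: only "one" -> "uno" and "four" -> "cuatro" are known
seed = [0, 3]
W, *_ = np.linalg.lstsq(en[seed], es[seed], rcond=None)

# Project "three" into the Spanish space and take the nearest word
pred = en[en_words.index("three")] @ W
nearest = es_words[int(np.argmin(np.linalg.norm(es - pred, axis=1)))]
print(nearest)  # prints "tres"
```

Two seed pairs are enough here only because the toy spaces are two-dimensional and exactly related by a rotation; with real embeddings the dictionary is small but larger than two, and the fit is approximate.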

Colorless green ideas sleep furiously

The question of whether the structure of context-predicting vectors mimics meaningful semantic relations now appears to be more or less answered.

References

A Neural Network for Machine Translation, at Production Scale — 27 Sep 2016

Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation — 8 Oct 2016

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation — 14 Nov 2016

MACHINE LEARNING: How Black is This Beautiful Black Box — 1 Dec 2016

Exploiting Similarities among Languages for Machine Translation — 17 Sep 2013

Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors