Deep neural machine translation (NMT) models can learn representations that contain linguistic information, and despite their differences, the various models tend to learn similar properties. This phenomenon led researchers to ask whether the learned information is fully distributed or can be attributed to individual neurons. Recent research suggests that both occur: simple properties such as coordinating conjunctions and determiners can be attributed to individual neurons, while more complex linguistic properties such as syntax and semantics are distributed across multiple neurons.

Building on this finding, researchers from The Chinese University of Hong Kong, Tencent AI Lab and University of Macau have proposed a new neuron-interaction based representation composition for NMT. They believe that simulating the interactions among neurons could be a more effective approach to representation composition, and they demonstrate the power of such methods in deep natural language processing tasks.

The researchers employed bilinear pooling to simulate the strong neuron interactions by executing pairwise multiplicative interaction among individual representation elements. Because second-order and first-order representations encode different types of information, and bilinear pooling only encodes multiplicative second-order features, they extended the method to also incorporate first-order representations so as to capture more comprehensive information.

Illustration of (a) bilinear pooling that models fully neuron-wise multiplicative interaction, and (b) extended bilinear pooling that captures both second- and first-order neuron interactions

Bilinear pooling is an outer product of two vector representations, so every element of one vector has a direct multiplicative interaction with every element of the other. Because such a structure can only encode second-order (multiplicative) interactions among individual neurons, the researchers appended a 1 to each of the two representation vectors so that their outer product produces both second-order and first-order interactions among the vector elements.
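The effect of appending a 1 can be seen in a small NumPy sketch. This is only an illustration of the outer-product idea described above, not the authors' implementation: the function names and toy vectors are assumptions made for this example, and any learned projections or approximations used in the paper are left out.

```python
import numpy as np

def bilinear_pooling(x, y):
    """Plain bilinear pooling: the outer product of x and y.

    Entry [i, j] is x[i] * y[j], so only second-order
    (multiplicative) terms are present.
    """
    return np.outer(x, y)

def extended_bilinear_pooling(x, y):
    """Extended variant: append a constant 1 to each vector first.

    The outer product then also contains the original elements of
    x and y (first-order terms) alongside the second-order terms.
    """
    x1 = np.append(x, 1.0)
    y1 = np.append(y, 1.0)
    return np.outer(x1, y1)

# Toy vectors (hypothetical values, chosen only for illustration).
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0, 5.0])

second_order = bilinear_pooling(x, y)       # shape (2, 3)
extended = extended_bilinear_pooling(x, y)  # shape (3, 4)

# Top-left block: the second-order terms x[i] * y[j].
# Last column (without the corner): the first-order terms of x.
# Last row (without the corner): the first-order terms of y.
# Bottom-right corner: the constant 1 * 1.
print(extended)
```

In the extended matrix, the original outer product sits in the top-left block, while the extra row and column carry the unmodified elements of the two vectors, which is how a single outer product ends up holding both kinds of interaction.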

The researchers chose Transformer models, which are multi-layer, multi-head self-attention networks (MLMHSANs), to validate their extended bilinear pooling method on neurons within the networks. MLMHSANs have established state-of-the-art performance across different NLP tasks, and they compose both multi-layer and multi-head representations, so the proposed neuron-interaction based representation composition can be applied to both.

The experiments were conducted on the WMT2014 English to German (En-De) and English to French (En-Fr) translation tasks, using both Transformer-Base and Transformer-Big models.

Performance on WMT14 English to German translation tasks

Comparing NMT systems on WMT14 English to German (EN-DE) and English to French (EN-FR) translation tasks

On the English to German translation task, the proposed neuron-interaction based representation composition achieved a +1.23 BLEU improvement over Transformer-Base. The proposed system also significantly improved translation performance for both Transformer-Base and Transformer-Big models across language pairs.

The paper Neuron Interaction Based Representation Composition for Neural Machine Translation is on arXiv.