To remove the texture bias, they train a CNN on a stylized version of ImageNet (with randomized textures). The texture bias is mostly removed, and the overall results improve.

Natural Language Processing

One important task in NLP is semantic parsing: mapping text to a semantic language (SQL, logical forms, …). Invited speaker Mirella Lapata covered this in the first talk of the fourth day, Learning Natural Language Interfaces with Neural Models. She introduced a model that can transform natural language into any semantic language. For this purpose, several ideas are combined: a seq2tree architecture to handle the tree structure of semantic languages, a two-stage architecture that first produces a semantic sketch and then translates it into the target language, and training on paraphrases of questions to improve coverage. The results are promising and reach the state of the art on many tasks. The end-to-end trainable approach is particularly interesting because it avoids the accumulation of errors across separate modules, as is usually the case in NLP pipelines.

The second talk of the day was about applying the successful CNN architecture to NLP, where it usually fails to reach the state of the art because it cannot handle the structure of language the way RNN and attention-based approaches can. Pay Less Attention with Lightweight and Dynamic Convolutions introduces a CNN variant that can: dynamic convolutions. By testing it on four different datasets, the authors show that the idea applies to a variety of problems. They also show that shrinking the context window of self-attention models barely decreases performance while reducing computation time, which is the intuition behind dynamic convolutions: less attention can be enough.
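To make the idea concrete, here is a minimal numpy sketch of one output step of a simplified, single-head dynamic convolution (not the paper's implementation): the kernel weights are predicted from the current timestep and softmax-normalized like attention weights, but the context is a fixed window rather than the whole sequence. The shapes, the causal window handling, and the projection `W_k` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_conv_step(X, t, W_k, k=3):
    """One output timestep of a simplified, single-head dynamic convolution.

    X   : (T, d) sequence of d-dimensional embeddings
    t   : current position
    W_k : (d, k) projection predicting the k kernel weights from X[t] (assumed)

    Unlike self-attention, the context is a fixed window of size k, so the
    cost per step is O(k) instead of O(T).
    """
    weights = softmax(X[t] @ W_k)               # (k,) kernel, normalized like attention
    lo = max(0, t - k + 1)
    window = X[lo:t + 1]                         # causal window of at most k timesteps
    return weights[-window.shape[0]:] @ window   # (d,) weighted sum over the window

rng = np.random.default_rng(0)
T, d, k = 6, 4, 3
X = rng.normal(size=(T, d))
W_k = rng.normal(size=(d, k))
y = dynamic_conv_step(X, t=4, W_k=W_k, k=k)
print(y.shape)  # (4,)
```

The kernel depends on the current timestep (hence "dynamic"), but it only ever attends to a short local window, which is the "less attention" part of the paper's title.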

From Pay Less Attention with Lightweight and Dynamic Convolutions

Besides improvements on classical NLP tasks and new architectures, some newer tasks are gaining popularity, among them tasks that require interaction with computer vision. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision proposes a model that automatically learns concepts from images and text. It introduces an architecture (NS-CL) combining semantic parsing, visual parsing, and symbolic reasoning. It turns out that using the learned concepts makes parsing new sentences easier: the state of the art can be reached with less training data. Visual question answering (i.e. answering questions about images) is one of the tasks the model is evaluated on.

Graphs

Graph Neural Networks (GNNs) are mainly used for tasks such as node classification or graph classification. In the theoretical paper How Powerful are Graph Neural Networks?, the authors identify graph structures that cannot be distinguished by popular GNN variants such as GCN and GraphSAGE. They show that GNNs are at most as powerful as the Weisfeiler-Lehman graph isomorphism test (Weisfeiler and Lehman, 1968) and propose an architecture that reaches this upper bound. The key insight is that the upper bound is achieved when the aggregation function is injective over multisets of neighbor features (SUM is injective in this sense, as opposed to MEAN or MAX).
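The aggregation point can be seen with a tiny pure-Python example (my illustration, not from the paper): two nodes whose neighbor-feature multisets are {5, 5} and {5} are indistinguishable under MEAN and MAX aggregation, but SUM tells them apart.

```python
# Two nodes seeing the neighbour-feature multisets {5, 5} and {5}:
# MEAN and MAX collapse them to the same value, SUM does not, which is
# why sum aggregation is needed to match the power of the WL test.
neighbours_a = [5.0, 5.0]   # a node with two neighbours, both with feature 5
neighbours_b = [5.0]        # a node with a single neighbour with feature 5

mean_a = sum(neighbours_a) / len(neighbours_a)
mean_b = sum(neighbours_b) / len(neighbours_b)
max_a, max_b = max(neighbours_a), max(neighbours_b)
sum_a, sum_b = sum(neighbours_a), sum(neighbours_b)

print(mean_a == mean_b)  # True  -- mean cannot distinguish the two neighbourhoods
print(max_a == max_b)    # True  -- max cannot either
print(sum_a == sum_b)    # False -- sum can
```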

In another line of work, there was an interesting paper on Learning the Structure of Large Graphs. Using approximate nearest neighbors, the authors obtain a cost of O(n log n), where n is the number of samples, and in the experiments they scale it up to 1 million nodes (with a Matlab implementation).

Adversarial Learning

Ian Goodfellow gave a talk on Adversarial Machine Learning. The quality of images generated by GANs has improved very fast, from 2014, when generated images were low-resolution and needed further upscaling, to 2019, when images are very high-resolution. In recent years, new techniques have been introduced, such as style transfer, which make it possible to generate images that would be impossible to obtain in a supervised setting.

Transfer zebra style on horse video

Adversarial examples help improve machine learning models by removing a type of bias. One use of GANs is to train a reinforcement learning algorithm in a simulated environment on data that looks like the real world.

Self-play is also part of adversarial learning: it makes it possible for an algorithm such as AlphaGo to learn Go from scratch by playing against itself for 40 days.

Learning Representations

This was a hot topic this year. Invited speaker Léon Bottou talked about learning representations using causal invariance and new ideas he and his team have been working on. Causality is an important challenge in machine learning, where algorithms are good at finding correlation but struggle to find causation. The problem is that an algorithm may learn spurious correlations that we do not expect to hold in future use cases. One idea to get rid of spurious correlations is to use multiple context-specific datasets instead of one big consolidated dataset: a correlation that is invariant across contexts is more likely to be causal.
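A toy numpy sketch (my illustration, not from the talk) of why multiple environments help: a spurious feature can correlate strongly with the label in one dataset and weakly in another, while the causal feature stays predictive everywhere. Pooling everything into one consolidated dataset would hide this difference; keeping the environments separate exposes it. The data-generating process below is entirely made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def environment(n, spurious_corr):
    """Toy environment: y causes x_causal via a stable mechanism, while
    x_spurious merely correlates with y, with a strength that changes
    across environments (illustrative assumption)."""
    y = rng.integers(0, 2, size=n)
    x_causal = y + 0.1 * rng.normal(size=n)          # stable causal mechanism
    flip = rng.random(n) > spurious_corr              # flip some labels' copies
    x_spurious = np.where(flip, 1 - y, y) + 0.1 * rng.normal(size=n)
    return x_causal, x_spurious, y

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

xc1, xs1, y1 = environment(2000, spurious_corr=0.95)  # shortcut works here...
xc2, xs2, y2 = environment(2000, spurious_corr=0.55)  # ...but barely here

# The causal feature is predictive in BOTH environments; the spurious one is not.
print(corr(xc1, y1) > 0.8 and corr(xc2, y2) > 0.8)  # True: invariant across contexts
print(abs(corr(xs2, y2)) < 0.3)                     # True: shortcut collapses here
```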

Léon Bottou during his talk at ICLR 2019

A key contribution to improving representations is Deep InfoMax, whose principle was reused in a number of other papers at the conference. The idea is to learn good representations by maximizing the mutual information between the input and the output of a deep neural network encoder. To do so, they train a discriminator network to distinguish positive samples drawn from the joint distribution (i.e. (feature, representation) pairs) from negative samples drawn from the product-of-marginals distribution (i.e. (feature, representation not corresponding to that feature) pairs).
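The sampling scheme can be sketched in a few lines of numpy (a simplified illustration: the real Deep InfoMax uses a convolutional encoder and both local and global objectives — the linear encoder and bilinear score used here are assumptions). Positive pairs keep each input with its own code; negative pairs shuffle the codes to simulate the product of marginals.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """Toy linear 'encoder' standing in for the deep network (assumption)."""
    return np.tanh(x @ W)

def mi_discriminator_loss(x, z, V):
    """Binary-classification loss for a bilinear discriminator T(x, z) = x V z'.

    Positive pairs come from the joint distribution (x_i with its own code z_i);
    negative pairs from the product of marginals (x_i with a shuffled z_j).
    Minimizing this GAN-style (Jensen-Shannon) objective corresponds to
    maximizing an estimate of the mutual information between x and z.
    """
    scores_joint = np.einsum('id,de,ie->i', x, V, z)              # T(x_i, z_i)
    z_shuffled = z[rng.permutation(len(z))]                        # break the pairing
    scores_marginal = np.einsum('id,de,ie->i', x, V, z_shuffled)   # T(x_i, z_j)
    softplus = lambda t: np.logaddexp(0.0, t)
    return np.mean(softplus(-scores_joint)) + np.mean(softplus(scores_marginal))

x = rng.normal(size=(8, 5))
W = rng.normal(size=(5, 3))
V = rng.normal(size=(5, 3))
z = encoder(x, W)
print(mi_discriminator_loss(x, z, V) > 0)  # True: softplus terms are positive
```

In the actual method, the encoder is trained jointly with the discriminator so that positives become easy to recognize and negatives hard, which pushes the representation to retain information about its input.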

From Deep InfoMax paper

Another interesting paper was Smoothing the Geometry of Probabilistic Box Embeddings, which reaches the state of the art for learning geometrically-inspired embeddings that can capture hierarchy and partial orderings.

From Smoothing the Geometry of Probabilistic Box Embeddings

The intuition behind the paper is that the “hard edges” of standard box embeddings lead to unwanted gradient sparsity: once two boxes are disjoint (or one is fully contained in the other), the gradient of their overlap is zero and training gets no signal. The idea is to use a smoothed indicator function of the box as a relaxation of these “hard edges”.
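A minimal sketch of the problem and the fix, using a generic softplus smoothing as an illustration (the paper derives its smoothing differently, by convolving the box indicator with a Gaussian):

```python
import math

def hard_volume(box):
    """Product of hard side lengths max(0, hi - lo): flat (zero gradient)
    as soon as any side is empty -- the gradient-sparsity problem."""
    return math.prod(max(0.0, hi - lo) for lo, hi in box)

def soft_volume(box, beta=1.0):
    """Replace max(0, .) with a softplus relaxation, so the volume stays
    positive and differentiable even for degenerate boxes (illustrative)."""
    softplus = lambda t: math.log1p(math.exp(beta * t)) / beta
    return math.prod(softplus(hi - lo) for lo, hi in box)

# A 2-D "box" with one degenerate (empty) side:
box = [(0.0, 2.0), (1.0, 0.5)]   # second interval has hi < lo
print(hard_volume(box))  # 0.0 -- flat region, no gradient signal
print(soft_volume(box) > 0.0)  # True -- small but positive, gradients can flow
```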

Meta-learning

Also called learning to learn, meta-learning aims to learn the learning process itself. A nice paper on this topic was Meta-Learning With Latent Embedding Optimization (LEO), whose authors achieve strong experimental results on both the tieredImageNet and miniImageNet datasets compared to MAML (a reference approach in the field). The main idea is to map the model parameters into a learned low-dimensional space and then perform the meta-learning in that space.
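A minimal numpy sketch of that core idea (the linear decoder, quadratic loss, and dimensions are all illustrative assumptions — in LEO the decoder is a learned, data-conditioned network): task adaptation is done by gradient steps on a low-dimensional latent code instead of on the full parameter vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(z, D):
    """Decoder mapping a low-dimensional latent code to full model weights
    (a fixed linear map here, as a stand-in for LEO's learned decoder)."""
    return D @ z

def inner_adaptation(z, D, X, y, lr=0.005, steps=5):
    """Task adaptation in latent space: gradient steps on z (dimension 2)
    rather than on the weights w (dimension 10)."""
    for _ in range(steps):
        w = decode(z, D)
        grad_w = 2 * X.T @ (X @ w - y) / len(y)  # squared-error gradient wrt w
        grad_z = D.T @ grad_w                    # chain rule back to the latent code
        z = z - lr * grad_z
    return z

n, d, latent = 20, 10, 2
X = rng.normal(size=(n, d))
D = rng.normal(size=(d, latent))
z_true = rng.normal(size=latent)
y = X @ decode(z_true, D)            # synthetic task with a latent solution

z0 = np.zeros(latent)
z_adapted = inner_adaptation(z0, D, X, y)
loss_before = np.mean((X @ decode(z0, D) - y) ** 2)
loss_after = np.mean((X @ decode(z_adapted, D) - y) ** 2)
print(loss_after < loss_before)  # True: a few latent-space steps reduce the task loss
```

Optimizing in a small latent space is what makes few-shot adaptation tractable: a handful of examples is enough to fit 2 latent coordinates, where it would badly overfit the full weight vector.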