Close to a thousand machine learning papers are published each and every week. On Fridays, Synced selects seven studies from the last seven days that present topical, innovative or otherwise interesting or important research that we believe may be of special interest to our readers.

Highlights of the week:

Researchers led by Zhi-Hua Zhou introduced a deep forest approach to multi-label learning that achieved the best performance on nine benchmark datasets across six multi-label measures.

Jeff Dean summarized ML advances in the post-Moore's Law era and offered his insights on future trends in deep learning.

DeepMind proposed MuZero, a method that combines tree-based planning with a learned model and achieves SOTA performance on Atari 2600 games.

A Google Brain team led by Quoc V. Le developed EfficientDet, a family of object detectors that beats the previous SOTA detector in accuracy while being 4x smaller and using 9.3x fewer FLOPS.

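A UK-based Huawei research team challenged the tractability claims of DeepMind's α-Rank algorithm, arguing that it is infeasible beyond a small number of agents, and proposed a scalable stochastic alternative.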

Paper One: Multi-label Learning with Deep Forest (arXiv)

Authors: Liang Yang, Xi-Zhu Wu, Yuan Jiang, Zhi-Hua Zhou from the National Key Laboratory for Novel Software Technology, Nanjing University

Abstract: In multi-label learning, each instance is associated with multiple labels and the crucial task is how to leverage label correlations in building models. Deep neural network methods usually jointly embed the feature and label information into a latent space to exploit label correlations. However, the success of these methods highly depends on the precise choice of model depth. Deep forest is a recent deep learning framework based on tree model ensembles, which does not rely on backpropagation. We consider the advantages of deep forest models are very appropriate for solving multi-label problems. Therefore we design the Multi-Label Deep Forest (MLDF) method with two mechanisms: measure-aware feature reuse and measure-aware layer growth. The measure-aware feature reuse mechanism reuses the good representation in the previous layer guided by confidence. The measure-aware layer growth mechanism ensures MLDF gradually increase the model complexity by performance measure. MLDF handles two challenging problems at the same time: one is restricting the model complexity to ease the overfitting issue; another is optimizing the performance measure on user’s demand since there are many different measures in the multi-label evaluation. Experiments show that our proposal not only beats the compared methods over six measures on benchmark datasets but also enjoys label correlation discovery and other desired properties in multi-label learning.

Multi-Label Deep Forest (MLDF) framework. Each layer ensembles two different forests (black above and blue below).
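To make MLDF's two mechanisms more concrete, here is a toy Python sketch; it is not the authors' code. The stand-in layers, the entry-wise confidence rule (distance of a probability from 0.5) and the use of Hamming loss as the measure are all assumptions for illustration. In MLDF, how confidence is computed and when growth stops depend on the multi-label measure the user selects.

```python
import numpy as np

def reuse_representation(prev_probs, new_probs):
    """Keep, entry by entry, whichever layer's label probability is more confident."""
    # Hypothetical confidence score: distance of a probability from the 0.5 threshold.
    prev_conf = np.abs(prev_probs - 0.5)
    new_conf = np.abs(new_probs - 0.5)
    return np.where(prev_conf > new_conf, prev_probs, new_probs)

def hamming_loss(y_true, probs, threshold=0.5):
    """Fraction of instance-label pairs predicted wrongly (lower is better)."""
    return float(np.mean((probs >= threshold) != y_true))

def grow_layers(layers, y_true, measure, patience=1):
    """Add cascade layers while the chosen measure improves; return the best depth."""
    best_score, best_depth = np.inf, 0
    prev, history = None, []
    for depth, layer in enumerate(layers, start=1):
        probs = layer(prev)  # each hypothetical layer receives the previous representation
        if prev is not None:
            probs = reuse_representation(prev, probs)
        score = measure(y_true, probs)
        history.append(score)
        if score < best_score:
            best_score, best_depth = score, depth
        elif depth - best_depth >= patience:
            break  # the measure stopped improving: stop growing the cascade
        prev = probs
    return best_depth, history

# Toy usage: three stand-in "layers" that ignore their input and return fixed probabilities.
y_true = np.array([[1, 0], [0, 1]])
layers = [
    lambda prev: np.array([[0.6, 0.6], [0.4, 0.7]]),
    lambda prev: np.array([[0.9, 0.2], [0.45, 0.8]]),
    lambda prev: np.array([[0.5, 0.5], [0.5, 0.5]]),
]
best_depth, history = grow_layers(layers, y_true, hamming_loss)
print(best_depth, history)  # 2 [0.25, 0.0, 0.0]
```

The third layer adds no information, so feature reuse keeps the second layer's outputs, the measure does not improve, and growth stops at depth 2.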

Paper Two: Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions (Nature)

Authors: K. T. Schütt, M. Gastegger, A. Tkatchenko, K.-R. Müller, R. J. Maurer

Abstract: Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability for reactive chemistry and chemical analysis. Here we present a deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. On several examples, we demonstrate that this opens promising avenues to perform inverse design of molecular structures for targeting electronic property optimisation and a clear path towards increased synergy of machine learning and quantum chemistry.

Synergy of quantum chemistry and machine learning. (a) Forward model (b) Hybrid model
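A standard way to turn matrices expressed in a local atomic-orbital basis into molecular orbitals is to solve the generalized eigenvalue problem H C = S C ε. Here H is the Hamiltonian matrix, S the overlap matrix, C the orbital coefficients and ε the orbital energies. The sketch below assumes that a model has already output H and S; the 2×2 matrices are made-up toy values, not predictions from the paper's network. The eigenproblem is solved with Löwdin orthogonalization in NumPy.

```python
import numpy as np

def solve_generalized(H, S):
    """Solve H C = S C diag(eps) for symmetric H and symmetric positive-definite S."""
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T  # Loewdin S^(-1/2)
    eps, C_prime = np.linalg.eigh(X @ H @ X)          # ordinary eigenproblem in an orthogonal basis
    return eps, X @ C_prime                           # back-transform to atomic-orbital coefficients

# Hypothetical "predicted" matrices for a two-orbital toy system (arbitrary units).
H = np.array([[-1.0, -0.5], [-0.5, -0.8]])
S = np.array([[1.0, 0.2], [0.2, 1.0]])
eps, C = solve_generalized(H, S)
print(eps)  # orbital energies in ascending order
```

Because every step is a differentiable linear-algebra operation, quantities derived this way can in principle be differentiated with respect to the model's outputs. That property is the one the abstract relies on for inverse design.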

Paper Three: The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design (arXiv)

Author: Jeff Dean from Google Research

Abstract: The past decade has seen a remarkable series of advances in machine learning, and in particular deep learning approaches based on artificial neural networks, to improve our abilities to build more accurate systems across a broad range of areas, including computer vision, speech recognition, language translation, and natural language understanding tasks. This paper is a companion paper to a keynote talk at the 2020 International Solid-State Circuits Conference (ISSCC) discussing some of the advances in machine learning, and their implications on the kinds of computational devices we need to build, especially in the post-Moore’s Law-era. It also discusses some of the ways that machine learning may also be able to help with some aspects of the circuit design process. Finally, it provides a sketch of at least one interesting direction towards much larger-scale multi-task models that are sparsely activated and employ much more dynamic, example- and task-based routing than the machine learning models of today.
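The "sparsely activated" models with example-based routing that the paper points toward can be illustrated with a generic top-k mixture-of-experts gate. A gating function scores every expert, but only the k best-scoring experts run on a given input, so compute scales with k rather than with the total number of experts. This is a hypothetical minimal sketch in NumPy, not a design taken from the paper; the expert count, dimensions and linear experts are arbitrary.

```python
import numpy as np

def top_k_gate(x, W_gate, k):
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    logits = x @ W_gate                      # one score per expert
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()

def sparse_moe(x, W_gate, experts, k=2):
    """Route x to k experts only; the remaining experts are never evaluated."""
    top, weights = top_k_gate(x, W_gate, k)
    return sum(wt * experts[i](x) for i, wt in zip(top, weights))

# Toy usage with hypothetical linear "experts".
rng = np.random.default_rng(0)
d, n_experts = 4, 8
W_gate = rng.normal(size=(d, n_experts))
experts = [(lambda x, M=rng.normal(size=(d, d)): M @ x) for _ in range(n_experts)]
y = sparse_moe(rng.normal(size=d), W_gate, experts, k=2)
print(y.shape)  # (4,)
```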

Paper Four: CenterMask: Real-Time Anchor-Free Instance Segmentation (arXiv)

Authors: Youngwan Lee, Jongyoul Park from the Electronics and Telecommunications Research Institute (ETRI)

Abstract: We propose a simple yet efficient anchor-free instance segmentation, called CenterMask, that adds a novel spatial attention-guided mask (SAG-Mask) branch to anchor-free one stage object detector (FCOS) in the same vein with Mask R-CNN. Plugged into the FCOS object detector, the SAG-Mask branch predicts a segmentation mask on each box with the spatial attention map that helps to focus on informative pixels and suppress noise. We also present an improved VoVNetV2 with two effective strategies: adds (1) residual connection for alleviating the saturation problem of larger VoVNet and (2) effective Squeeze-Excitation (eSE) deals with the information loss problem of original SE. With SAG-Mask and VoVNetV2, we design CenterMask and CenterMask-Lite that are targeted to large and small models, respectively. CenterMask outperforms all previous state-of-the-art models at a much faster speed. CenterMask-Lite also achieves 33.4% mask AP / 38.0% box AP, outperforming the state-of-the-art by 2.6 / 7.0 AP gain, respectively, at over 35fps on Titan Xp. We hope that CenterMask and VoVNetV2 can serve as a solid baseline of real-time instance segmentation and backbone network for various vision tasks, respectively. Code will be released.

Architecture of CenterMask.
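Below is a rough NumPy sketch of the two attention ideas named in the abstract. The spatial attention assumes a CBAM-style design: average- and max-pooling across channels, a 3×3 convolution, then a sigmoid that reweights every pixel. The eSE gate assumes a single fully connected layer on globally pooled channels followed by a hard sigmoid. The surrounding convolution layers of the real SAG-Mask branch and VoVNetV2 blocks are omitted, and all weights are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Naive zero-padded 'same' convolution: x (C, H, W), kernel (C, k, k) -> (H, W)."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out

def spatial_attention(feat, kernel):
    """Spatial attention on a (C, H, W) RoI feature: pool over channels, conv, sigmoid, rescale."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    attn = sigmoid(conv2d_same(pooled, kernel))                # (H, W), values in (0, 1)
    return feat * attn[None, :, :]

def effective_se(feat, W, b):
    """eSE-style channel gate: global average pool, one C->C linear layer, hard sigmoid."""
    z = W @ feat.mean(axis=(1, 2)) + b
    gate = np.clip(z + 3.0, 0.0, 6.0) / 6.0                    # hard sigmoid
    return feat * gate[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 14, 14))  # hypothetical 8-channel 14x14 RoI feature
out = spatial_attention(feat, rng.normal(size=(2, 3, 3)) * 0.1)
out = effective_se(out, rng.normal(size=(8, 8)) * 0.1, np.zeros(8))
print(out.shape)  # (8, 14, 14)
```

Using one C-to-C layer instead of the two-layer bottleneck of the original SE block avoids the channel-dimension reduction that the abstract refers to as SE's information loss problem.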

Paper Five: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (arXiv)

Authors: Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver

Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games — the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled — our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.

(A) How MuZero uses its model to plan; (B) How MuZero acts in the environment; (C) How MuZero trains its model.
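The iterative model described in the abstract can be sketched with MuZero's three learned functions. A representation function h encodes an observation into a hidden state. A dynamics function g maps a hidden state and an action to a reward and the next hidden state. A prediction function f outputs a policy and a value. In the toy version below, tiny random linear maps stand in for the trained networks (an assumption for illustration), and no tree search is performed. The sketch only shows how the learned model is unrolled along a hypothetical action sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, STATE, ACTIONS = 6, 4, 3
# Hypothetical random weights standing in for MuZero's trained networks.
W_h = rng.normal(size=(STATE, OBS))
W_g = rng.normal(size=(STATE, STATE + ACTIONS))
w_r = rng.normal(size=STATE + ACTIONS)
W_p = rng.normal(size=(ACTIONS, STATE))
w_v = rng.normal(size=STATE)

def h(obs):
    """Representation: observation -> initial hidden state."""
    return np.tanh(W_h @ obs)

def g(state, action):
    """Dynamics: (hidden state, action) -> (predicted reward, next hidden state)."""
    x = np.concatenate([state, np.eye(ACTIONS)[action]])
    return float(w_r @ x), np.tanh(W_g @ x)

def f(state):
    """Prediction: hidden state -> (policy over actions, value)."""
    logits = W_p @ state
    p = np.exp(logits - logits.max())
    return p / p.sum(), float(w_v @ state)

def unroll(obs, actions):
    """Apply the learned model iteratively along a sequence of hypothetical actions."""
    state = h(obs)
    policy, value = f(state)
    steps = [(None, policy, value)]  # no reward at the root
    for a in actions:
        reward, state = g(state, a)
        policy, value = f(state)
        steps.append((reward, policy, value))
    return steps

for reward, policy, value in unroll(np.ones(OBS), actions=[0, 2, 1]):
    print(reward, policy.round(3), round(value, 3))
```

Because planning only ever calls h, g and f, the search never needs the environment's true rules. This is how MuZero can plan in games whose dynamics it was never given.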

Paper Six: EfficientDet: Scalable and Efficient Object Detection (arXiv)

Authors: Mingxing Tan, Ruoming Pang, Quoc V. Le from Google Research, Brain Team

Abstract: Model efficiency has become increasingly important in computer vision. In this paper, we systematically study various neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations, we have developed a new family of object detectors, called EfficientDet, which consistently achieve an order-of-magnitude better efficiency than prior art across a wide spectrum of resource constraints. In particular, without bells and whistles, our EfficientDet-D7 achieves state-of-the-art 51.0 mAP on COCO dataset with 52M parameters and 326B FLOPS, being 4x smaller and using 9.3x fewer FLOPS yet still more accurate (+0.3% mAP) than the best previous detector.

EfficientDet architecture
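Two pieces of EfficientDet lend themselves to a short sketch. BiFPN's fast normalized fusion combines same-sized feature maps with learnable non-negative weights, O = Σ_i w_i·I_i / (ε + Σ_j w_j). Compound scaling derives every size from a single coefficient φ. The formulas below follow our reading of the paper: BiFPN width 64·1.35^φ, box/class head depth 3 + ⌊φ/3⌋, input resolution 512 + 128φ. The base BiFPN depth is exposed as an assumed parameter, and the published models round or adjust some of these values, so treat the numbers as illustrative.

```python
import math
import numpy as np

def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Weighted fusion of same-shaped feature maps: sum_i w_i * I_i / (eps + sum_j w_j)."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights non-negative
    return sum(wi * x for wi, x in zip(w, inputs)) / (eps + w.sum())

def compound_scaling(phi, base_bifpn_depth=3):
    """Scale BiFPN width/depth, head depth and input resolution with one coefficient phi."""
    return {
        "bifpn_width": 64 * 1.35 ** phi,          # channels, before any rounding
        "bifpn_depth": base_bifpn_depth + phi,    # base depth is an assumption
        "head_depth": 3 + math.floor(phi / 3),    # box/class prediction layers
        "input_resolution": 512 + 128 * phi,      # pixels
    }

p4 = np.ones((2, 2))              # hypothetical feature map
p5_upsampled = 3 * np.ones((2, 2))  # hypothetical resized coarser feature map
print(fast_normalized_fusion([p4, p5_upsampled], weights=[1.0, 1.0]))
for phi in range(4):
    print(phi, compound_scaling(phi))
```

Dividing by the sum of weights rather than taking a softmax keeps each normalized weight between 0 and 1 without the extra exponentials, which is the efficiency argument behind "fast" normalized fusion.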

Paper Seven: α^α-Rank: Practically Scaling α-Rank through Stochastic Optimisation (arXiv)

Authors: Yaodong Yang, Rasul Tutunov, Phu Sakulwongtana, Haitham Bou Ammar from Huawei Technologies Research & Development U.K.

Abstract: Recently, α-Rank, a graph-based algorithm, has been proposed as a solution to ranking joint policy profiles in large scale multi-agent systems. α-Rank claimed tractability through a polynomial time implementation with respect to the total number of pure strategy profiles. Here, we note that inputs to the algorithm were not clearly specified in the original presentation; as such, we deem complexity claims as not grounded, and conjecture solving α-Rank is NP-hard.

The authors of α-Rank suggested that the input to α-Rank can be an exponentially-sized payoff matrix; a claim promised to be clarified in subsequent manuscripts. Even though α-Rank exhibits a polynomial-time solution with respect to such an input, we further reflect additional critical problems. We demonstrate that due to the need of constructing an exponentially large Markov chain, α-Rank is infeasible beyond a small finite number of agents. We ground these claims by adopting amount of dollars spent as a non-refutable evaluation metric. Realising such scalability issue, we present a stochastic implementation of α-Rank with a double oracle mechanism allowing for reductions in joint strategy spaces. Our method, α^α-Rank, does not need to save exponentially-large transition matrix, and can terminate early under required precision. Although theoretically our method exhibits similar worst-case complexity guarantees compared to α-Rank, it allows us, for the first time, to practically conduct large-scale multi-agent evaluations. On 10^4×10^4 random matrices, we achieve 1000x speed reduction. Furthermore, we also show successful results on large joint strategy profiles with a maximum size in the order of 2^25 (33 million joint strategies), a setting not evaluable using α-Rank with reasonable computational budget.
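The scalability argument comes down to arithmetic. The number of joint strategy profiles is the product of the agents' strategy counts, so it grows exponentially with the number of agents, and a dense transition matrix over those profiles grows with its square. The sketch below reproduces the 2^25 profile count, hypothetically assuming 25 agents with two strategies each. It then computes a Markov chain's stationary distribution by plain power iteration that stops early once a chosen precision is reached. This is a generic illustration of terminating early under a required precision, not α^α-Rank's stochastic double-oracle method. It also still stores the transition matrix densely, which is exactly what α^α-Rank avoids.

```python
from math import prod

import numpy as np

def num_joint_profiles(strategies_per_agent):
    """Number of pure joint strategy profiles: product of per-agent strategy counts."""
    return prod(strategies_per_agent)

def dense_transition_bytes(num_profiles, bytes_per_entry=8):
    """Memory needed to store a dense float64 transition matrix over all profiles."""
    return num_profiles ** 2 * bytes_per_entry

def stationary_distribution(P, tol=1e-10, max_iter=10_000):
    """Power iteration for pi = pi @ P on a row-stochastic matrix P.

    Stops early once successive iterates differ by less than tol (L1 norm).
    Assumes an irreducible, aperiodic chain so that the iteration converges.
    """
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for it in range(1, max_iter + 1):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt, it
        pi = nxt
    return pi, max_iter

n = num_joint_profiles([2] * 25)     # hypothetical: 25 agents, 2 strategies each
print(n, dense_transition_bytes(n))  # 33554432 9007199254740992 (about 9 PB)

P = np.array([[0.9, 0.1], [0.5, 0.5]])  # toy two-profile chain
pi, iters = stationary_distribution(P)
print(pi, iters)  # pi is close to [5/6, 1/6]
```

Loosening `tol` trades accuracy for fewer iterations, which is the kind of precision-controlled early stopping the abstract describes.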