
August 20 · Issue #30

Hi all, this is edition #️⃣3️⃣0️⃣ of this newsletter! This one discusses some super interesting topics: How can we learn meaning in NLP? Do we need recurrence in neural networks? How to build a research lab (and do research) according to Yoshua Bengio? Is ML a pseudoscience? How can we make ML more accessible?

As always, there are also some cool videos, implementations, blog posts, and research papers. Enjoy with the beverage of your choice! ☕️🍶🍸🍹 I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue. If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.

Learning Meaning in NLP 🗣🤔

Last week, there was a nice discussion on Twitter about learning meaning in NLP. More specifically, the discussion focused on whether it is possible for a model that is only trained on raw text, such as a language model, to learn the meaning of a sentence. Similar to Matt Gardner, I’d argue that language modeling gives a non-zero signal for learning meaning, but needs to be augmented with more explicit inductive biases to capture a more comprehensive notion of meaning. Thomas Wolf wrote up an excellent overview of the discussion.

If you want an overview of current state-of-the-art language models, check out this AI Journal video about ULMFiT. The folks at feedly have also done a great job of showing how easy it is to apply these methods in this post, where they match the performance of fastText trained on 4 million Amazon reviews using ULMFiT with only 1,000 labeled samples.

While learning the meaning of concepts benefits from grounding, logical words such as “if”, “and”, or even “no” don’t have a referent; there’s nothing you can point to in the world that is an “if” or an “and”. This blog post gives an overview of the different schools of thought on how children learn such logical words, from “logical nativism” (innate logical concepts that enable the acquisition of such words) to probabilistic induction; it also sketches a new acquisition theory, social bootstrapping, and argues that children map these logical words to speech acts with specifically social functions. Better understanding the psycholinguistic aspects of how children acquire language may ultimately help us design better computational models.
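To make the “non-zero signal from raw text” point concrete, here is a toy sketch (my own illustration, not from the Twitter discussion) of the distributional intuition behind it: words that occur in similar contexts end up with similar representations, even with nothing but raw text and co-occurrence counting.

```python
from collections import Counter
from math import sqrt

# Toy corpus: raw text only -- no labels, no grounding.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "he reads a book",
    "she reads a newspaper",
]

def cooccurrence_vectors(sentences, window=2):
    """Count context words within a +/- window around each word."""
    vectors = {}
    for sent in sentences:
        words = sent.split()
        for i, w in enumerate(words):
            ctx = vectors.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    ctx[words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = cooccurrence_vectors(corpus)
# "cat" and "dog" share contexts (the, drinks, chases), "cat" and "book" share none:
print(cosine(vecs["cat"], vecs["dog"]))   # high
print(cosine(vecs["cat"], vecs["book"]))  # zero here
```

This is of course only a distributional-similarity signal, not meaning in any deep sense, which is exactly the gap the inductive-bias argument above is about.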

Do we need recurrence in neural networks? 🔃

There’ve been some recent discussions, mostly evoked by this paper (and this accompanying blog post), about whether we actually need RNNs or whether we can just replace every RNN with a feed-forward neural network. While the observation that RNNs in certain settings can be replaced with feed-forward NNs (the Transformer, for instance, is just a feed-forward model) is nothing new, the paper provides some nice theoretical results for a “stable” RNN (roughly, one without exploding gradients) for which such a replacement is possible. Yoav Goldberg shares his thoughts on the results here. However, the only reason why it is possible to replace RNNs with feed-forward neural networks is that our current RNNs still suck at modeling long-term dependencies. This is emphasized in the paper by highlighting the sensitivity to vanishing gradients; Chris Dyer made this point in his workshop talk at ACL 2018, emphasizing that RNNs are biased towards sequential recency; and in this ICLR 2017 paper, the authors show that an LM that only uses the last 5 words is on par with state-of-the-art models; among others. A counterpoint to this are the results from OpenAI Five, which show that large LSTM models can perform successfully in problems with large time horizons. The main takeaway, perhaps, is that in order to create models that can better model long-term dependencies, we need to evaluate our models on tasks that explicitly measure this capability, rather than tasks like language modeling or sentiment analysis, which only require it implicitly. A good example of such a more explicit task is modeling subject-verb agreement.
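The “LM that only uses the last k words” idea is easy to make concrete with a fixed-window count-based model. This is a minimal sketch of the general concept, not the model from the ICLR 2017 paper: the model conditions on exactly the last k tokens and carries no recurrent state, so anything further back is invisible to it.

```python
from collections import Counter, defaultdict

class WindowLM:
    """A language model conditioning on only the last k words,
    i.e. a (k+1)-gram model: a fixed window with no recurrent
    state carried over from earlier context."""

    def __init__(self, k=2):
        self.k = k
        self.counts = defaultdict(Counter)

    def train(self, tokens):
        # Pad so the first real tokens also have a full-length context.
        padded = ["<s>"] * self.k + tokens
        for i in range(self.k, len(padded)):
            context = tuple(padded[i - self.k : i])
            self.counts[context][padded[i]] += 1

    def prob(self, context, word):
        # Only the last k context words matter; everything earlier is ignored.
        context = tuple(context[-self.k:])
        total = sum(self.counts[context].values())
        return self.counts[context][word] / total if total else 0.0

lm = WindowLM(k=2)
lm.train("the cat sat on the mat . the cat sat on the rug .".split())
print(lm.prob(["the", "cat"], "sat"))  # 1.0 in this toy corpus
print(lm.prob(["on", "the"], "mat"))   # 0.5: "mat" and "rug" are equally likely
```

That such a bounded-memory model can rival RNN LMs on standard benchmarks is precisely why language-modeling perplexity is a weak test of long-term dependency modeling.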

How to build a research lab 🏛

Yoshua Bengio is probably known to anyone interested in ML. I found the replies Yoshua gave in this Q&A with CIFAR’s Graham Taylor extremely lucid, insightful, and full of self-reflection. I think the answers are useful not only for junior faculty starting to build their own lab, but for anyone starting with research or considering going into research. Highlights of the conversation for me include: “One thing I would’ve done differently is not disperse myself in different directions, going for the idea of the day and forgetting about longer-term challenges.” “First, it’s not just doing the research, it’s making it known. Going to workshops and conferences, visiting other labs. You don’t have to wait to be invited.” “Listen to your gut. Many people lack the self-confidence necessary for that and they miss opportunities.”

Is Machine Learning a pseudoscience? 🔮

Making ML more accessible 🌍

Videos

Resources and overviews

Model-based ML book 📖 An early access version of John Winn and Christopher Bishop’s new ML book. The first chapter “introduces all the essential concepts of model-based machine learning, in the course of solving a murder.” What better way to get started with learning ML? Model scheduling ⏰ Naomi Saphra gives a comprehensive overview of different approaches to dynamically modify a model’s configuration during training.
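The simplest instance of scheduling a model’s configuration during training is a learning-rate schedule. As a minimal illustration (my own sketch, not an example from Naomi Saphra’s overview), here is an exponential decay schedule that the training loop queries at every step instead of using a fixed hyperparameter:

```python
def exponential_decay(initial_lr, decay_rate, decay_steps):
    """Return a schedule: lr(step) = initial_lr * decay_rate**(step / decay_steps).
    The training loop calls schedule(step) each step, so the
    hyperparameter changes dynamically over the course of training."""
    def schedule(step):
        return initial_lr * decay_rate ** (step / decay_steps)
    return schedule

lr = exponential_decay(initial_lr=0.1, decay_rate=0.5, decay_steps=1000)
print(lr(0))     # 0.1
print(lr(1000))  # 0.05 -- halved after 1000 steps
print(lr(2000))  # 0.025
```

The same pattern (a function of the training step consulted inside the loop) extends to the other knobs a schedule can turn, such as curriculum difficulty or regularization strength.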

Implementations

Cool posts and articles

Everything is Dijkstra 🛣 Eric Jang shows that currency arbitrage (in finance), Q-learning (in RL), and path tracing (in computer graphics) can all be reduced to the classic Dijkstra’s shortest path algorithm. Deep Learning in NLP 🤖 Vered Shwartz discusses what Deep Learning has improved and the remaining challenges for Deep Learning in NLP. The man behind Google’s AutoML 👨‍🔬 This article portrays Quoc Le, co-founder of Google Brain and co-creator of breakthroughs such as sequence-to-sequence learning and neural architecture search. Text-based adventures 🕹 This article discusses Microsoft’s TextWorld, “the OpenAI Gym of Language Learning Agents”. AI and improv 🕴 An article by the New York Times on how Piotr Mirowski and Kory Mathewson use a chatbot for improvisation.
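For the “Everything is Dijkstra” item, here is a generic textbook implementation of the algorithm (my own sketch, not Eric Jang’s code), together with the currency-conversion reduction: an exchange rate r becomes an edge weight -log(r), so the cheapest path corresponds to the best chain of conversions.

```python
import heapq
from math import log

def dijkstra(graph, source):
    """Shortest-path distances from source over non-negative edge weights.
    graph: {node: [(neighbor, weight), ...]}"""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already relaxed via a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical exchange rates; -log turns rate products into weight sums.
# (Dijkstra needs non-negative weights, which holds here since all rates
# are <= 1; with rates > 1, i.e. negative weights, Bellman-Ford-style
# relaxation would be the right tool.)
rates = {"USD": [("EUR", 0.9), ("GBP", 0.8)], "EUR": [("GBP", 0.88)]}
graph = {c: [(d, -log(r)) for d, r in edges] for c, edges in rates.items()}
dist = dijkstra(graph, "USD")
```

Here the direct USD→GBP edge (rate 0.8) beats the USD→EUR→GBP chain (0.9 × 0.88 = 0.792), and Dijkstra finds exactly that.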

Paper picks

Did you enjoy this issue?

If you don't want these updates anymore, please unsubscribe here. If you were forwarded this newsletter and you like it, you can subscribe here.