This year I had the opportunity to attend the International Conference on Machine Learning (ICML) in New York City, and overall, I quite enjoyed the experience. The conference lasted six days, including tutorials and workshops, and had talks and posters that spanned the full range of machine learning. It was a good conference for getting a high-level overview of a variety of algorithms, applications and datasets. It was also a way to meet others in the field, and there was certainly no lack of people to meet, given that over 3,200 people attended. Presented here are my opinions on what I was able to take away from my time at ICML.

Most Interesting Keynote:

There were four keynote talks during the regular conference. Three were given by Susan Athey of the Stanford Graduate School of Business, Christos Faloutsos of Carnegie Mellon University, and Daniel Spielman of Yale University, but my favorite by far was given by Fei-Fei Li on the subject of “A Quest for Visual Intelligence in Computers”. Not only was her talk engaging, it was also informative and inspiring. She is the primary driver of ImageNet, an image database which has been helping researchers and companies improve image object detection since 2010. The corresponding ImageNet Large Scale Visual Recognition Challenge was a key driver in making deep networks famous.

More recently, she has been behind the Visual Genome project, which extends ImageNet to capture relationships between objects. One of the key ideas Fei-Fei emphasized is that while humans are able to interpret a lot of information from a single image (as the saying goes, “a picture is worth a thousand words”), the algorithms we have today can describe an image with only a few words. A major reason for this discrepancy is that we haven’t had a dataset tagged in this manner, hence the Visual Genome project. The talk can be summarized in her three key messages:

Learning is the path to visual interpretation

Learning requires Big Data

Deep understanding requires knowledge of structure

Most Interesting Papers:

A large number of papers were on the topics of neural networks and deep learning, both of which I am quite interested in. Other papers focused on optimization, as well as online/supervised learning. Most of the papers were presented by academics or research labs, with a smaller percentage presented by companies. I found the real world scenarios to be quite interesting, as that is the realm where I dwell. I do appreciate the innovations presented by academics, but getting an algorithm to work with real world data is often quite different from tuning that same algorithm on carefully curated datasets. Of particular interest, researchers from UCLA were able to forecast whether a patient needed greater or less care in an ICU. Applications like this are interesting because they operate in a very real world setting with very high stakes.

Some of my other favorite papers from the conference included:

“Ask Me Anything: Dynamic Memory Networks for Natural Language Processing” by Ankit Kumar et al. from MetaMind. The premise behind this paper is that all NLP tasks can be framed as a question-and-answer session, so one can do sentiment analysis, part-of-speech tagging, etc. with a memory network. One of the interesting nuggets from this paper arose during the poster session, when Richard Socher stated that if you are always asking the same question, you may remove the question module from the system. We are trying to use memory networks in our project on document novelty, so this paper was quite relevant. We haven’t quite figured out how to remove the question module in our implementation, but we will continue to tinker with it.

“Pixel Recurrent Neural Networks” by Aäron van den Oord et al. from Google DeepMind. This was one of the three papers that were awarded best paper. It uses a deep neural network to predict the pixels in an image along both spatial dimensions. The method utilizes skip connections, which have generally been shown to allow networks to get deeper without losing accuracy.
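To make the idea of a skip connection concrete, here is a minimal sketch in plain Python. It is not the PixelRNN architecture: the function names (`dense_layer`, `residual_block`, `deep_stack`), the toy tanh layer, and the weight sharing across blocks are all assumptions made purely for illustration. The point is only that each block adds its output to its input, so a block that contributes nothing passes the signal through unchanged.

```python
import math


def dense_layer(x, weights, bias):
    """Toy fully connected layer with a tanh activation (illustrative only)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def residual_block(x, weights, bias):
    """Add the layer's output back onto its input: this is the skip connection."""
    fx = dense_layer(x, weights, bias)
    return [xi + fi for xi, fi in zip(x, fx)]


def deep_stack(x, n_blocks, weights, bias):
    """Apply the same toy block n_blocks times (weights shared for brevity)."""
    for _ in range(n_blocks):
        x = residual_block(x, weights, bias)
    return x


# With all-zero weights each layer outputs tanh(0) = 0, so every block is the
# identity and the input survives 50 blocks untouched.
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]
print(deep_stack([1.0, -2.0], 50, zero_weights, zero_bias))  # [1.0, -2.0]
```

Without the `xi + fi` addition, the same zero-weight stack would collapse every input to zeros after a single layer, which hints at why the identity path makes very deep stacks easier to train.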

“Deep Speech 2: End-to-End Speech Recognition in English and Mandarin” by Dario Amodei and a plethora of people from Baidu. Most research focuses on the English language, which is cool (I like English), but there are a lot of other languages to consider as well. This paper was interesting in that the same end-to-end approach is able to recognize speech in either English or Mandarin Chinese, two very different languages, with little language-specific engineering. The task is made more complicated because they are working with speech data as opposed to written text.

“Texture Networks: Feed-forward Synthesis of Textures and Stylized Images” by Dmitry Ulyanov et al. was by far the most visually interesting presentation. They came up with a deep network which applies textures to your images to make them more artistic. You can download and try out their code, and I always appreciate it when people release the code used in a paper! As a side note, this paper builds upon the one reviewed by my colleague, entitled Image Style Transfer using Convolutional Neural Networks.

Tutorials and Workshops:

There were 9 tutorials and 23 workshops offered at the conference. In general, I enjoyed the sessions that I attended. The most interesting tutorial was Deep Reinforcement Learning by Google DeepMind’s David Silver, who showed how reinforcement learning with neural networks can be used to solve tasks such as 3D navigation and, more famously, the game of Go. However, the more directly applicable tutorial was the one on Memory Networks presented by Facebook’s Jason Weston. Most of the information presented there was included in the papers Memory Networks and End-To-End Memory Networks, which I had previously read, but it was helpful to hear him explain this type of system in depth.

It was more difficult to choose a workshop to attend as there were so many to choose from. I managed to catch sections of Deep Learning with Small Data, Anomaly Detection, and Data Efficient Machine Learning. There were a number of papers presented in these workshops that I hadn’t had previous exposure to. Others I spoke to also commented that you typically see newer material at the workshops than at the main conference. Some of these papers seem highly relevant, such as “A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues”, while others are simply humorous, such as “How Many Zombies Do You Know?”

Figure from paper by Andrew Gelman http://arxiv.org/pdf/1003.6087v1.pdf

Because our lab works in the open source arena, the Deep Learning with Small Data workshop was particularly promising, as we often aren’t able to get big datasets for the problems we are interested in. While not all of the presentations really focused on this subject, the methods presented during the workshop were tried and true. Joelle Pineau from McGill University summarized the potential methods with the following four tips:

Get more data (this tip was met with laughs and applause). However, before you try to gather more data, it is quite important to verify that your algorithm works correctly with the data you are gathering, and that it answers the question you are trying to pose.

Get or use different data. Use ImageNet or pretrained Word2Vec embeddings if you can, as these are built from quite large datasets. A number of the talks during the conference spoke about transfer learning.

Transform your data. By transforming your data, you may end up with a more robust model.

Use diverse feedback while training your system. She suggested training many models to maximize the potential for one of them to be correct.
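One common reading of the “transform your data” tip is data augmentation: creating modified copies of the examples you already have. The sketch below is my own illustration in plain Python, not something shown at the workshop; the helper names (`flip_horizontal`, `shift_right`, `augment`) and the tiny 2x2 “image” are hypothetical, and it assumes the label is unchanged by the transformation.

```python
def flip_horizontal(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def shift_right(image, fill=0):
    """Shift every row one pixel to the right, padding the left with `fill`."""
    return [[fill] + list(row[:-1]) for row in image]


def augment(dataset):
    """Return the original (image, label) pairs plus two transformed copies each."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((flip_horizontal(image), label))
        augmented.append((shift_right(image), label))
    return augmented


# One labelled example becomes three.
tiny_dataset = [([[1, 2], [3, 4]], "cat")]
bigger_dataset = augment(tiny_dataset)
print(len(bigger_dataset))  # 3
```

Which transformations are safe depends on the problem: flipping a photo of a cat still shows a cat, but flipping a handwritten digit or a line of text may change its meaning.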

Things That Could Have Gone Better:

As with any large event, there were some issues with coordination and crowd management, but I feel that some of the issues that arose could have been avoided. A lot of people attended the event, and we all paid a significant amount of money to be there (non-students paid at least $480 for the main conference, and $840 for the conference + tutorial + workshops). While I know it must not have been cheap to rent out the space in Times Square NY, there are some aspects of a conference that I have come to expect, and these fell short:

Internet: On the first couple of days there was wireless internet on only a single floor of the hotel, and there did not appear to be internet access at the other venues.

Venue: Most of the conference was at the Marriott in Times Square, but some workshops and tutorials were in other buildings. I had difficulty navigating to the proper room and building for the various sessions, even though there were a number of very helpful people around to guide me. For this reason, I ended up not going to some of the workshops I was interested in, as they were away from the primary venue (and I didn’t know if they had internet).

Food: The conference did not include any breakfasts or lunches, only snacks during the breaks, and these snacks went quite fast (in particular the bananas). I also expected there to be more food provided at the reception events. Luckily, there was certainly no shortage of great food in NY.

The fight for bananas. Image from https://www.minionslovebananas.com/

Other amenities: When I went to the registration desk I was also disappointed that I only received a badge, with no program and no swag. While I certainly don’t need these things, the program would have been particularly useful when there was no internet connectivity, and the swag is simply nice.

Overall, I did enjoy my time at ICML and would certainly consider going to next year’s conference. Most of the slides for workshops and tutorials have been posted, and videos should be going up soon. Additionally, all of the papers are posted, so if you missed going to the conference in person, you can still check out its content.