The Video Game Engine Inside Your Head: A different path towards Machine Intelligence

Object Detectors Emerge in Deep Scene CNNs

Overheard at CVPR: ArXiv Publishing Frenzy & Baidu Fiasco

CVPR attendance plot from Changbo Hu

CVPR 2015 started off with some excellent software tutorials on day one. There is some great non-alpha deep learning software out there and it has been making everybody's life easier. At CVPR, we had both a Torch tutorial and a Caffe tutorial. I attended the DIY Deep Learning Caffe tutorial and it was a full house -- standing room only for slackers like me who join the party only 5 minutes before it starts. Caffe is much more popular than Torch, but after talking to some power users of Deep Learning (like Andrej Karpathy and other DeepMind scientists), it seems that a certain group of experts is migrating from Caffe to Torch.

Caffe is developed at Berkeley, has a vibrant community and Python bindings, and seems to be quite popular among university students. Prof. Trevor Darrell at Berkeley is even looking for a postdoc to help the Caffe effort. If I were a couple of years younger and a fresh PhD, I would definitely apply.

Instead of following the Python trend, Torch is Lua-based. There is no need for an interpreter like Matlab or Python -- Lua gives you the magic console. Torch is heavily used by Facebook AI Research Labs and Google's DeepMind Lab in London. For those afraid of new languages like Lua, don't worry -- Lua is going to feel "easy" if you've dabbled in Python, Javascript, or Matlab. And if you don't like editing protocol buffer files by hand, definitely check out Torch.

When you share creations made in OpenCV, you end up sharing source files, but with the Deep Learning toolkits, you end up sharing your pre-trained networks. No longer do you have to think about a combination of 20 "little" algorithms for your computer vision pipeline -- you just think about which popular network architecture you want, and then the dataset. If you have the GPUs and ample data, you can do full end-to-end training. And if your dataset is small or medium-sized, you can fine-tune the last few layers.
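To make the small-dataset case concrete, here is a minimal sketch of the simplest variant: keep the network's features frozen and learn only a new linear layer on top. Everything in it is an assumption for illustration. `frozen_features` is a hypothetical stand-in for the final-layer activations of a pre-trained ConvNet (a real pipeline would run a forward pass through Caffe or Torch there), and the "dataset" is a toy two-class problem rather than real images.

```python
import math
import random


def frozen_features(point):
    """Hypothetical stand-in for a pre-trained network's final-layer activations.

    In a real pipeline this would be a forward pass through a ConvNet whose
    weights are never updated; here it is just a fixed nonlinear map.
    """
    a, b = point
    return [a, b, a * b, a * a, b * b]


def train_linear_classifier(features, labels, epochs=1000, lr=1.0):
    """Fit logistic regression with batch gradient descent.

    Only these weights and the bias are learned; the features stay fixed.
    """
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for f, y in zip(features, labels):
            z = sum(w * x for w, x in zip(weights, f)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y
            for i in range(dim):
                grad_w[i] += err * f[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, f):
    """Return 1 if the linear score is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, f)) + bias > 0 else 0


def make_dataset(n, rng):
    """Toy two-class problem: is the point inside a circle of radius sqrt(0.5)?"""
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    labels = [1 if a * a + b * b < 0.5 else 0 for a, b in points]
    return points, labels


def accuracy(weights, bias, points, labels):
    """Fraction of points whose predicted label matches the true label."""
    hits = sum(
        predict(weights, bias, frozen_features(p)) == y
        for p, y in zip(points, labels)
    )
    return hits / len(labels)


rng = random.Random(0)
train_x, train_y = make_dataset(200, rng)
test_x, test_y = make_dataset(200, rng)
weights, bias = train_linear_classifier(
    [frozen_features(p) for p in train_x], train_y
)
print("held-out accuracy:", accuracy(weights, bias, test_x, test_y))
```

The point of the toy is the division of labor: the classes are not linearly separable in the raw inputs, but they are in the fixed feature space, so a linear layer is all that has to be trained.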
You can even train a linear classifier on top of the final layer, if you're afraid of getting your hands dirty -- just doing that will beat the SIFTs, the HOGs, the GISTs, and all that was celebrated in the past two decades of computer vision.

The way in which ConvNets are being used at CVPR 2015 makes me feel like we're close to something big. But before we strike gold, ConvNets still feel like a Calculus of Shadows, merely "hoping" to get at something bigger, something deeper, and something more meaningful. I think the flurry of research which investigates visualization algorithms for ConvNets suggests that even the network architects aren't completely sure what is happening behind the scenes.

Josh Tenenbaum gave an invited talk titled The Video Game Engine Inside Your Head at the Scene Understanding Workshop on the last day of the CVPR 2015 conference. You can read a summary of his ideas in a short Scientific American article. While his talk might appear to be unconventional by CVPR standards, it is classic Tenenbaum. In his world, there is no benchmark to beat, no curves to fit to shadows, and if you allow my LeCun-Descartes analogy, then in some sense Prof. Tenenbaum might be the modern day Aristotle. When Prof. Jianxiong Xiao introduced Josh with a grand intro, he was probably right -- this is one of the most intelligent speakers you can find. He speaks 100 words a second, and you can't help but feel your brain enlarge as you listen.

One of Josh's main research themes is going beyond the shadows of image-based recognition. Josh's work is all about building mental models of the world, and his work can really be thought of as analysis-by-synthesis. Inside his models is something like a video game engine, and he showed lots of compelling examples of inferences that are easy for people, but nearly impossible for the data-driven ConvNets of today.
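For readers who haven't met analysis-by-synthesis before, here is a toy sketch of the idea: propose a scene, render it with a tiny "graphics engine", and keep the scene whose rendering best explains the observed image. This is my own illustration, not Prof. Tenenbaum's model. The 1-D image, the `render` function, the Gaussian noise model, and the brute-force search over hypotheses are all assumptions chosen to keep the example short; a realistic system would need a far richer renderer and much smarter inference.

```python
import random

WIDTH = 32  # number of pixels in the toy 1-D "image"


def render(center, half_width):
    """Toy stand-in for a graphics engine: a bright bar on a dark background."""
    return [1.0 if abs(i - center) <= half_width else 0.0 for i in range(WIDTH)]


def log_likelihood(observed, rendered, noise_std=0.1):
    """Log-probability of the observation under Gaussian pixel noise.

    The normalizing constant is dropped because it is the same for every
    hypothesis.
    """
    sq_err = sum((o - r) ** 2 for o, r in zip(observed, rendered))
    return -sq_err / (2.0 * noise_std ** 2)


def infer_scene(observed):
    """Score every scene hypothesis by rendering it and comparing to the image.

    Brute-force enumeration with a uniform prior; returns the best-scoring
    (center, half_width) pair.
    """
    hypotheses = [(c, h) for c in range(WIDTH) for h in range(1, 8)]
    return max(
        hypotheses,
        key=lambda hyp: log_likelihood(observed, render(*hyp)),
    )


# Simulate a noisy observation of a known scene, then try to recover the scene.
rng = random.Random(0)
true_scene = (20, 3)
observed = [pixel + rng.gauss(0.0, 0.1) for pixel in render(*true_scene)]
print("inferred scene:", infer_scene(observed))
```

Notice that nothing here is learned from labeled data: all of the knowledge lives in the forward model, and recognition is just a search for the scene that would have produced the image.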
It's not surprising that his student is working at Google's DeepMind this summer.

A couple of years ago, Probabilistic Graphical Models (the marriage of Graph Theory and Probabilistic Methods) used to be all the rage. Josh gave us a taste of probabilistic programming, and while we're not yet seeing these new methods dominate the world of computer vision research, keep your eyes open. He mentioned a recent Nature paper (citation below) from another well respected machine intelligence researcher, which should keep the trendsetters excited for quite some time. Just take a look at the bad-ass looking Julia code below:

Probabilistic machine learning and artificial intelligence. Zoubin Ghahramani. Nature 521, 452–459 (28 May 2015) doi:10.1038/nature14541

To see some of Prof. Tenenbaum's ideas in action, take a look at the following CVPR 2015 paper, titled Picture: A Probabilistic Programming Language for Scene Perception. Congrats to Tejas D. Kulkarni, the first author, an MIT student, who got the Best Paper Honorable Mention prize for this exciting new work. Google DeepMind, you're going to have one fun summer.

There were lots of great presentations at the Scene Understanding Workshop, and another talk that truly stood out was about a new large-scale dataset (MIT Places) and a thorough investigation of what happens when you train with scenes vs. objects. Antonio Torralba from MIT gave the talk about the Places Database and an in-depth analysis of what is learned when you train on object-centric databases like ImageNet vs. scene-centric databases like MIT Places. You can check out the "Object Detectors Emerge" slides or their ArXiv paper to learn more. Great work by an up-and-coming researcher, Bolei Zhou.

In the long run, the recent trend of publishing to ArXiv.org is great for academic and industry research alike. When you have a large collection of experts exploring ideas at very fast rates, waiting 6 months until the next conference deadline just doesn't make sense.
The only downside is that it makes new CVPR papers feel old. It seems like everybody has already perused the good stuff the day it went up on ArXiv. But you get your "idea claim" without worrying that a naughty reviewer will be influenced by your submission.

We now know who's doing what, significantly before publication time. Students, publish-or-perish just got a new name. Whether the ArXiv frenzy is a good or a bad thing is up to you, and is probably more a function of your seniority than anything else. But the CV buzz is definitely getting louder and will continue to do so.

The Baidu cheating scandal might appear to be big news for outsiders just reading the Artificial Intelligence headlines, but overfitting to the testing set is nothing new in Computer Vision. Papers get retracted, grad students often evaluate their algorithms on test sets too many times, and the truth is that nobody's perfect. When it's important to be #1, don't be surprised that your competition is being naughty. But it's important to realize the difference between ground-breaking research and petty percentage chasing. We all make mistakes, and under heavy pressure, we're all likely to show our weaknesses. So let's laugh about it.

The truth is that a lot of the top performing methods are more similar than different.

CVPR has been constantly growing in attendance. We now have PhD students, startups, professors, recruiters, big companies, and even undergraduates coming to the show. Will CVPR become the new SIGGRAPH?

ConvNets are here to stay, but if we want ConvNets to be more than a mere calculus of shadows, there's still ample work to be done. Geoff Hinton's capsules keep popping up during midnight discussions. "" -- Geoff Hinton during his Reddit AMA. A lot of people (like Prof. Abhinav Gupta from CMU) are also talking about unsupervised CNN training, and my prediction is that learning large ConvNets from videos without annotations is going to be big at next year's CVPR.

Most importantly, when the titans of Deep Learning get to mention what's wrong with their favorite methods, I only expect the best research to follow. Happy computing and remember, never stop learning.