By Matt Gershoff | Monday, December 17th, 2012

Every now and then I get asked for some help or for some pointers on a machine learning/data science topic. I tend to respond with links to resources by folks that I consider to be experts in the topic area. Over time my list has gotten a little larger, so I decided to put it all together in a blog post. Since it is based mostly on the questions I have received, it is by no means complete, or even close to a complete list, but hopefully it will be of some use. Perhaps I will keep it updated, or even better yet, feel free to comment with anything you think might be of help.

Also, when I think of data science, I tend to focus on Machine Learning rather than the hardware or coding aspects. If you are looking for stuff on Hadoop, or R, or Python, sorry, there really isn’t anything here.

Neo Makes Cheese

Before you do anything else, start boning up on your linear (matrix) algebra. This is the single most important thing you can do to bootstrap your ML education.

This is the deal, and I don’t care what anyone else has told you: if you want to have any hope of understanding what is going on in Machine Learning, Data Science, Stats, etc., you have got to get a handle on Linear Algebra.

Painful! Trust me, I know. I got an ‘F’ the first time I took it in college. I had no idea what was going on. The only thing I really remember was the professor shouting at us after poor marks on homework and tests, ‘How can you make cheese if you don’t know where milk comes from!? It’s plain, common ordinary horse sense!’

Kinda nuts, and it really didn’t make total sense, but his point was, you have got to have the basics down before you can actually make anything useful.

The flip side is that once you start getting a feel for linear algebra, you can much more easily hop around among various advanced topics. This is because many, though not all, ML topics rest on applications of linear algebra.

Where to start? For me, there really is only one place that I go to get a refresh on a topic that I realize I don’t really understand, and that is Gilbert Strang’s undergrad class at MIT. Just an awesome intro course, and it makes me envy the students who get to go to MIT. See his class here: Linear Algebra Class

General Machine Learning

Now this resource is a big one – and I think just this link makes this post worth it. As far as I am concerned, www.videolectures.net is one of the most valuable sites on the internet. Sure, maybe some of the other ‘disruptive’ educational sites are useful, but almost everything Machine Learning is in here and then some – Mother lode, Paydirt, or whatever you want to call it, you just hit it with this. http://videolectures.net/Top/Computer_Science/Machine_Learning/

But don’t hoard it, pass this resource along.

Also, a good first lecture on ML is Iain Murray’s tutorial from a machine learning summer school. Here is the lecture: Murray Teaches ML

On to some TOPICS

LDA for Topic Models or How I Learned to Pronounce Dirichlet

One, this is useful. Two, data science folks are on about this, so if you want to fit in – or hit up any ‘Big Data’ VCs – you better be able to name drop this, and be able to back it up if you get called out – not that the VC will be able to call you out ;).

David Blei is probably the best source to start looking for Topic modeling research and applications:

http://www.cs.princeton.edu/~blei/

http://www.cs.princeton.edu/~blei/topicmodeling.html

Matt Hoffman – Matt is over at Adobe now, but he wrote some Python code for online Topic Models (I think this is his research as well) – check it out: http://www.cs.princeton.edu/~mdhoffma/

From LDA to SVD to LSI

A non-Bayesian/probabilistic approach to topic modeling is Latent semantic indexing, where you take a truncated SVD of the term-document matrix and keep only the largest singular values (this is closely related to Principal components, which is an eigendecomposition of the data’s covariance matrix). There is a Wikipedia article on LSI to get you started:

http://en.wikipedia.org/wiki/Latent_semantic_indexing I know it is normally kinda lame to just link to Wikipedia, but it is a pretty good overview.
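To make the LSI idea a bit more concrete, here is a minimal sketch in plain Python. It is not how a real LSI library does it: it only finds the single largest singular value and its vectors, using power iteration on a tiny made-up term-document matrix. The function name, the toy terms, and the counts are all assumptions for illustration.

```python
import math

def top_singular_triplet(A, iters=200):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # start from a unit vector
    for _ in range(iters):
        # w = A v (one entry per term)
        w = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # z = A^T w (one entry per document)
        z = [sum(A[i][j] * w[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in z))
        v = [x / norm for x in z]
    w = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in w))
    u = [x / sigma for x in w]
    return sigma, u, v

# Hypothetical toy counts: rows are terms, columns are documents.
terms = ["matrix", "vector", "cheese", "milk"]
A = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

sigma, u, v = top_singular_triplet(A)

# Coordinates of each document along the first latent "topic" dimension.
doc_coords = [sigma * x for x in v]
print(doc_coords)
```

In this toy case the first two documents (the linear algebra ones) land at the same spot on the first latent dimension, while the cheese document gets essentially zero weight there. A real system would keep several dimensions and use a proper SVD routine.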

SVD or Recommending Everyone’s Favorite Factorization

If you didn’t feel the need to look over Strang’s course, take a look at this class on the SVD, which is the basis for many recommendation systems – I highly recommend getting at least the basic idea down.
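For what it is worth, the basic idea fits in a couple of lines. This is only a sketch, assuming a fully observed user-by-item ratings matrix A (real recommenders have mostly missing entries and usually fit the factors some other way); the symbols are illustrative notation, not taken from the linked class.

```latex
% Keep only the k largest singular values of the ratings matrix A
A \approx A_k = U_k \Sigma_k V_k^{\top}

% Predicted rating of user u for item i from the k latent factors
\hat{r}_{ui} = \sum_{f=1}^{k} \sigma_f \, (U_k)_{uf} \, (V_k)_{if}
```

Among all rank-k matrices, A_k is the closest to A in the least-squares (Frobenius) sense, which is why a handful of latent factors can stand in for the whole table.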

Bayesian Or Frequentist Or Who Cares?



Huge topic that I am not going to really try to flesh out, but Michael Jordan (the one at Berkeley, not Chicago) has a nice lecture contrasting the two: Bayesian or Frequentist

Also check out MJ on the Dirichlet Process – not the best audio, but since he also sets up the Parametric/Non-Parametric and Bayesian/Frequentist classifications on slide two, it is worth bending your ear a bit. MJ DUNKS!

On to Non Parametric Bayesian approaches

There seems to be a bit of chatter in the startup/Big Data space around Non Parametric Bayesian methods. If you want to see more after checking out MJ’s talk, take a look at David MacKay’s tutorial on the Gaussian Process.

Also check out the book Gaussian Processes for Machine Learning by Carl Rasmussen and Chris Williams. What is great is that this book is online and free! What is not so great is that it is pretty hairy going, but take a peek at chapter 6, which has a comparison of GPs with Support Vector Machines (SVMs).

PMR Madness! Joints and Moralizing

I took Chris Williams’ class on Probabilistic Models and Reasoning several years back when I was studying AI at Edinburgh. Take a look at the slides etc. There is stuff on graphical models, junction tree, etc. Chris is one of those crazy smart guys who I use in my mind’s eye as a litmus test for when someone is trying to pass themselves off as an expert in the space. Sort of a where are they on the CW scale – most often it’s pretty low 😉

Also see Carl’s talk on Gaussians

The First Order of Business – Let’s Get Stochastic

Online vs Offline – everyone is all agog about big data, Hadoop, etc., but if you are interested in more than hardware/IT, you will want to think about how you are going to use data and what types of systems approaches are most appropriate. If you are going to be doing ML on a lot of data, you should be aware of stochastic gradient descent (SGD). Leon Bottou is the man for this – here is a video and his home page
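If SGD is new to you, the core loop is short enough to sketch. The following is a minimal illustration, not anyone’s reference implementation: it fits a one-variable linear model by updating the weights after every single example, which is what makes the method work on a stream of data. The function name, the learning rate, and the synthetic noiseless data are all assumptions chosen so the result is easy to check.

```python
import random

def sgd_linear_regression(stream, lr=0.05):
    """Fit y ~ w*x + b, updating after every single example (online SGD)."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y  # prediction error on this one example
        w -= lr * err * x      # gradient of 0.5 * err**2 with respect to w
        b -= lr * err          # gradient of 0.5 * err**2 with respect to b
    return w, b

random.seed(0)
# Hypothetical synthetic stream: y = 3x + 1 with no noise.
data = [(x, 3.0 * x + 1.0) for x in (random.uniform(-1, 1) for _ in range(5000))]

w, b = sgd_linear_regression(data)
print(w, b)
```

Note that the function never needs the whole data set in memory; it only looks at one (x, y) pair at a time, so the same loop would work reading from a file or a live feed.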

Deep Learning or Six Degrees of Geoff Hinton

Here are just a few of those who have been students or post docs at his lab: Yann LeCun (ANNs/Deep Learning), Chris Williams (GPs), Carl Rasmussen (GPs), Peter Dayan (Neuroscience and TD-Learning), Sam Roweis, and recently Iain Murray (see lecture above).

After the NYTimes had a piece on deep learning there was a fair amount of online chatter about it. What is it? Let Yann LeCun tell you. Yann is a professor over at NYU, and has worked with Neural Nets for quite some time, using energy models and his convolutional net. Take a look at the presentation Yann gave at the Machine Learning summer school I attended back in ’08. http://www.cs.nyu.edu/~yann/talks/lecun-20080905-mlss-deep.pdf

** Update **

Yann suggested the following updated links: 1) A recent invited talk at ICML 2012 and 2) some slides from a more recent summer school IPAM. I have not had the time to take a look, but since Yann suggested these personally, check them out.

**

As an aside, one of the great things about those Pascal machine learning summer schools is that you get to hang with these folks informally. So you can chat SVMs and feature selection with Isabelle Guyon at dinner, have lunch with Rich Sutton, and perhaps talk shop with Yann over a glass of Pineau. If you can make it happen, I highly recommend attending one of these; the next one looks to be in Germany.

Also, feel free to peruse Geoff Hinton’s site for some goodness on Autoencoders and RBMs.

NLP, but not LDA

I couldn’t figure out how to conjugate the verb ‘to be’ until I was like 12, so, not surprisingly, I never really got into NLP.

I was, however, fortunate to take a class while I was at Edinburgh with Philipp Koehn, who has helped drive much of the recent work on machine translation – I’ll let him explain it to you here. You can also get his book, Statistical Machine Translation, if you are interested, but to be honest, I haven’t read it.

You may have noticed that he used the term informatics. Here is a description/definition of it on the Informatics web site at the U of Edinburgh (there is a PDF at the bottom of that page with more detail). I actually think informatics is a better term than Data Science, but hey, out with the old in with the new(ish).

Named Entity – The Subjects and Maximizing Entropy

If you are into NLP you might want to be able to figure out who the players are in text documents. To do that, you will need named-entity recognition:

http://en.wikipedia.org/wiki/Named-entity_recognition

Maybe this is good – I have never used it – http://nlp.stanford.edu/software/CRF-NER.shtml

If you want to do your own NE model, MaxEnt models have been used – here is a resource with MaxEnt for NLP. I admit, I don’t know what the state of the art is, or if this is still used, so feel free to comment with some better/newer stuff.

Okay, this post is getting long, so I will wrap it up. I didn’t get to Reinforcement learning or Cluster methods (other than LDA), so perhaps I will extend this post, or write a follow-up soon. Please feel free to add thoughts via the comments, and if you haven’t yet, please sign up for your free Conductrics account.
