ML Resources

Where should I start (in ML)?

If you’re here looking for a general introduction to machine learning, I would proceed in the following order:

Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani. This textbook is a fantastic introduction to the field, written by some of its leading experts. It is short and well-written enough to be read cover-to-cover, high-level enough to be accessible to people from various backgrounds, yet rigorous in the sense that it teaches you to think about the problems rather than just giving you a cookbook. The textbook is free as a PDF at the book website, and the authors have also provided a collection of excellent videos on YouTube that accompany the text (the videos are nicely organized into a collection here). This textbook also has a “big sister”, the classic Elements of Statistical Learning, which covers the same content in much greater mathematical depth. However, I would start with ISL and then move to ESL from there as your interest directs. Note that the code in this book and class is in R; it covers most of the classical ML toolkit but not deep learning.

Fast.ai by Jeremy Howard and Rachel Thomas. This course provides an accessible but extremely effective introduction to deep learning, the most popular branch of modern machine learning. The course is hands-on and immensely practical, and each lesson will equip you with the tools to build a very effective model for some new branch of ML (computer vision, NLP, etc.). The course is taught in Python using PyTorch and their own library.

Once you make your way through ISL and fast.ai, you will have a solid handle on all the most commonly used techniques in ML (classic and cutting-edge). You will have a decent intuition for which methods work when, and the ability to at least understand and modify code for ML analysis in both R and Python. From there, you should be prepared to dive in greater depth into any subarea of the field that you fancy.

Depending on background and bandwidth, a motivated student could probably work through the above material in 1-4 months. Go get ’em! :)

Computer Science

Theory

| File | Description |
| --- | --- |
| CS Theory Cheatsheet | CS theory cheat sheet, originally accessed here |
| Tim Roughgarden’s Lectures on Algorithms and Algorithms Illuminated | Tim Roughgarden is one of the most natural teachers I’ve ever seen. The first link is to lecture notes in PDF form from many classes. Videos for his Algorithms 2 class (CS 261) are here. The second is a link to the page for his new textbook, which also links out to all the YouTube videos from his Coursera version of CS 161 (Algorithms 1). |

Programming cheatsheets

Real Analysis

| File | Description |
| --- | --- |
| Measure, Integration, and Real Analysis | Sheldon Axler’s textbook-under-development on measure theory and real analysis. (Website). |

Linear Algebra

Probability

Statistics

Causal Inference

Optimization

Information Theory

Classic Machine Learning

Textbooks, Lectures, and Course Notes

Special Topics and Blog Posts

Bayesian Machine Learning

Deep Learning

Textbooks, Lectures, and Course Notes

Special Topics and Blog Posts

Instructive Codebases

| File | Description |
| --- | --- |
| Sebastian Raschka’s Deep Learning Models GitHub | An impressively comprehensive set of TensorFlow and PyTorch models, annotated and perusable in 80+ Jupyter Notebooks. |
| PyTorch Tutorials | The tutorials put out by the PyTorch developers are really fantastic. Easy to see why the community is growing so fast. |
| Wiseodd’s Website and Deep Generative Models GitHub | An amazing collection of deep learning implementations. |

Natural Language Processing

Textbooks, Lectures, and Course Notes

Special Topics and Blog Posts

Reinforcement Learning

Textbooks, Lectures, and Course Notes

| File | Description |
| --- | --- |
| Sutton and Barto Open RL Book | De facto standard intro to RL, even though the textbook is only now about to be published! |
| Stanford Reinforcement Learning Course by Emma Brunskill | A really great RL class from Stanford. The website has a really nice set of notes, and the lecture videos are on YouTube. |
| Berkeley Deep Reinforcement Learning | RL class from Berkeley taught by top dogs in the field, with lectures posted to YouTube. |

Special Topics and Blog Posts

Applications in Biology and Medicine

| File | Description |
| --- | --- |
| Medical ML Datasets GitHub | GitHub repo of a bunch of medical ML datasets, compiled by Andrew Beam. |
| ML for protein design GitHub | Nice GitHub repo put together by Kevin Yang, covering a bunch of ground in the ML for proteins space. |
| Best Practices in Single-Cell RNA-Seq Tutorial | Excellent tutorial on single-cell RNA-seq, walking through current best practices at every stage of scRNA-seq analysis. |

Miscellaneous websites