Newbie’s Guide to Studying Reinforcement Learning

Taking baby steps in the realm of Reinforcement Learning

Starter resource pack described in this guide

If the metered paywall is bothering you, go to this link.

If you want to know my path for Deep Learning, check out my article on Newbie’s Guide to Deep Learning.

What I am going to talk about here is not Reinforcement Learning itself but how to study Reinforcement Learning: what steps I took and what I found helpful during my learning process. If you find something useful, please let me know in the comments. If you have other paths you would want to recommend, leave those in the comments for others to see (and I will edit, add, and update the text where appropriate). Peace folks!

Stop the Deluge of Information

Reinforcement Learning has quite a number of concepts for you to wrap your head around. Your head will spin even faster after seeing the full taxonomy of RL techniques. Things get more complicated still once you start reading all the coolest and newest research, with the tricks and details needed to get things working. But watching those OpenAI bots playing Dota 2 is just so cool that you might want to learn all of its techniques and tricks and build your very own bot. First, stop right there. Forget about implementing your own version of OpenAI Five for now. You may end up back at square one; i.e. leaving RL for good, only to find yourself trying to learn it all over again three months later.

You will need to cut yourself off from the deluge of tutorials (my two cents on tutorials) and YouTube videos claiming that you can code “something batshit awesome RL stuff in 5 minutes with 20 lines of code” or the like. Because they all teach you nothing! Yeah, nothing (except git cloning and/or copying the code). You will know the real taste of knowledge once you have banged your head hard enough to figure out how value iteration works for real, and realize that the idea is so simple, yet works quite well on a simple toy example. That’s how you learn something and that’s how you can move forward on this learning path.
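To show just how simple value iteration really is, here is a minimal sketch on a tiny made-up MDP (the states, actions, transition probabilities, and rewards are all invented for illustration; this is not from any course assignment):

```python
# Value iteration on a tiny, made-up MDP.
# P[s][a] = list of (probability, next_state, reward) transitions.
# State 1 is absorbing: every action loops back with zero reward.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 1, 0.0)]},
}
gamma = 0.9
V = {s: 0.0 for s in P}

# Repeatedly apply the Bellman optimality backup:
# V(s) <- max_a sum_{s'} P(s'|s,a) * (r + gamma * V(s'))
for _ in range(100):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in P[s].values()
        )
        for s in P
    }

# Extract the greedy policy from the converged values.
policy = {
    s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in P
}
print(V, policy)
```

That loop is essentially the whole algorithm; everything else in a real implementation is bookkeeping around the environment.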

The Online Course

So, let’s clear our minds, start with a fresh sheet of paper, stay calm, and take the Practical Reinforcement Learning course on Coursera. This course will not be a walk in the park, but the challenge is just the right amount to exercise your brain and make you question whether you have fully grasped the core concepts. It starts out with the very basic Cross-Entropy method, and gradually moves on to Policy Iteration, Value Iteration, Q-Learning, and SARSA. The second half of the course covers Deep Q-Networks and Actor-Critic algorithms. One good thing about this course is that you don’t need to worry about heavy computational resources, since you can do the assignments in Jupyter notebooks on Coursera or Google Colab (they have instructions for setting up on Colab) or even on your own machine with your favorite IDE. Personally, I prefer to code in my local IDE since I have all my debugging tools at my disposal. In fact, I would even nudge you in the direction of running and debugging your code in an IDE, since you will need to understand what the OpenAI Gym objects actually contain (using print statements is not ideal). Otherwise, things will feel like a black box even though they are not. Check out the OpenAI Gym documentation to get a feel for a particular environment and start happily debugging (yeah, I am very happy during debugging sessions; not sure how you would feel).
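To make that black box concrete without tying you to a particular Gym version (the return signatures of reset and step differ between classic gym and the newer gymnasium), here is a hypothetical toy environment of my own that mimics the classic reset()/step() interface. Run it under a debugger and inspect what each returned object contains:

```python
class ToyCorridor:
    """A made-up gym-style environment: walk right along a 5-cell corridor."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # classic gym returns just the initial observation

    def step(self, action):
        # action: 0 = move left, 1 = move right
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        # Classic gym step() returns a 4-tuple: (obs, reward, done, info)
        return self.pos, reward, done, {}


env = ToyCorridor()
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(1)  # set a breakpoint here and poke around
    print(obs, reward, done, info)
```

Once the shape of these tuples is second nature, reading a real Gym environment’s source and docs becomes much less intimidating.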

Your IDE and your debugger are your best friends when you are learning new concepts. I find Jupyter Notebooks pretty cumbersome for jumping around, looking up docs and for debugging. But that’s my personal opinion.

But the course videos can get very bland, and you won’t absorb anything. If that’s the case, stop the video and start the programming assignments straight away. I sometimes find that really helpful since it gives me better motivation for why I should learn what the course video was blabbering about. Combine this with reading the textbook which I will mention below.

Have a Textbook Lying Around (and this will help you a lot!)

Textbooks are boring. I get it. But sometimes, they are the ones which can give you some comfort in the sea of online articles. My go-to textbook for Reinforcement Learning is Reinforcement Learning: An Introduction by Sutton and Barto. This will not surprise you if you have ever searched for a Reinforcement Learning textbook; it is the go-to textbook for most university courses.

Sutton and Barto did a fantastic job writing such a great textbook. I find it quite enjoyable to read and to look things up in. In fact, I would even highly recommend reading the first chapter of the textbook for a very gentle introduction to Reinforcement Learning. I find it better than any online tutorial or Medium post.

Another really good thing about this textbook is that, even while taking the Coursera course, I sometimes find reading the textbook helps me a lot more than the course videos themselves. This is somewhat strange, since most of the time it is the other way around. So, what I do is go back and forth between the textbook and the course videos to fill in my knowledge gaps. Then I try out the programming assignments to really check whether I understand the technical details of the algorithms.

Learn by coding, not just by reading

When I started diving into the world of Reinforcement Learning, I was always confused by the connections among “value function”, “Q value”, “policy”, and “optimal policy”. Trust me, those concepts will become as clear as daylight right after you have implemented and used them to train your agents. Read the text, watch the course videos, implement the functions, run, debug, repeat.
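One way those connections clicked for me in code: given a table of Q values, the greedy policy is just the argmax over actions, and the corresponding value function is the max over actions, V(s) = max_a Q(s, a). A minimal sketch with made-up numbers:

```python
# Made-up Q values for a 2-state, 2-action problem: Q[state][action].
Q = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 0.9, "right": 0.1},
}

# The greedy policy picks the action with the highest Q value in each state.
policy = {s: max(actions, key=actions.get) for s, actions in Q.items()}

# The value function under that greedy policy is the max Q value per state:
# V(s) = max_a Q(s, a).
V = {s: max(actions.values()) for s, actions in Q.items()}

print(policy)  # {'s0': 'right', 's1': 'left'}
print(V)       # {'s0': 0.7, 's1': 0.9}
```

Three confusing terms, four lines of code: once you have written them yourself, the definitions stop being abstract.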

Playing around

While you are doing the Coursera course (preferably after you have finished week 3 and have an idea of what Q-Learning is about), take a look at Lex Fridman’s lecture on Deep Reinforcement Learning. It is not technical, but by now you will have a better understanding of what the Q-Learning part of the slides is all about. The thing about Reinforcement Learning is that if you Google certain concepts only when you need them, you will retain the knowledge for a while, but if you don’t have a deep understanding of what they do underneath, you will always be confused. That’s one of the reasons I suggest checking out those lectures only after understanding the basic concepts well enough. Then, try out DeepTraffic. There are a couple of parameters to play around with, and if you are not sure what they mean, check out its documentation and read the paper to get a better idea of why certain parameters help. Then, go try out Karpathy’s Deep Q-Learning Demo. By now, you should be quite familiar with the various hyperparameters.

Trying a couple of random parameters and getting good results is fun but don’t forget to learn the “Why” behind your changes. [DeepTraffic]

Parameters are brittle but check for typos first!

As you start to play around with Reinforcement Learning problems, you will start to realize how brittle the parameters are. Tuning your epsilon to a particular value so that enough exploration is done before your agent starts exploiting is as important as setting up an exact architecture with exact parameters for your DQN. All this can make you think that if your agent is not doing a good job, you haven’t tuned all those pesky hyperparameters well enough. But more often than not, you may have a typo somewhere in your code. You may have mistakenly passed the current state instead of the next state when updating your Q values. So, always check your code first, before you spend your entire day tuning a single parameter without getting any good results.
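That particular typo is easy to write and hard to spot, because the agent still “learns”, just badly. Here is a sketch of the tabular Q-learning update (function and variable names are my own) with the correct next-state term called out:

```python
def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    """
    # The bootstrap target MUST look at next_state. Writing `Q[state]` here
    # is exactly the typo described above, and it quietly wrecks learning.
    best_next = max(Q[next_state].values())
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q


# Tiny made-up Q table: two states, two actions.
Q = {0: {"a": 0.0, "b": 0.0}, 1: {"a": 1.0, "b": 0.0}}
Q = q_learning_update(Q, state=0, action="a", reward=0.5, next_state=1)
print(Q[0]["a"])  # 0.1 * (0.5 + 0.99 * 1.0) = 0.149
```

A single unit test like this, with hand-computed numbers, will catch the current-state/next-state mix-up long before any amount of epsilon tuning would.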

Go Broad

Once you have gotten a good grasp of the basic reinforcement learning concepts, start following the lectures from the UC Berkeley Deep Reinforcement Learning course and David Silver’s lectures on Reinforcement Learning. These are good for reiterating what you have learnt and for making sure you can still follow despite slight changes in notation and such (we see that a lot in Machine Learning literature as well; people using ever so slightly different notations just to get you more confused!).

Now that you have a good understanding of the basics of Reinforcement Learning, you should start reading the seminal paper on DQN. Jumping right into Deep Reinforcement Learning is not advisable if you only understand the Deep Learning part and not the Reinforcement Learning part. That’s one major fallacy of folks who are pretty well versed in Deep Learning but have no idea what Reinforcement Learning is about.

Equipped with basic Reinforcement Learning knowledge, you can start reading various Deep Reinforcement Learning papers (and start implementing them). You will have some knowledge gaps on certain concepts, but you should already have the core concepts in your toolbox, and learning additional techniques is not that hard anymore. My personal technique is to use mind-mapping software to map out concepts and papers (described in my Newbie’s Guide to Deep Learning).

Other resources