Reinforcement learning is one of the more complex ideas in AI to wrap your head around. It’s also one of the hottest areas of AI research: MIT Technology Review picked it as one of the top 10 breakthrough technologies of 2017. Reinforcement learning chalked up one of the flashiest wins for AI this decade in March 2016, when DeepMind’s AlphaGo beat world champion Lee Sedol at the game of Go. After his 4-1 loss to AlphaGo, Sedol said it felt like he was playing against an alien intelligence. Many AI researchers consider reinforcement learning, or RL for short, to be the path that will help humanity scale its highest summit: artificial general intelligence. (Tuck away artificial general intelligence, or AGI, for later in this story.) No wonder, then, that it’s in a league of its own in terms of ambition and hype.

In a quest to demystify this arcane area of AI research, FactorDaily spoke to Balaraman Ravindran, an IIT Madras professor who is widely considered India’s foremost reinforcement learning expert. That assessment rings true going by a previous FactorDaily data story in which we attempted to map out the country’s leading machine learning researchers. In any case, we asked him if he would be okay being referred to as such.

“That I agree, because I am the only one,” he says, laughing. “That was said half jokingly, but I have been working on RL (reinforcement learning) since I came back (from the US)… even before that.” Ravindran did his Ph.D. research at the University of Massachusetts, Amherst, mentored by Prof. Andrew G. Barto, a founding father of RL.

AI researchers like Ravindran are a rarity in India – fewer than 400, according to the Global AI Talent Report 2018 – especially in light of the Indian government’s growing interest and ambition in deploying AI across sectors ranging from agriculture and education to healthcare and defence. This year, we’ve heard of instances of AI being deployed to find 3,000 missing children and to improve hygiene in railway trains, for example.

Ravindran’s academic work, which spans over two decades, has produced 170 research papers, 12 of them in 2018, according to his Google Scholar profile. He’s also a co-organiser of dozens of data science and AI-focused conferences, seminars, and workshops in India and abroad. We spoke to him about his journey into reinforcement learning, and felt right at home quizzing him on a wide variety of questions about AI – whether we’re in an AI bubble, the democratisation of AI, AI’s comparison to alchemy, and more.

To be sure, there are other AI experts in India – including the likes of Soumen Chakrabarti, Professor, IIT Bombay; Rajeev Rastogi, Director, Machine Learning, Amazon; and Nikhil R. Pal from Indian Statistical Institute, Kolkata – but Ravindran is widely seen in a unique spot given his eminence in RL.

Into AI before it was cool

Prof Ravindran’s enthusiasm for AI began in the late 80s, during his undergraduate days at Madurai’s Thiagarajar College, where he studied electronics and communications engineering. Among his earliest forays into the field was a book titled ‘Artificial Intelligence’ that his father, S B Raman, brought him from a trip to Singapore. His father was an entrepreneur who made rubber components for auto companies and imported rubber from Malaysia. Ravindran read the book during his first year of college, and it sparked his interest in the field. “It’s by Rich and Knight, it’s a very classic, old school textbook,” he says.

“I was always fascinated about intelligence. How do humans think? What does it mean to be intelligent? I wanted to do neural networks, because neural networks was originally motivated a lot by biology, neuroscience. The whole idea was to try and explain how the brain functions,” he says.

Humans, as we know them today, are about 200,000 years old, but humankind – and the human brain – has roots going back much earlier; the earth itself is estimated to be some 4.5 billion years old.

A sharper, more focused interest in reinforcement learning developed after he read a neuroscience paper by Read Montague. “He had started looking at how primates learn and then he had some very interesting results on how a certain neuromodulator in the brain called dopamine – how dopamine varies as primates learn,” reflects Ravindran. “He had done all these monkey experiments and then he had shown that there is this mathematical learning model, called temporal difference learning, that can explain how dopamine varies in the brain.”
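The temporal difference model Ravindran describes can be sketched in a few lines. This is a minimal, illustrative TD(0) value update – the states, reward, and learning-rate values below are invented for the example, not taken from Montague’s paper; the point is that the prediction error `delta` (reward received minus reward expected) is the quantity that dopamine activity is thought to track.

```python
# A minimal sketch of temporal-difference (TD) learning.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: nudge V[s] toward the bootstrapped target r + gamma*V[s_next]."""
    delta = r + gamma * V[s_next] - V[s]   # TD error: the "surprise" signal
    V[s] += alpha * delta
    return delta

# Toy example: state 0 always leads to terminal state 1 with reward 1.
V = {0: 0.0, 1: 0.0}
for _ in range(200):
    td0_update(V, 0, 1.0, 1)
# V[0] converges toward the target r + gamma*V[1] = 1.0
```

Early on, `delta` is large (the reward is surprising); as the value estimate improves, the surprise shrinks – mirroring how dopamine responses in the monkey experiments shift from the reward itself to its predictors.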

Ravindran found a fellow traveller in his master’s thesis advisor at the Indian Institute of Science, Bangalore. He and Sathiya Keerthi embarked on writing a survey paper on reinforcement learning. “I’m really thankful to an advisor who was willing to embark on a reading program himself so that he could advise me on my master’s thesis. We don’t come across people like that,” says the IIT-M professor. The survey paper grew so popular that Oxford University Press invited them to write a chapter in its handbook on neural computation. That paper also vaulted Ravindran into UMass Amherst, the birthplace of RL, where the first set of papers on the topic were published by his advisor Barto.

“What got me into RL was its relationship to neuroscience, and what has kept me in RL is its relationship to psychology,” the professor says, recounting his journey. He continues to explore the connections between RL on one side and behavioural and cognitive psychology on the other, which relate to things like memory, hierarchies, and representation. “How can I do something like that with an artificial agent? So those are the kinds of questions that keeps me excited,” he says.

Braving an AI winter

“AI was kind of on its way out,” he says at a meeting in his office at IIT Madras, recounting the state of AI in the mid-90s. It was an AI winter, specifically for the symbolic and connectionist sides of AI, he says – which prompted a quiz question for us: “So, do you know the difference between symbolic and connectionist?”

Thanks to our prior reading of ‘The Master Algorithm’, we could say yes. The five tribes of machine learning, as defined by Pedro Domingos in his book, are symbolists, connectionists, evolutionaries, Bayesians, and analogisers. Each has distinct origins and algorithmic techniques – connectionist methods have their origins in neuroscience, while symbolist methods have origins in logic and philosophy. “The connectionists are people who kind of vectorise everything, they operate in vector spaces. The symbolic people are those who like to associate abstractions with different concepts,” Ravindran explains.

After finishing his Ph.D. in the US in 2004, he was looking for jobs and found an opening at IIT Madras. He has worked there ever since. “When I joined, my basic pay was Rs 12,000. That’s changed a lot. We don’t get industry salaries but we get decent money,” Ravindran says.

He is a third-generation teacher – his maternal grandfather S Banumoorthy was a professor of English and his mother B Gomathi was a professor of economics. They had a big influence on his decision to become a teacher. “I knew this from when I was five years old. I never ever considered other career options, I always wanted to be a teacher,” he says.

“Everybody has bad days in their profession, of course, but by and large, being a teacher has been very, very rewarding. I know that I have had an impact on many students’ lives… there are people who come back to me after 10-12 years seeking advice.” Sure enough, we spot a wedding invitation from one of his former students on his desk.

Apart from images of Swami Vivekananda, Ravindran’s office has two other visuals that stand out. One is a charcoal portrait of him, drawn “out of memory” by Anuran Mohanty, a former student. The second is a portrait of Alan Turing, signed by several Turing Award winners. “My student – Sarath Chandar – took this picture when we went to the Turing centenary celebrations (in 2012) and got it autographed by the Turing Award winners who showed up for that event,” he says. The Turing Award is considered the Nobel Prize of computing. Chandar, now at the University of Montreal, is advised by two titans of AI research: Yoshua Bengio and Hugo Larochelle.

Several of Ravindran’s former students are now working for tech giants such as Microsoft, Google, IBM, and Amazon, among others.

A student talks of Ravindran’s ‘Introduction to Machine Learning’ course. “It was his unique teaching style that got me and a bunch of my friends hooked to this topic and field – his enthusiasm towards the material, the intuitive examples that he gives…,” says Abhishek Naik, a student pursuing a dual degree at IIT Madras, who recently did a bulk of the work on MADRaS, an open-source multi-agent driving simulator. “Working with him is highly rewarding in the sense that after every meeting, you’ll walk out of his office brimming with new ideas and directions to explore.” Naik credits this not just to Ravindran’s knowledge of RL but also to the cognitive-psychological motivations behind those ideas.

So, what is RL anyway?

Machine learning approaches are categorised into three types: supervised, unsupervised, and RL. All of machine learning is solving some optimisation problem or the other, based on the constraints under which one is operating, explains Ravindran. “Most of what machine learning tries to do is to learn some kind of a pattern. Most learning paradigms will be something like: you are given a set of inputs and what the corresponding output should be,” he says. “The search technique that you use for finding solutions keeps changing. If it’s a decision tree, you typically are not using gradient approaches; if it’s an SVM (support vector machine), you’re using convex optimisation techniques; and if it is a neural network, you’re using a gradient approach.”

Decision trees, support vector machines, and neural networks are some of the popular machine learning techniques used by data scientists. A decision tree is one of the most rudimentary methods: a bunch of if-else statements that define patterns in data. SVMs are used in classification and regression tasks. Neural networks are more complex models powering the current deep learning boom in AI, with use-cases in computer vision, speech recognition, and more. Convex optimisation and gradient descent are techniques used to fit these models to data.
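The “bunch of if-else statements” description of a decision tree is quite literal. Here is a toy example – the feature names, thresholds, and class labels are invented for illustration (loosely inspired by flower classification), not from any model discussed in the article:

```python
# A hypothetical learned decision tree, written out as plain if-else rules.
def classify(petal_length, petal_width):
    """Toy 3-class flower classifier; thresholds are illustrative only."""
    if petal_length < 2.5:          # short petals -> first class
        return "setosa"
    elif petal_width < 1.8:         # longer but narrow petals -> second class
        return "versicolor"
    else:                           # long and wide petals -> third class
        return "virginica"

# Each input simply falls down one branch of the tree.
print(classify(1.0, 0.2))   # -> setosa
```

A tree-learning algorithm’s job is to discover these thresholds from data; no gradients are involved, which is Ravindran’s point about different search techniques for different model families.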

But there is another class of problems in which it is very hard to specify what makes for a correct output. Ravindran illustrates with the example of learning to ride a bicycle. “Nobody tells you (how)… the angle in which your cycle is tilting, how much you are moving forward, the speed at which the wind is blowing or to push down with your right leg,” Ravindran says. “But you have some feedback: if you fall down, it hurts. If you ride properly, your parents will be standing by, clapping in encouragement. You are not given instructions, but you are being evaluated.”

RL is all about learning from this kind of evaluation, and from trial and error, as opposed to learning from instructions. RL problems can be solved using different techniques: popular methods include Q-learning (used to learn the value of an action), policy gradient approaches, and DDPG (deep deterministic policy gradients). Policy gradient methods directly optimise the policy of a reinforcement learning agent towards the goal of maximising rewards, and DDPG is a deep learning implementation of the same idea. DDPG methods are becoming more and more successful and are used in robotics control problems, as people have found ways to get them to work, he says.
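To make the trial-and-error idea concrete, here is a minimal Q-learning sketch on a toy one-dimensional corridor. Everything here – the environment, the reward, and the hyperparameters – is invented for illustration; it is a textbook-style sketch of tabular Q-learning, not any system from Ravindran’s work.

```python
import random

N_STATES, GOAL = 5, 4          # states 0..4, reward only at state 4
ACTIONS = [-1, +1]             # move left or move right

def step(s, a):
    """Toy environment: clamp position to [0, GOAL]; reward 1 on reaching GOAL."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

random.seed(0)
for _ in range(500):                       # trial-and-error episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit, sometimes explore
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # learn the value of the action from evaluative feedback alone
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
```

Nobody tells the agent which move is correct; it only ever sees the reward signal, yet after enough episodes the learned Q-values rank “move right” above “move left” at every state – which is exactly the instructions-versus-evaluation distinction Ravindran draws with the bicycle.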

For those who want a deep dive into the subject, we recommend his course on NPTEL, spanning 60 chapters, which gets into the nitty-gritty of Semi-Markov Decision Processes, Q-learning, Thompson sampling, and lots more.

His Greatest Hits

As part of his Ph.D. work on RL, Ravindran introduced the notion of SMDP (Semi-Markov Decision Process) homomorphisms. An SMDP is a decision-making framework typically used in RL problems.

“One of the things that I had looked at was how can you break down a very complex problem into simpler problems and solve the smaller problems,” says Ravindran. “Then I started asking this question: when do I say that problem A is similar to problem B? I also introduced this notion of a symmetry in mathematical formalism, for what you mean by symmetry in a reinforcement learning problem.”

Ravindran has around 400 citations across the different papers he has written on symmetry. “It’s decent, not earth-shaking, but certainly enough of a contribution that people still know me for my homomorphism work. So, if I go to an RL meeting, people remember me as the guy who introduced MDP homomorphisms,” he says.

These days, his research work on RL at IIT Madras spans multiple areas. “We’ve been doing a lot of work on learning structure in reinforcement learning problems,” he says, citing a use-case in autonomous driving. “So, there we try to learn from trajectories taken by other users. I’m not going to learn completely from raw reinforcement. That will mean that every agent that I have should get into the car, run over a few pedestrians and then learn that running over pedestrians is bad. Or run into the wall a couple of times. That’s nonsense. Why would I do that? Instead, I’m going to have somebody telling me explicitly that when you’re close to the wall, steering into the wall is bad. That’s not a complete instruction. I’m not teaching you how to drive, but I’m telling you what are the bad things you should avoid,” he says.

Ravindran’s research work also revolves around making useful tweaks on existing deep RL algorithms. “One simple tweak that we did was to say that every time you pick an action, don’t just pick an action, also pick a duration for the action, so that (with) every time-step I would change my action. We call it FIGAR – fine grained action repetition. And it turns out, that makes learning much faster and it will also learn much better policies as well.”
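The action-repetition tweak Ravindran describes can be sketched in a few lines. This is a toy illustration of the core idea – the agent’s policy outputs an action *and* a duration, and the action is held for that many time-steps – with a made-up random policy and integer-position environment; it is not the actual FIGAR implementation.

```python
import random

def repeat_policy(state):
    """Hypothetical policy: returns (action, repetition_count) instead of just an action."""
    action = random.choice(["left", "right"])
    duration = random.choice([1, 2, 4])     # fine-grained choice of how long to commit
    return action, duration

def run_episode(env_step, state, max_steps=20):
    """Roll out an episode, holding each chosen action for its chosen duration."""
    trajectory, steps = [], 0
    while steps < max_steps:
        action, duration = repeat_policy(state)
        for _ in range(duration):           # repeat the same action
            state = env_step(state, action)
            trajectory.append((state, action))
            steps += 1
            if steps >= max_steps:
                break
    return trajectory

# Toy environment: the state is just an integer position on a line.
def env_step(s, a):
    return s + (1 if a == "right" else -1)

traj = run_episode(env_step, 0)
```

Committing to an action for several steps means fewer decisions per episode, which is one intuition for why, as Ravindran notes, the tweak can make learning faster.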

RL Use-Cases

RL has had notable success in playing games – Go being a recent example, while chess, checkers, solitaire, and backgammon count as earlier successes. Ravindran attributes it to the fact that it’s easy to get a lot of repeated experience at playing them, making it easier to optimise and tune parameters.

“It doesn’t mean RL has worked only in the game domains. I’m working with a colleague in management studies to use RL in risk modelling for lending,” the professor says. Other notable real-world examples include autonomous helicopter control, optimising power consumption in data centres, and robot soccer. In a survey of 95 machine learning problems across sectors such as marketing, finance, and IoT, Brandon Rohrer, a data scientist at Facebook, lists nine problems where RL algorithms would be the first choice.

If machine learning is difficult, RL – and, more specifically, deep RL (yes, there is such a thing) – is ridiculously hard. “It’s incredibly hard because it not only faces all of the problems of deep learning – sample efficiency, generalisation, and reproducibility, for example – but most solutions proposed in the deep learning literature for these problems also do not work here because of additional challenges (for instance, temporal correlation of data). Hence, there is a requirement of RL-specific techniques like experience replay, target networks etc.,” says Naik, Ravindran’s student.

“Despite these byzantine challenges, I strongly believe that RL is a key component towards achieving the coveted artificial general intelligence,” Naik contends. Most of what we refer to as AI today is ANI (artificial narrow intelligence), i.e. a specialist at one task. A chatbot, for example, cannot ride a bicycle. Artificial general intelligence, or AGI, which hasn’t been achieved yet, aims to create intelligent systems that can perform a wide range of cognitive functions, reason, and improve at tasks, like humans do. “Because RL is inspired by how humans learn, when coupled with various techniques, it’s in a great place for learning to perform several different tasks, from only partial observations of the environment via trial-and-error,” Naik adds.

“There are also a few solutions, mainly of robotic control, where I have seen very promising work in reinforcement learning. That’s one area where the value created today seems very disproportionate to the amount of PR that this field is generating,” Andrew Ng, Coursera co-founder and former head of Baidu AI Group and Google Brain, said in a recent talk at Intel AI DevCon 2018. “To be really technical, if any of you know what model-based reinforcement learning is, that’s where I would place my bet in terms of RL applications. Because model-based RL, where you build a simulator, lets you get a lot of data,” Ng added. (Ng is pronounced like “ing” in English.)

The Big Questions

As a machine learning practitioner who has spent a few decades in this field, Ravindran agrees that AI is now at the peak of its hype cycle, owing to a few breakthroughs this decade. The ImageNet Challenge, which saw a deep learning breakthrough in 2012, was followed by a surge in investments into AI companies. There were also successes in the language embedding space, and speech recognition – all problems that the AI community considered hard to solve.

“There’s been a gold rush mentality post that. It’s a bubble. It has to burst. I don’t think there is reality behind a lot of the hype,” Ravindran says. “Saying that AI is this panacea, that is going to solve all your problems… is a little premature. There would come an adjustment of expectations somewhere down the line.”

That said, he welcomes the attention toward AI, as there’s more of an openness among people to explore AI-based techniques now. “It’s largely because of the hype. But we can take it and build success stories out of it. You have to identify the areas in which AI has had tremendous success, has tremendous growth potential, and rightly invest there.”

How far away are we from AGI or artificial general intelligence? Ravindran’s optimistic answer: it will happen in his lifetime, or within the next 40 years. At present, machines don’t have the intelligence capability of a human two-year-old, he says. “We’re a long way off before we can get to something like human-level AI capabilities but there are certain pockets where we are already achieving superhuman performance. Those pockets in which we are getting this kind of amazing performance is going to keep growing,” he predicts.

Democratisation of AI

While tools such as Google’s TensorFlow have democratised AI, access to datasets has not yet been democratised, says Ravindran. Companies such as “Google, Microsoft and IBM can solve a lot more interesting problems than we can mainly because we don’t have access to data,” Ravindran says. Data, in fact, is now a bigger carrot for academic researchers than the salaries they stand to make at a tech giant, he says. “It’s really hard to keep people on in the universities.”

Ali Rahimi, an AI researcher at Google, in his award acceptance speech at the Conference on Neural Information Processing Systems (NIPS) 2017, offered an alternative metaphor to puncture the assertion that AI is the new electricity. Machine learning has become alchemy, Rahimi posited.

Ravindran concurs. “Perfect analogy. Completely agree with him,” he says, when asked about Rahimi’s statement.

“Whenever we say alchemy, we think of iron turning to gold. But that’s not the only part of alchemy. All kinds of dyes and (even) metallurgy came from alchemy. At the time, for the problem they were trying to solve, alchemy worked, but because it worked, it delayed the advent of true science,” says Ravindran, as the deep thinker in him surfaces.

“AI is like alchemy, helping us solve immediate problems. But it is also preventing true understanding of intelligence… it’s not true science yet,” he says.

When the history of AI in India is written, that will be a quote that will echo in the community.