REINFORCEMENT LEARNING AND OPTIMAL CONTROL

BOOK, VIDEOLECTURES, AND COURSE MATERIAL, 2019

Dimitri P. Bertsekas

REINFORCEMENT LEARNING COURSE AT ASU: SLIDES AND VIDEO LECTURES

Lecture slides for a course in Reinforcement Learning and Optimal Control (January 8-February 21, 2019), at Arizona State University:

Slides-Lecture 1, Slides-Lecture 2, Slides-Lecture 3, Slides-Lecture 4, Slides-Lecture 5, Slides-Lecture 6, Slides-Lecture 7, Slides-Lecture 8, Slides-Lecture 9, Slides-Lecture 10, Slides-Lecture 11, Slides-Lecture 12, Slides-Lecture 13.

Videos of lectures from Reinforcement Learning and Optimal Control course at Arizona State University: (Click around the screen to see just the video, or just the slides, or both simultaneously).

Video-Lecture 1, Video-Lecture 2, Video-Lecture 3,Video-Lecture 4, Video-Lecture 5, Video-Lecture 6, Video-Lecture 7, Video-Lecture 8, Video-Lecture 9, Video-Lecture 10, Video-Lecture 11, Video-Lecture 12, Video-Lecture 13.

Lecture 13 is an overview of the entire course.

REINFORCEMENT LEARNING SURVEYS: VIDEO LECTURES AND SLIDES

Video of an Overview Lecture on Distributed RL from IPAM workshop at UCLA, Feb. 2020 (Slides).

Slides for an extended overview lecture on RL: Ten Key Ideas for Reinforcement Learning and Optimal Control.

REINFORCEMENT LEARNING AND OPTIMAL CONTROL BOOK, Athena Scientific, July 2019

The book is available from the publishing company Athena Scientific, or from Amazon.com.

Click here for an extended lecture/summary of the book: Ten Key Ideas for Reinforcement Learning and Optimal Control.

The purpose of the book is to consider large and challenging multistage decision problems, which can be solved in principle by dynamic programming and optimal control, but their exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively referred to as reinforcement learning, and also by alternative names such as approximate dynamic programming, and neuro-dynamic programming.

Our subject has benefited enormously from the interplay of ideas from optimal control and from artificial intelligence. One of the aims of this monograph is to explore the common boundary between these two fields and to form a bridge that is accessible by workers with background in either field.

The mathematical style of the book is somewhat different from the author's dynamic programming books, and the neuro-dynamic programming monograph, written jointly with John Tsitsiklis. We rely more on intuitive explanations and less on proof-based insights. Still we provide a rigorous short account of the theory of finite and infinite horizon dynamic programming, and some basic approximation methods, in an appendix. For this we require a modest mathematical background: calculus, elementary probability, and a minimal use of matrix-vector algebra.

The methods of this book have been successful in practice, and often spectacularly so, as evidenced by recent amazing accomplishments in the games of chess and Go. However, across a wide range of problems, their performance properties may be less than solid. This is a reflection of the state of the art in the field: there are no methods that are guaranteed to work for all or even most problems, but there are enough methods to try on a given challenging problem with a reasonable chance that one or more of them will be successful in the end. Accordingly, we have aimed to present a broad range of methods that are based on sound principles, and to provide intuition into their properties, even when these properties do not include a solid performance guarantee. Hopefully, with enough exploration with some of these methods and their variations, the reader will be able to address adequately his/her own problem.

BOOK PREFACE, CONTENTS, SELECTED SECTIONS

Click here for preface and table of contents.

Selected sections from book chapters:

Chapter 1: Exact Dynamic Programming,

Chapter 2: Approximation in Value Space,

Errata

RELATED RESEARCH PAPERS AND REPORTS

The following papers and reports have a strong connection to material in the book, and amplify on its analysis and its range of applications.

Bertsekas, D., "Multiagent Reinforcement Learning: Rollout and Policy Iteration," ASU Report Sept. 2020; to be published in IEEE/CAA Journal of Automatica Sinica.

Bertsekas, D., "Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning," arXiv preprint, arXiv:2005.01627, April 2020.

Bertsekas, D., "Multiagent Rollout Algorithms and Reinforcement Learning," arXiv preprint arXiv:1910.00120, September 2019 (revised April 2020).

Bertsekas, D., "Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm," arXiv preprint, arXiv:2002.07407 February 2020.

Bhattacharya, S., Badyal, S., Wheeler, W., Gil, S., Bertsekas, D.,"Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems," IEEE Robotics and Automation Letters, to appear, 2020.

D. P. Bertsekas, "Multiagent Rollout Algorithms and Reinforcement Learning," arXiv preprint arXiv:1910.00120, September 2019.

D. P. Bertsekas, "Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning," Lab. for Information and Decision Systems Report, MIT, October 2018; a shorter version appears as arXiv preprint arXiv:1910.02426, Oct. 2019. Dynamic Programming and Optimal Control, Vol. II, 4th Edition: Approximate Dynamic Programming Dimitri P. Bertsekas Published June 2012 The fourth edition of Vol. II of the two-volume DP textbook was published in June 2012. This is a major revision of Vol. II and contains a substantial amount of new material, as well as a reorganization of old material. The length has increased by more than 60% from the third edition, and most of the old material has been restructured and/or revised. Volume II now numbers more than 700 pages and is larger in size than Vol. I. It can arguably be viewed as a new book! Approximate DP has become the central focal point of this volume, and occupies more than half of the book (the last two chapters, and large parts of Chapters 1-3). Thus one may also view this new edition as a followup of the author's 1996 book "Neuro-Dynamic Programming" (coauthored with John Tsitsiklis). A lot of new material, the outgrowth of research conducted in the six years since the previous edition, has been included. A new printing of the fourth edition (January 2018) contains some updated material, particularly on undiscounted problems in Chapter 4, and approximate DP in Chapter 6. References were also made to the contents of the 2017 edition of Vol. I, and to high profile developments in deep reinforcement learning, which have brought approximate DP to the forefront of attention. Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming, ISBN-13: 978-1-886529-44-1, 712 pp., hardcover, 2012 CHAPTER UPDATE - NEW MATERIAL Click here for an updated version of Chapter 4, which incorporates recent research on a variety of undiscounted problem topics, including

Deterministic optimal control and adaptive DP (Sections 4.2 and 4.3).

Stochastic shortest path problems under weak conditions and their relation to positive cost problems (Sections 4.1.4 and 4.4).

D. P. Bertsekas, "Value and Policy Iteration in Deterministic Optimal Control and Adaptive Dynamic Programming", Lab. for Information and Decision Systems Report LIDS-P-3174, MIT, May 2015 (revised Sept. 2015); IEEE Transactions on Neural Networks and Learning Systems, Vol. 28, 2017, pp. 500-509.

D. P. Bertsekas and H. Yu, "Stochastic Shortest Path Problems Under Weak Conditions", Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, January 2016.

D. P. Bertsekas, "Robust Shortest Path Planning and Semicontractive Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-P-2915, MIT, Feb. 2014 (revised Jan. 2015 and June 2016); arXiv preprint arXiv:1608.01670; Naval Research Logistics (NRL), 66(1), pp.15-37.

D. P. Bertsekas, "Affine Monotonic and Risk-Sensitive Models in Dynamic Programming", Lab. for Information and Decision Systems Report LIDS-3204, MIT, June 2016; arXiv preprint arXiv:1608.01393; IEEE Transactions on Aut. Control, Vol. 64, 2019, pp. 3117-3128.