Policy Optimization as Predictable Online Learning Problems: Imitation Learning and Beyond

Published on 2018-11-28 ● Video Link: https://www.youtube.com/watch?v=RW9nAj8tCls



Duration: 1:10:13


Efficient policy optimization is fundamental to solving real-world reinforcement learning problems, where agent-environment interactions can be costly. In this talk, I will discuss my recent research on improving the efficiency of policy optimization from the perspective of online learning. The use of online learning to analyze policy optimization was pioneered by Ross et al., who proposed reducing imitation learning to adversarial online learning problems. However, as I will discuss, this reduction loses information: the policy optimization problem is not truly adversarial but is instead predictable from past information. Based on this observation, I will present conditions for the last-iterate convergence of value aggregation for imitation learning. Furthermore, I will show how this predictable information can be leveraged to design better algorithms that speed up imitation learning and reinforcement learning.
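
To make the online-learning view concrete, here is a minimal sketch of a data-aggregation loop in the spirit of the reduction by Ross et al. It is not the algorithm from the talk. The 1-D dynamics, the tanh expert, the linear learner, and all names (expert_action, rollout, run_aggregation) are assumptions chosen for illustration. In each round the current policy is rolled out, the visited states are labeled with expert actions, and the policy is refit on all data gathered so far (follow-the-leader). Because each round's loss is defined on states produced by the current policy, it depends on past iterates rather than being chosen adversarially.

```python
import math

def expert_action(x):
    # Hypothetical nonlinear expert; the linear learner cannot match it exactly.
    return -0.5 * math.tanh(x)

def rollout(theta, x0=2.0, horizon=20):
    # Toy dynamics (an assumption): x_{t+1} = 0.9 * x_t + u_t, learner u_t = -theta * x_t.
    states, x = [], x0
    for _ in range(horizon):
        states.append(x)
        x = 0.9 * x - theta * x
    return states

def run_aggregation(rounds=50):
    # Running sums over ALL rounds: this is the "aggregation" step.
    sum_xx, sum_xu = 0.0, 0.0
    theta, thetas = 0.0, []
    for _ in range(rounds):
        # The round's loss is defined on states visited by the current policy.
        for x in rollout(theta):
            sum_xx += x * x
            sum_xu += x * expert_action(x)
        # Follow-the-leader: least-squares fit of u = -theta * x on aggregated data.
        theta = -sum_xu / sum_xx
        thetas.append(theta)
    return thetas

thetas = run_aggregation()
print("last two iterates:", thetas[-2], thetas[-1])
```

In this toy setting the printed iterates can be compared to check whether the last iterate settles, which is the kind of behavior the convergence conditions mentioned above concern.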

View slides and more at https://www.microsoft.com/en-us/research/video/policy-optimization-as-predictable-online-learning-problems-imitation-learning-and-beyond/






