Stochastic Approximation and Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms

Subscribers: 344,000
Published on: 2018-11-20
Video Link: https://www.youtube.com/watch?v=KDJXKfpfaJA
Duration: 1:04:54
Views: 5,630
Likes: 71


Stochastic approximation algorithms are used to approximate solutions to fixed-point equations that involve expectations of functions with respect to possibly unknown distributions. Reinforcement learning algorithms such as TD-learning and Q-learning are two of the most famous applications of stochastic approximation in machine learning.
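
As a concrete illustration (not taken from the talk itself), here is a minimal Python sketch of the classical Robbins-Monro recursion underlying stochastic approximation. The function F, the sampling distribution, and the step-size schedule below are illustrative assumptions.

```python
import numpy as np

# Minimal Robbins-Monro sketch (illustrative): approximate the solution
# theta* of the fixed-point equation
#     theta* = E[F(theta*, X)]
# from noisy samples X_n, using diminishing step sizes a_n = 1/n.
# Here F(theta, x) = x, so the fixed point is simply E[X].

rng = np.random.default_rng(0)

def F(theta, x):
    return x  # hypothetical choice: the fixed point is E[X]

theta = 0.0
for n in range(1, 10_001):
    x = rng.normal(loc=2.0, scale=1.0)    # X ~ N(2, 1), so theta* = 2
    a_n = 1.0 / n                         # classical step-size schedule
    theta += a_n * (F(theta, x) - theta)  # move toward the fixed point

print(theta)  # converges to roughly 2.0
```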

This talk provides an overview of stochastic approximation, with a focus on optimizing the rate of convergence. Based on this general theory, the well-known slow convergence of Q-learning is explained: the asymptotic variance of the algorithm is typically infinite. Three new Q-learning algorithms are introduced that dramatically improve performance: (i) the Zap Q-learning algorithm, which has provably optimal asymptotic variance and resembles the Newton-Raphson method in a deterministic setting; (ii) the PolSA algorithm, which is based on Polyak's momentum technique but with a specialized matrix momentum; and (iii) the NeSA algorithm, based on Nesterov's acceleration technique.
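
To make the Newton-Raphson analogy concrete, here is a toy scalar sketch of the gain structure behind Zap: the gain tracks a running estimate of the mean Jacobian on a faster time scale, in place of a plain scalar step size. This is not the full Zap Q-learning algorithm (which operates on Q-functions of a Markov decision process); the functions f and jac, the distribution, and the step-size exponents are illustrative assumptions.

```python
import numpy as np

# Toy sketch of the Zap idea (a stochastic Newton-Raphson method):
# solve f_bar(theta) = E[f(theta, X)] = 0, where the matrix (here
# scalar) gain A tracks the Jacobian d f_bar / d theta.

rng = np.random.default_rng(1)

def f(theta, x):
    return theta**3 - x       # f_bar(theta) = theta^3 - E[X]

def jac(theta):
    return 3.0 * theta**2     # pointwise estimate of d f / d theta

theta, A = 1.0, 1.0           # parameter and Jacobian estimate
for n in range(1, 10_001):
    x = rng.normal(loc=8.0, scale=1.0)  # E[X] = 8, so theta* = 2
    b_n = 1.0 / n**0.85                 # faster time scale for A
    a_n = 1.0 / n                       # slower time scale for theta
    A += b_n * (jac(theta) - A)         # track the mean Jacobian
    A = max(A, 1e-3)                    # guard against a vanishing gain
    theta -= a_n * f(theta, x) / A      # Newton-Raphson-like step

print(theta)  # converges to roughly 2.0
```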

The analysis of (ii) and (iii) requires entirely new analytic techniques. One approach is via coupling: conditions are established under which the parameter estimates obtained using the PolSA algorithm couple with those obtained using the Newton-Raphson-based algorithm. Numerical examples confirm this coupling behavior and the remarkable performance of these algorithms.
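
For readers unfamiliar with the momentum structure that PolSA builds on, here is a minimal sketch of Polyak's heavy-ball iteration with a noisy gradient. PolSA replaces the scalar momentum coefficient beta below by a specialized matrix; that refinement is not shown, and the quadratic objective and constants are illustrative assumptions.

```python
import numpy as np

# Polyak's heavy-ball momentum on a noisy quadratic (illustrative):
# theta_{n+1} = theta_n - alpha * grad_n + beta * (theta_n - theta_{n-1})

rng = np.random.default_rng(2)

theta, theta_prev = 5.0, 5.0
alpha, beta = 0.01, 0.9      # step size and momentum coefficient
for n in range(5_000):
    noise = rng.normal(scale=0.1)
    grad = 2.0 * (theta - 3.0) + noise  # noisy gradient of (theta - 3)**2
    theta_next = theta - alpha * grad + beta * (theta - theta_prev)
    theta_prev, theta = theta, theta_next

print(theta)  # hovers near the minimizer, roughly 3.0
```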

See more at https://www.microsoft.com/en-us/research/video/stochastic-approximation-and-reinforcement-learning-hidden-theory-and-new-super-fast-algorithms/




Other Videos By Microsoft Research


2018-12-03  Deep Generative Models for Imitation Learning and Fairness
2018-11-29  Machine Teaching Overview
2018-11-28  Policy Optimization as Predictable Online Learning Problems: Imitation Learning and Beyond
2018-11-28  Algorithmic Social Intervention
2018-11-26  TLA+ Specifications of the Consistency Guarantees Provided by Cosmos DB
2018-11-21  The 20th Northwest Probability Seminar: Cutoff for Product Replacement on Finite Groups
2018-11-21  The 20th Northwest Probability Seminar: The KPZ Fixed Point
2018-11-20  Stochastic Explosions in Branching Processes and Non-uniqueness for Nonlinear PDE
2018-11-20  The 20th Northwest Probability Seminar: First Order Logic on Galton-Watson Trees
2018-11-20  Causal Effects and Overlap in High-dimensional or Sequential Data
2018-11-20  Stochastic Approximation and Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
2018-11-20  Towards a Conscious AI: A Computer Architecture inspired by Neuroscience
2018-11-19  Fireside Chat with Manuel Blum
2018-11-19  Reinforcement Learning: Bringing Together Computation, Behavior and Neural Coding
2018-11-19  Harnessing Communities of Knowledge: Building an Automated Species Identification Tool
2018-11-14  Hearing in 3D with Dr. Ivan Tashev
2018-11-08  Celebrating 20 years of MSR in Asia with Dr. Hsiao-Wuen Hon
2018-11-01  Soundscape LinkedIn Accessibility Demo [Audio Description]
2018-11-01  Soundscape LinkedIn Accessibility Demo
2018-10-31  MSRNE 10th Anniversary Symposium - Panel: Frontiers of Machine Learning
2018-10-31  MSRNE 10th Anniversary Symposium - Panel: Fairness, Accountability, Transparency, and Ethics



Tags:
microsoft research