Stochastic Approximation and Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms

Subscribers: 344,000
Published on: 2018-11-20
Video Link: https://www.youtube.com/watch?v=KDJXKfpfaJA
Duration: 1:04:54
Views: 5,630
Likes: 71


Stochastic approximation algorithms are used to approximate solutions to fixed-point equations that involve expectations of functions with respect to possibly unknown distributions. Reinforcement learning algorithms such as TD-learning and Q-learning are two of the most famous applications of stochastic approximation in machine learning.
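
As a concrete illustration (not taken from the talk itself), here is a minimal Python sketch of the classical Robbins-Monro recursion underlying stochastic approximation. The function F, the sampling distribution, and the step-size schedule below are illustrative assumptions.

```python
import numpy as np

# Minimal Robbins-Monro sketch (illustrative): approximate the solution
# theta* of the fixed-point equation
#     theta* = E[F(theta*, X)]
# from noisy samples X_n, using diminishing step sizes a_n = 1/n.
# Here F(theta, x) = x, so the fixed point is simply E[X].

rng = np.random.default_rng(0)

def F(theta, x):
    return x  # hypothetical choice: the fixed point is E[X]

theta = 0.0
for n in range(1, 10_001):
    x = rng.normal(loc=2.0, scale=1.0)    # X ~ N(2, 1), so theta* = 2
    a_n = 1.0 / n                         # classical step-size schedule
    theta += a_n * (F(theta, x) - theta)  # move toward the fixed point

print(theta)  # converges to roughly 2.0
```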

This talk provides an overview of stochastic approximation, with a focus on optimizing the rate of convergence. Based on this general theory, the well-known slow convergence of Q-learning is explained: the asymptotic variance of the algorithm is typically infinite. Three new Q-learning algorithms are introduced that dramatically improve performance: (i) the Zap Q-learning algorithm, which has provably optimal asymptotic variance and resembles the Newton-Raphson method in a deterministic setting; (ii) the PolSA algorithm, which is based on Polyak's momentum technique but with a specialized matrix momentum; and (iii) the NeSA algorithm, based on Nesterov's acceleration technique.
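
To make the Newton-Raphson analogy concrete, here is a toy scalar sketch of the gain structure behind Zap: the gain tracks a running estimate of the mean Jacobian on a faster time scale, in place of a plain scalar step size. This is not the full Zap Q-learning algorithm (which operates on Q-functions of a Markov decision process); the functions f and jac, the distribution, and the step-size exponents are illustrative assumptions.

```python
import numpy as np

# Toy sketch of the Zap idea (a stochastic Newton-Raphson method):
# solve f_bar(theta) = E[f(theta, X)] = 0, where the matrix (here
# scalar) gain A tracks the Jacobian d f_bar / d theta.

rng = np.random.default_rng(1)

def f(theta, x):
    return theta**3 - x       # f_bar(theta) = theta^3 - E[X]

def jac(theta):
    return 3.0 * theta**2     # pointwise estimate of d f / d theta

theta, A = 1.0, 1.0           # parameter and Jacobian estimate
for n in range(1, 10_001):
    x = rng.normal(loc=8.0, scale=1.0)  # E[X] = 8, so theta* = 2
    b_n = 1.0 / n**0.85                 # faster time scale for A
    a_n = 1.0 / n                       # slower time scale for theta
    A += b_n * (jac(theta) - A)         # track the mean Jacobian
    A = max(A, 1e-3)                    # guard against a vanishing gain
    theta -= a_n * f(theta, x) / A      # Newton-Raphson-like step

print(theta)  # converges to roughly 2.0
```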

The analysis of (ii) and (iii) requires entirely new analytic techniques. One approach is via coupling: conditions are established under which the parameter estimates obtained using the PolSA algorithm couple with those obtained using the Newton-Raphson-based algorithm. Numerical examples confirm this coupling behavior and the remarkable performance of these algorithms.
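
For readers unfamiliar with the momentum structure that PolSA builds on, here is a minimal sketch of Polyak's heavy-ball iteration with a noisy gradient. PolSA replaces the scalar momentum coefficient beta below by a specialized matrix; that refinement is not shown, and the quadratic objective and constants are illustrative assumptions.

```python
import numpy as np

# Polyak's heavy-ball momentum on a noisy quadratic (illustrative):
# theta_{n+1} = theta_n - alpha * grad_n + beta * (theta_n - theta_{n-1})

rng = np.random.default_rng(2)

theta, theta_prev = 5.0, 5.0
alpha, beta = 0.01, 0.9      # step size and momentum coefficient
for n in range(5_000):
    noise = rng.normal(scale=0.1)
    grad = 2.0 * (theta - 3.0) + noise  # noisy gradient of (theta - 3)**2
    theta_next = theta - alpha * grad + beta * (theta - theta_prev)
    theta_prev, theta = theta, theta_next

print(theta)  # hovers near the minimizer, roughly 3.0
```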

See more at https://www.microsoft.com/en-us/research/video/stochastic-approximation-and-reinforcement-learning-hidden-theory-and-new-super-fast-algorithms/




Other Videos By Microsoft Research


2018-12-03  Deep Generative Models for Imitation Learning and Fairness
2018-11-29  Machine Teaching Overview
2018-11-28  Policy Optimization as Predictable Online Learning Problems: Imitation Learning and Beyond
2018-11-28  Algorithmic Social Intervention
2018-11-26  TLA+ Specifications of the Consistency Guarantees Provided by Cosmos DB
2018-11-21  The 20th Northwest Probability Seminar: Cutoff for Product Replacement on Finite Groups
2018-11-21  The 20th Northwest Probability Seminar: The KPZ Fixed Point
2018-11-20  Stochastic Explosions in Branching Processes and Non-uniqueness for Nonlinear PDE
2018-11-20  The 20th Northwest Probability Seminar: First Order Logic on Galton-Watson Trees
2018-11-20  Causal Effects and Overlap in High-dimensional or Sequential Data
2018-11-20  Stochastic Approximation and Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
2018-11-20  Towards a Conscious AI: A Computer Architecture inspired by Neuroscience
2018-11-19  Fireside Chat with Manuel Blum
2018-11-19  Reinforcement Learning: Bringing Together Computation, Behavior and Neural Coding
2018-11-19  Harnessing Communities of Knowledge: Building an Automated Species Identification Tool
2018-11-14  Hearing in 3D with Dr. Ivan Tashev
2018-11-08  Celebrating 20 years of MSR in Asia with Dr. Hsiao-Wuen Hon
2018-11-01  Soundscape LinkedIn Accessibility Demo [Audio Description]
2018-11-01  Soundscape LinkedIn Accessibility Demo
2018-10-31  MSRNE 10th Anniversary Symposium - Panel: Frontiers of Machine Learning
2018-10-31  MSRNE 10th Anniversary Symposium - Panel: Fairness, Accountability, Transparency, and Ethics



Tags:
microsoft research