A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...

Channel: Simons Institute for the Theory of Computing

Published on October 11, 2022 10:06:02 AM ● Video Link: https://www.youtube.com/watch?v=OV1_TIz-l6Y

Category: Tutorial

Duration: 1:00:50

Siva Theja Maguluri (Georgia Institute of Technology)
https://simons.berkeley.edu/node/22741
Structure of Constraints in Sequential Decision-Making

Reinforcement learning (RL) is a learning paradigm for large-scale sequential decision-making problems in complex stochastic systems. Many modern RL algorithms solve the underlying Bellman fixed-point equation using Stochastic Approximation (SA). This two-part tutorial presents an overview of our results on SA and illustrates how they can be used to obtain sample complexity results for a large class of RL algorithms.

Part I of the tutorial focuses on SA, a popular approach for solving fixed-point equations when the information is corrupted by noise. We consider a class of SA algorithms for operators that are contractive under arbitrary norms (especially the l-infinity norm). We present finite-sample bounds on the mean square error, which are established using a Lyapunov framework based on infimal convolution and the generalized Moreau envelope. We then present our recent result on exponential concentration of the tail error, even when the iterates are not bounded by a constant. These tail bounds are obtained using exponential supermartingales in conjunction with the Moreau envelope and bootstrapping.
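As a rough illustration of the setting in Part I, the sketch below runs a stochastic approximation iteration x_{k+1} = x_k + alpha_k (H(x_k) + w_k - x_k) for an operator H that is a contraction in the l-infinity norm. It is not taken from the talk: the affine operator, the matrix P, the vector r, the Gaussian noise model, and the step size 4/(k+4) are all assumptions chosen only to keep the example small and self-contained.

```python
import random


def sup_norm(x):
    """l-infinity norm of a vector given as a list of floats."""
    return max(abs(v) for v in x)


def affine_operator(x, P, r, gamma):
    """H(x) = r + gamma * P x.

    When every row of P is a probability distribution, H is a
    gamma-contraction with respect to the l-infinity norm.
    """
    n = len(x)
    return [r[i] + gamma * sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def fixed_point(P, r, gamma, num_iters=200):
    """Noise-free iteration x <- H(x), used here only as a reference solution."""
    x = [0.0] * len(r)
    for _ in range(num_iters):
        x = affine_operator(x, P, r, gamma)
    return x


def stochastic_approximation(P, r, gamma, num_iters, noise_std=0.1, seed=0):
    """Run x_{k+1} = x_k + alpha_k * (H(x_k) + w_k - x_k) with zero-mean Gaussian w_k."""
    rng = random.Random(seed)
    x = [0.0] * len(r)
    for k in range(num_iters):
        alpha = 4.0 / (k + 4)  # diminishing step size; an illustrative assumption
        noisy_h = [h + rng.gauss(0.0, noise_std)
                   for h in affine_operator(x, P, r, gamma)]
        x = [xi + alpha * (hi - xi) for xi, hi in zip(x, noisy_h)]
    return x


# Hypothetical 3-dimensional problem data (not from the talk).
P = [[0.5, 0.5, 0.0], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]
r = [1.0, 0.0, 2.0]
gamma = 0.5

if __name__ == "__main__":
    x_star = fixed_point(P, r, gamma)
    x_hat = stochastic_approximation(P, r, gamma, num_iters=20000)
    err = sup_norm([a - b for a, b in zip(x_hat, x_star)])
    print("sup-norm error after 20000 iterations:", err)
```

The noisy iterate is compared against a noise-free fixed-point iteration, so the printed value is the l-infinity error that finite-sample bounds of the kind described above aim to control.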

Part II of the tutorial focuses on RL. We briefly illustrate the connection between RL algorithms and SA of contractive operators, and highlight the importance of the infinity norm. We then use the results from Part I to present finite-sample bounds for various RL algorithms, including on-policy and off-policy algorithms, in both tabular and linear function approximation settings.
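One way to see the connection between RL and SA is tabular Q-learning, whose update is a noisy version of the Bellman optimality operator, a contraction in the infinity norm with modulus equal to the discount factor. The sketch below is a minimal illustration and not the algorithm analyzed in the talk: it assumes a synchronous variant with a generative model (every state-action pair is sampled at each step), and the two-state MDP, the discount factor 0.5, and the step size 4/(k+4) are hypothetical choices.

```python
import random


def sup_norm_diff(Q1, Q2):
    """Infinity-norm distance between two Q-tables (lists of rows)."""
    return max(abs(a - b) for row1, row2 in zip(Q1, Q2) for a, b in zip(row1, row2))


def bellman_optimality(Q, P, R, gamma):
    """(T Q)(s, a) = R[s][a] + gamma * sum_t P[s][a][t] * max_b Q[t][b]."""
    num_states, num_actions = len(R), len(R[0])
    V = [max(Q[s]) for s in range(num_states)]
    return [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(num_states))
             for a in range(num_actions)]
            for s in range(num_states)]


def value_iteration(P, R, gamma, num_iters=200):
    """Noise-free iteration Q <- T Q, used only as a reference for Q*."""
    Q = [[0.0] * len(R[0]) for _ in R]
    for _ in range(num_iters):
        Q = bellman_optimality(Q, P, R, gamma)
    return Q


def synchronous_q_learning(P, R, gamma, num_iters, seed=0):
    """SA form of Q-learning: Q <- Q + alpha * (noisy T Q - Q), one sample per (s, a)."""
    rng = random.Random(seed)
    num_states, num_actions = len(R), len(R[0])
    states = list(range(num_states))
    Q = [[0.0] * num_actions for _ in range(num_states)]
    for k in range(num_iters):
        alpha = 4.0 / (k + 4)  # illustrative diminishing step size
        V = [max(row) for row in Q]  # greedy values of the current iterate
        new_Q = [row[:] for row in Q]
        for s in range(num_states):
            for a in range(num_actions):
                # Sample a next state from the (assumed known) generative model.
                s_next = rng.choices(states, weights=P[s][a])[0]
                target = R[s][a] + gamma * V[s_next]
                new_Q[s][a] = Q[s][a] + alpha * (target - Q[s][a])
        Q = new_Q
    return Q


# Hypothetical MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0], [0.0, 2.0]]
gamma = 0.5

if __name__ == "__main__":
    Q_star = value_iteration(P, R, gamma)
    Q_hat = synchronous_q_learning(P, R, gamma, num_iters=20000)
    print("infinity-norm error:", sup_norm_diff(Q_hat, Q_star))
```

The sampled target equals the Bellman operator applied to the current iterate plus zero-mean noise, which is what places Q-learning in the contractive SA framework of Part I.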

Tags: Simons Institute, theoretical computer science, UC Berkeley, Computer Science, Theory of Computation, Theory of Computing, Structure of Constraints in Sequential Decision-Making, Siva Theja Maguluri
