A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...

Published on ● Video Link: https://www.youtube.com/watch?v=OV1_TIz-l6Y



Category:
Tutorial
Duration: 1:00:50
589 views
14


Siva Theja Maguluri (Georgia Institute of Technology)
https://simons.berkeley.edu/node/22741
Structure of Constraints in Sequential Decision-Making

Reinforcement learning (RL) is a learning paradigm for large-scale sequential decision making problems in complex stochastic systems. Many modern RL algorithms solve the underlying Bellman fixed point equation using Stochastic Approximation (SA). This two-part tutorial presents an overview of our results on SA, and illustrate how they can be used to obtain sample complexity results of a large class of RL algorithms.

Part I of the tutorial focuses on SA, a popular approach for solving fixed point equations when the information is corrupted by noise. We consider a type of SA algorithms for operators that are contractive under arbitrary norms (especially the l-infinity norm). We present finite sample bounds on the mean square error, which are established using a Lyapunov framework based on infimal convolution and generalized Moreau envelope. We then present our recent result on exponential concentration of the tail error, even when the iterates are not bounded by a constant. These tail bounds are obtained using exponential supermartingales in conjunction with the Moreau envelop and bootstrapping.

Part II of the tutorial focuses on RL. We briefly illustrate the connection between RL algorithms and SA of contractive operators, and highlight the importance of the infinity norm. We then exploit the results from Part I, to present finite sample bounds of various RL algorithms including on policy and off policy algorithms, both in tabular and linear function approximation settings.




Other Videos By Simons Institute for the Theory of Computing


2022-10-14The Power of Adaptivity in Representation Learning: From Meta-Learning to Federated Learning
2022-10-14When Matching Meets Batching: Optimal Multi-stage Algorithms and Applications
2022-10-13Optimal Learning for Structured Bandits
2022-10-13Dynamic Spatial Matching
2022-10-13New Results on Primal-Dual Algorithms for Online Allocation Problems With Applications to ...
2022-10-12Learning Across Bandits in High Dimension via Robust Statistics
2022-10-12Are Multicriteria MDPs Harder to Solve Than Single-Criteria MDPs?
2022-10-12A Game-Theoretic Approach to Offline Reinforcement Learning
2022-10-11The Statistical Complexity of Interactive Decision Making
2022-10-11A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11Stochastic Bin Packing with Time-Varying Item Sizes
2022-10-10Constant Regret in Exchangeable Action Models: Overbooking, Bin Packing, and Beyond
2022-10-08On The Exploration In Load-Balancing Under Unknown Service Rates
2022-10-08Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And ...
2022-10-08The Compensated Coupling (or Why the Future is the Best Guide for the Present)
2022-10-08Higher-Dimensional Expansion of Random Geometric Complexes
2022-10-08On the Power of Preconditioning in Sparse Linear Regression
2022-10-07What Functions Do Transformers Prefer to Represent?
2022-10-01Optimality of Variational Inference for Stochastic Block Model
2022-10-01Machine Learning on Large-Scale Graphs



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Structure of Constraints in Sequential Decision-Making
Siva Theja Maguluri