Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And ...

Published on ● Video Link: https://www.youtube.com/watch?v=9nrG_8ujO7Q



Duration: 30:00
363 views
8


Siva Theja Maguluri (Georgia Institute of Technology)
https://simons.berkeley.edu/talks/sample-complexity-policy-based-methods-under-policy-sampling-and-linear-function-approximation
Joint IFML/Data-Driven Decision Processes Workshop

In this work, we study policy-space methods for solving the reinforcement learning (RL) problem. We first consider the setting where the model is known (MDP setting) and show that the Natural Policy Gradient (NPG) algorithm (and other variants) have linear (geometric) convergence without the need of any regularization. In contrast to optimization approaches used in the literature, our approach is based on approximate dynamic programing. We then consider a linear function approximation variant of it and establish its convergence guarantees. Finally, we consider the RL setting where a critic is deployed for policy evaluation when the model is unknown. We consider a critic that is based on TD learning, and uses off-policy sampling with linear function approximation. This leads to the infamous deadly-triad issue. We propose a generic algorithm framework of a single time-scale multi-step TD-learning with generalized importance sampling ratios that enables us to overcome the high variance issue in off-policy learning, and establish its finite-sample guarantees. We show that this leads to an overall sample complexity of O(epsilon^-2).




Other Videos By Simons Institute for the Theory of Computing


2022-10-13New Results on Primal-Dual Algorithms for Online Allocation Problems With Applications to ...
2022-10-12Learning Across Bandits in High Dimension via Robust Statistics
2022-10-12Are Multicriteria MDPs Harder to Solve Than Single-Criteria MDPs?
2022-10-12A Game-Theoretic Approach to Offline Reinforcement Learning
2022-10-11The Statistical Complexity of Interactive Decision Making
2022-10-11A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11Stochastic Bin Packing with Time-Varying Item Sizes
2022-10-10Constant Regret in Exchangeable Action Models: Overbooking, Bin Packing, and Beyond
2022-10-08On The Exploration In Load-Balancing Under Unknown Service Rates
2022-10-08Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And ...
2022-10-08The Compensated Coupling (or Why the Future is the Best Guide for the Present)
2022-10-08Higher-Dimensional Expansion of Random Geometric Complexes
2022-10-08On the Power of Preconditioning in Sparse Linear Regression
2022-10-07What Functions Do Transformers Prefer to Represent?
2022-10-01Optimality of Variational Inference for Stochastic Block Model
2022-10-01Machine Learning on Large-Scale Graphs
2022-10-01Survey on Sparse Graph Limits + A Toy Example
2022-10-01Long Range Dependence in Evolving Networks
2022-09-30Stochastic Processes on Sparse Graphs: Hydrodynamic Limits and Markov Approximations
2022-09-30Large Deviation Principle for the Norm of the Adjacency Matrix and the Laplacian Matrix of...



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Joint IFML/Data-Driven Decision Processes Workshop
Siva Theja Maguluri