A Game-Theoretic Approach to Offline Reinforcement Learning

Channel:

Simons Institute for the Theory of Computing

Subscribers:

68,700

Published on October 12, 2022 7:02:16 AM ● Video Link: https://www.youtube.com/watch?v=UhqDcHrWn3E

Duration: 57:26

794 views

Ching-An Cheng (Microsoft Research)
https://simons.berkeley.edu/talks/game-theoretic-approach-offline-reinforcement-learning
Structure of Constraints in Sequential Decision-Making

Offline reinforcement learning (RL) is a paradigm for designing agents that can learn from existing datasets. Because offline RL can learn policies without collecting new data or expensive expert demonstrations, it offers great potentials for solving real-world problems. However, offline RL faces a fundamental challenge: oftentimes data in real world can only be collected by policies meeting certain criteria (e.g., on performance, safety, or ethics). As a result, existing data, though being large, could lack diversity and have limited usefulness. In this talk, I will introduce a generic game-theoretic approach to offline RL. It frames offline RL as a two-player game where a learning agent competes with an adversary that simulates the uncertain decision outcomes due to missing data coverage. By this game analogy, I will present a systematic and provably correct framework to design offline RL algorithms that can learn good policies with state-of-the-art empirical performance. In addition, I will show that this framework reveals a natural connection between offline RL and imitation learning, which ensures the learned policies to be always no worse than the data collection policies regardless of hyperparameter choices.

Other Videos By Simons Institute for the Theory of Computing

2022-10-19	NLTS Hamiltonians from Codes \| Quantum Colloquium
2022-10-15	Learning to Control Safety-Critical Systems
2022-10-14	Near-Optimal No-Regret Learning for General Convex Games
2022-10-14	The Power of Adaptivity in Representation Learning: From Meta-Learning to Federated Learning
2022-10-14	When Matching Meets Batching: Optimal Multi-stage Algorithms and Applications
2022-10-13	Optimal Learning for Structured Bandits
2022-10-13	Dynamic Spatial Matching
2022-10-13	New Results on Primal-Dual Algorithms for Online Allocation Problems With Applications to ...
2022-10-12	Learning Across Bandits in High Dimension via Robust Statistics
2022-10-12	Are Multicriteria MDPs Harder to Solve Than Single-Criteria MDPs?
2022-10-12	A Game-Theoretic Approach to Offline Reinforcement Learning
2022-10-11	The Statistical Complexity of Interactive Decision Making
2022-10-11	A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11	A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11	Stochastic Bin Packing with Time-Varying Item Sizes
2022-10-10	Constant Regret in Exchangeable Action Models: Overbooking, Bin Packing, and Beyond
2022-10-08	On The Exploration In Load-Balancing Under Unknown Service Rates
2022-10-08	Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And ...
2022-10-08	The Compensated Coupling (or Why the Future is the Best Guide for the Present)
2022-10-08	Higher-Dimensional Expansion of Random Geometric Complexes
2022-10-08	On the Power of Preconditioning in Sparse Linear Regression

Tags:

Simons Institute

theoretical computer science

UC Berkeley

Computer Science

Theory of Computation

Theory of Computing

Structure of Constraints in Sequential Decision-Making

Ching-An Cheng

Channel	Latest
RoninRevil	6 hours ago
Wos	6 hours ago
MOMOKO YODA	6 hours ago
iGuti89	6 hours ago
Pierro_fps	6 hours ago
Dragomazing	7 hours ago
Sport Piceno Game	7 hours ago
Thích Violin	7 hours ago
Der Mikeintosh	7 hours ago
UltimateNyde	7 hours ago
Nexific	7 hours ago
KevGaming87	7 hours ago
Liban Ali	7 hours ago
Reborn Project	7 hours ago
Mokka Commentry	7 hours ago
CARBON	7 hours ago
SkyWhait	7 hours ago
Lostgamerrus	8 hours ago
Crouch Gaming	8 hours ago
RayThaGawd	8 hours ago
Schannel	8 hours ago
la cueva de lobo	8 hours ago
Geezax	8 hours ago
Nubo BIT	8 hours ago
Inter	8 hours ago