Learning Across Bandits in High Dimension via Robust Statistics

Video Link: https://www.youtube.com/watch?v=aHKZUuEOrl4



Duration: 47:31


Hamsa Bastani (University of Pennsylvania)
https://simons.berkeley.edu/talks/learning-across-bandits-high-dimension-robust-statistics
Structure of Constraints in Sequential Decision-Making

Decision-makers often face the "many bandits" problem, where one must simultaneously learn across related but heterogeneous contextual bandit instances. For instance, a large retailer may wish to dynamically learn product demand across many stores to solve pricing or inventory problems, making it desirable to learn jointly for stores serving similar customers; alternatively, a hospital network may wish to dynamically learn patient risk across many providers to allocate personalized interventions, making it desirable to learn jointly for hospitals serving similar patient populations. Motivated by real datasets, we decompose the unknown parameter in each bandit instance into a global parameter plus a sparse instance-specific term. Then, we propose a novel two-stage estimator that exploits this structure in a sample-efficient way by using a combination of robust statistics (to learn across similar instances) and LASSO regression (to debias the results). We embed this estimator within a bandit algorithm, and prove that it improves asymptotic regret bounds in the context dimension; this improvement is exponential for data-poor instances. We further demonstrate how our results depend on the underlying network structure of bandit instances. Finally, we illustrate the value of our approach on synthetic and real datasets. Joint work with Kan Xu. Paper: https://arxiv.org/abs/2112.14233
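
As a concrete illustration of the two-stage idea, the sketch below assumes a linear reward model in which instance k's parameter is beta_global + delta_k with delta_k sparse. It uses a coordinate-wise median of per-instance least-squares fits as a simple robust aggregator for the global parameter, then a LASSO fit on each instance's residuals to recover the sparse correction. The function name, the choice of aggregator, and the regularization level are illustrative assumptions rather than the exact estimator from the paper.

    import numpy as np
    from sklearn.linear_model import Lasso

    def two_stage_estimate(X_list, y_list, lasso_alpha=0.1):
        """Sketch of a two-stage estimator: robust pooling, then sparse debiasing."""
        # Stage 1: fit each instance separately by least squares, then pool the
        # estimates with a coordinate-wise median (robust to outlier instances).
        per_instance = np.array([
            np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)
        ])
        beta_global = np.median(per_instance, axis=0)

        # Stage 2: for each instance, fit a LASSO to the residuals to recover the
        # sparse instance-specific deviation from the shared parameter.
        deltas = []
        for X, y in zip(X_list, y_list):
            lasso = Lasso(alpha=lasso_alpha, fit_intercept=False)
            lasso.fit(X, y - X @ beta_global)
            deltas.append(lasso.coef_)

        # Instance k acts using the combined estimate beta_global + deltas[k].
        return beta_global, deltas

    # Toy usage: 5 instances sharing a 20-dimensional parameter, each with a
    # 2-sparse instance-specific deviation.
    rng = np.random.default_rng(0)
    d, K, n = 20, 5, 100
    beta = rng.normal(size=d)
    X_list, y_list = [], []
    for _ in range(K):
        delta = np.zeros(d)
        delta[rng.choice(d, size=2, replace=False)] = rng.normal(size=2)
        X = rng.normal(size=(n, d))
        X_list.append(X)
        y_list.append(X @ (beta + delta) + 0.1 * rng.normal(size=n))
    beta_hat, delta_hats = two_stage_estimate(X_list, y_list)

In the bandit loop, each instance would choose actions using beta_global + deltas[k] (with whatever exploration scheme the algorithm prescribes) and refit as new observations arrive; data-poor instances benefit because the global component is estimated from all instances' pooled data.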




Other Videos By Simons Institute for the Theory of Computing


2022-10-21 Pandora’s Box: Learning to Leverage Costly Information
2022-10-20 Thresholds
2022-10-19 NLTS Hamiltonians from Codes | Quantum Colloquium
2022-10-15 Learning to Control Safety-Critical Systems
2022-10-14 Near-Optimal No-Regret Learning for General Convex Games
2022-10-14 The Power of Adaptivity in Representation Learning: From Meta-Learning to Federated Learning
2022-10-14 When Matching Meets Batching: Optimal Multi-stage Algorithms and Applications
2022-10-13 Optimal Learning for Structured Bandits
2022-10-13 Dynamic Spatial Matching
2022-10-13 New Results on Primal-Dual Algorithms for Online Allocation Problems With Applications to ...
2022-10-12 Learning Across Bandits in High Dimension via Robust Statistics
2022-10-12 Are Multicriteria MDPs Harder to Solve Than Single-Criteria MDPs?
2022-10-12 A Game-Theoretic Approach to Offline Reinforcement Learning
2022-10-11 The Statistical Complexity of Interactive Decision Making
2022-10-11 A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11 A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11 Stochastic Bin Packing with Time-Varying Item Sizes
2022-10-10 Constant Regret in Exchangeable Action Models: Overbooking, Bin Packing, and Beyond
2022-10-08 On The Exploration In Load-Balancing Under Unknown Service Rates
2022-10-08 Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And ...
2022-10-08 The Compensated Coupling (or Why the Future is the Best Guide for the Present)



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Structure of Constraints in Sequential Decision-Making
Hamsa Bastani