Learning Across Bandits in High Dimension via Robust Statistics

Video Link: https://www.youtube.com/watch?v=aHKZUuEOrl4



Duration: 47:31


Hamsa Bastani (University of Pennsylvania)
https://simons.berkeley.edu/talks/learning-across-bandits-high-dimension-robust-statistics
Structure of Constraints in Sequential Decision-Making

Decision-makers often face the "many bandits" problem, where one must simultaneously learn across related but heterogeneous contextual bandit instances. For instance, a large retailer may wish to dynamically learn product demand across many stores to solve pricing or inventory problems, making it desirable to learn jointly for stores serving similar customers; alternatively, a hospital network may wish to dynamically learn patient risk across many providers to allocate personalized interventions, making it desirable to learn jointly for hospitals serving similar patient populations. Motivated by real datasets, we decompose the unknown parameter in each bandit instance into a global parameter plus a sparse instance-specific term. Then, we propose a novel two-stage estimator that exploits this structure in a sample-efficient way by using a combination of robust statistics (to learn across similar instances) and LASSO regression (to debias the results). We embed this estimator within a bandit algorithm, and prove that it improves asymptotic regret bounds in the context dimension; this improvement is exponential for data-poor instances. We further demonstrate how our results depend on the underlying network structure of bandit instances. Finally, we illustrate the value of our approach on synthetic and real datasets. Joint work with Kan Xu. Paper: https://arxiv.org/abs/2112.14233
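
As a concrete illustration of the two-stage idea, the sketch below assumes a linear reward model in which instance k's parameter is beta_global + delta_k with delta_k sparse. It uses a coordinate-wise median of per-instance least-squares fits as a simple robust aggregator for the global parameter, then a LASSO fit on each instance's residuals to recover the sparse correction. The function name, the choice of aggregator, and the regularization level are illustrative assumptions rather than the exact estimator from the paper.

    import numpy as np
    from sklearn.linear_model import Lasso

    def two_stage_estimate(X_list, y_list, lasso_alpha=0.1):
        """Sketch of a two-stage estimator: robust pooling, then sparse debiasing."""
        # Stage 1: fit each instance separately by least squares, then pool the
        # estimates with a coordinate-wise median (robust to outlier instances).
        per_instance = np.array([
            np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)
        ])
        beta_global = np.median(per_instance, axis=0)

        # Stage 2: for each instance, fit a LASSO to the residuals to recover the
        # sparse instance-specific deviation from the shared parameter.
        deltas = []
        for X, y in zip(X_list, y_list):
            lasso = Lasso(alpha=lasso_alpha, fit_intercept=False)
            lasso.fit(X, y - X @ beta_global)
            deltas.append(lasso.coef_)

        # Instance k acts using the combined estimate beta_global + deltas[k].
        return beta_global, deltas

    # Toy usage: 5 instances sharing a 20-dimensional parameter, each with a
    # 2-sparse instance-specific deviation.
    rng = np.random.default_rng(0)
    d, K, n = 20, 5, 100
    beta = rng.normal(size=d)
    X_list, y_list = [], []
    for _ in range(K):
        delta = np.zeros(d)
        delta[rng.choice(d, size=2, replace=False)] = rng.normal(size=2)
        X = rng.normal(size=(n, d))
        X_list.append(X)
        y_list.append(X @ (beta + delta) + 0.1 * rng.normal(size=n))
    beta_hat, delta_hats = two_stage_estimate(X_list, y_list)

In the bandit loop, each instance would choose actions using beta_global + deltas[k] (with whatever exploration scheme the algorithm prescribes) and refit as new observations arrive; data-poor instances benefit because the global component is estimated from all instances' pooled data.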




Other Videos By Simons Institute for the Theory of Computing


2022-10-21 Pandora’s Box: Learning to Leverage Costly Information
2022-10-20 Thresholds
2022-10-19 NLTS Hamiltonians from Codes | Quantum Colloquium
2022-10-15 Learning to Control Safety-Critical Systems
2022-10-14 Near-Optimal No-Regret Learning for General Convex Games
2022-10-14 The Power of Adaptivity in Representation Learning: From Meta-Learning to Federated Learning
2022-10-14 When Matching Meets Batching: Optimal Multi-stage Algorithms and Applications
2022-10-13 Optimal Learning for Structured Bandits
2022-10-13 Dynamic Spatial Matching
2022-10-13 New Results on Primal-Dual Algorithms for Online Allocation Problems With Applications to ...
2022-10-12 Learning Across Bandits in High Dimension via Robust Statistics
2022-10-12 Are Multicriteria MDPs Harder to Solve Than Single-Criteria MDPs?
2022-10-12 A Game-Theoretic Approach to Offline Reinforcement Learning
2022-10-11 The Statistical Complexity of Interactive Decision Making
2022-10-11 A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11 A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11 Stochastic Bin Packing with Time-Varying Item Sizes
2022-10-10 Constant Regret in Exchangeable Action Models: Overbooking, Bin Packing, and Beyond
2022-10-08 On The Exploration In Load-Balancing Under Unknown Service Rates
2022-10-08 Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And ...
2022-10-08 The Compensated Coupling (or Why the Future is the Best Guide for the Present)



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Structure of Constraints in Sequential Decision-Making
Hamsa Bastani