Optimal Learning for Structured Bandits

Published on ● Video Link: https://www.youtube.com/watch?v=NPPTcLcsb3k



Duration: 49:46
388 views
9


Negin Golrezei (Massachusetts Institute of Technology)
https://simons.berkeley.edu/talks/optimal-learning-structured-bandits
Structure of Constraints in Sequential Decision-Making

In this work, we study structured multi-armed bandits, which is the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker is aware of certain structural information regarding the reward distributions and would like to minimize their regret by exploiting this information, where the regret is its performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical upper confidence bound (UCB) and Thomson sampling algorithms are well known to suffer only minimal regret. As recently pointed out, neither algorithms are, however, capable of exploiting structural information that is commonly available in practice. We propose a novel learning algorithm that we call DUSA whose worst-case regret matches the information-theoretic regret lower bound up to a constant factor and can handle a wide range of structural information. Our algorithm DUSA solves a dual counterpart of the regret lower bound at the empirical reward distribution and follows its suggested play. Our proposed algorithm is the first computationally viable learning policy for structured bandit problems that has asymptotic minimal regret.




Other Videos By Simons Institute for the Theory of Computing


2022-10-25Functional Law of Large Numbers and PDEs for Spatial Epidemic Models with...
2022-10-25Algorithms Using Local Graph Features to Predict Epidemics
2022-10-24Epidemic Models with Manual and Digital Contact Tracing
2022-10-21Pandora’s Box: Learning to Leverage Costly Information
2022-10-20Thresholds
2022-10-19NLTS Hamiltonians from Codes | Quantum Colloquium
2022-10-15Learning to Control Safety-Critical Systems
2022-10-14Near-Optimal No-Regret Learning for General Convex Games
2022-10-14The Power of Adaptivity in Representation Learning: From Meta-Learning to Federated Learning
2022-10-14When Matching Meets Batching: Optimal Multi-stage Algorithms and Applications
2022-10-13Optimal Learning for Structured Bandits
2022-10-13Dynamic Spatial Matching
2022-10-13New Results on Primal-Dual Algorithms for Online Allocation Problems With Applications to ...
2022-10-12Learning Across Bandits in High Dimension via Robust Statistics
2022-10-12Are Multicriteria MDPs Harder to Solve Than Single-Criteria MDPs?
2022-10-12A Game-Theoretic Approach to Offline Reinforcement Learning
2022-10-11The Statistical Complexity of Interactive Decision Making
2022-10-11A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11Stochastic Bin Packing with Time-Varying Item Sizes
2022-10-10Constant Regret in Exchangeable Action Models: Overbooking, Bin Packing, and Beyond



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Structure of Constraints in Sequential Decision-Making
Negin Golrezei