Thompson Sampling in Combinatorial Multi-armed Bandits with Independent Arms

Subscribers:
343,000
Published on ● Video Link: https://www.youtube.com/watch?v=KPabwwPxuMI



Duration: 24:37
615 views
6


2022 Data-driven Optimization Workshop: Thompson Sampling in Combinatorial Multi-armed Bandits with Independent Arms

Speaker: Siwei Wang, Microsoft Research Asia

Existing methods of combinatorial multi-armed bandits mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of upper confidence bounds of base arms to represent the upper confidence bound of a super arm. However, when the outcomes of different base arms are independent, this upper confidence bound could be much larger than necessary, which leads to a much higher regret upper bound (in regret minimization problems) or complexity upper bound (in pure exploration problems). To deal with this challenge, we explore the idea of Thompson Sampling (TS) that uses independent random samples instead of the upper confidence bounds, and design TS-based algorithms for both the regret minimization problems and the pure exploration problems. In TS-based algorithms, the sum of independent random samples within a super arm will not exceed its tight upper confidence bound with high probability. Hence it solves the above challenge, and achieves lower regret/complexity upper bounds than existing efficient UCB-based algorithms.




Other Videos By Microsoft Research


2023-01-24SmartKC: A Low-cost, Smartphone-based Corneal Topographer
2023-01-11MSR-IISc AI Seminar Series: On Learning-Aware Mechanism Design - Michael I. Jordan
2022-12-22Tongue-Gesture Recognition in Head-Mounted Displays
2022-12-15Global Renewables Watch - AI for Good Lab - Geospatial
2022-12-15Toward a Healthy Research Ecosystem for Large Language Models | Panel Discussion
2022-12-14Joint Pricing and Inventory Management with Demand Learning
2022-12-14SITI 2022 - Panel Discussion and moderated Q&A session
2022-12-12Machine Learning for Combinatorial Optimization: Some Empirical Studies
2022-12-12Online Facility Location with Predictions
2022-12-12Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits
2022-12-12Thompson Sampling in Combinatorial Multi-armed Bandits with Independent Arms
2022-12-06Personality Predictions from Automated Video Interviews: Explainable or Unexplainable Models?
2022-12-06Responsible AI: An Interdisciplinary Approach | Panel Discussion
2022-12-06Personalizing Responsibility within AI Systems: A Case for Designing Diversity
2022-12-06Evidence-based Evaluation for Responsible AI
2022-12-06Towards Trustworthy Recommender Systems: From Shallow Models to Deep Models to Large Models
2022-12-06Development of a Game-Based Assessment to Measure Creativity
2022-12-06Interpretability, Responsibility and Controllability of Human Behaviors
2022-12-06On the Adversarial Robustness of Deep Learning
2022-12-06The Long March Towards AI Fairness
2022-12-06Towards Human Value Based Natural Language Processing (NLP)