Thompson Sampling in Combinatorial Multi-armed Bandits with Independent Arms

Channel: Microsoft Research

Subscribers: 351,000

Published on December 12, 2022 8:34:55 PM ● Video Link: https://www.youtube.com/watch?v=KPabwwPxuMI

Duration: 24:37

615 views

2022 Data-driven Optimization Workshop: Thompson Sampling in Combinatorial Multi-armed Bandits with Independent Arms

Speaker: Siwei Wang, Microsoft Research Asia

Existing methods for combinatorial multi-armed bandits mainly follow the upper confidence bound (UCB) approach. To keep the algorithm efficient, they usually use the sum of the upper confidence bounds of the base arms as the upper confidence bound of a super arm. However, when the outcomes of different base arms are independent, this sum can be much larger than necessary, which leads to a much higher regret upper bound (in regret minimization problems) or complexity upper bound (in pure exploration problems). To address this, we explore Thompson Sampling (TS), which uses independent random samples in place of upper confidence bounds, and design TS-based algorithms for both regret minimization and pure exploration. In TS-based algorithms, with high probability, the sum of the independent random samples within a super arm does not exceed the super arm's tight upper confidence bound. This resolves the issue above and yields lower regret/complexity upper bounds than existing efficient UCB-based algorithms.
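The gap described in the abstract can be illustrated numerically. The sketch below is not the speaker's algorithm: the per-arm radius sqrt(3 ln t / (2n)), the Gaussian samples with variance 1/n, and all function names are assumptions chosen for illustration. It compares the summed per-arm UCB bonuses with a radius for the sum of independent arms, and checks how often a sum of independent samples exceeds that tighter radius.

```python
import math
import random


def per_arm_radius_sq(n, t):
    """Squared confidence radius of one base arm observed n times (assumed form)."""
    return 3.0 * math.log(t) / (2.0 * n)


def sum_of_ucb_bonuses(counts, t):
    """Super-arm bonus obtained by adding up the per-arm radii."""
    return sum(math.sqrt(per_arm_radius_sq(n, t)) for n in counts)


def tight_radius(counts, t):
    """Radius for a sum of independent arms: sum the squared radii, then take the root."""
    return math.sqrt(sum(per_arm_radius_sq(n, t) for n in counts))


def sampled_deviation(counts, rng):
    """TS-style perturbation: one independent Gaussian draw per base arm, summed."""
    return sum(rng.gauss(0.0, math.sqrt(1.0 / n)) for n in counts)


def exceed_fraction(counts, t, trials, seed):
    """Fraction of trials in which the summed samples exceed the tight radius."""
    rng = random.Random(seed)
    limit = tight_radius(counts, t)
    hits = sum(1 for _ in range(trials) if sampled_deviation(counts, rng) > limit)
    return hits / trials


if __name__ == "__main__":
    counts = [100] * 25  # hypothetical super arm: 25 base arms, each observed 100 times
    t = 1000
    print("sum of per-arm UCB bonuses:", sum_of_ucb_bonuses(counts, t))
    print("tight radius for the sum:  ", tight_radius(counts, t))
    print("fraction of sampled sums above the tight radius:",
          exceed_fraction(counts, t, trials=2000, seed=0))
```

When all m base arms have the same observation count, the summed bonus is exactly sqrt(m) times the tight radius (a factor of 5 for the 25 arms above), while the sum of independent samples stays below the tight radius in nearly every trial.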
