Dynamic Regret Minimization for Bandits without Prior Knowledge

Video Link: https://www.youtube.com/watch?v=kLMVbPnD5kc



Duration: 46:00


Chen-Yu Wei (University of Southern California)
https://simons.berkeley.edu/talks/tbd-482
Quantifying Uncertainty: Stochastic, Adversarial, and Beyond

To evaluate the performance of a bandit learner in a changing environment, the standard notion of regret is insufficient. Instead, "dynamic regret" is a better measure that can evaluate the learner's ability to track the changes. How to achieve the optimal dynamic regret without prior knowledge of the number of times the environment changes had long been an open problem, and was recently resolved by Auer, Gajane, and Ortner in their COLT 2019 paper. We will discuss their consecutive sampling technique, which is rarely seen in the bandit literature, and see how their idea can be elegantly generalized to a wide range of bandit/RL problems. Finally, we will discuss important open problems that remain in the area.
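As a hedged illustration of why the abstract distinguishes the two notions (this sketch is not from the talk, and all names in it are hypothetical): static regret compares the learner to the best single fixed arm in hindsight, while dynamic regret compares it to the best arm at every round. On a two-armed bandit whose means switch once, a learner that never adapts can have zero static regret yet large dynamic regret.

```python
# Toy comparison of static vs. dynamic regret on a two-armed bandit
# whose reward means switch once (illustrative sketch, not the talk's algorithm).

def static_regret(means, choices):
    """Regret against the best single fixed arm in hindsight."""
    T = len(choices)
    best_fixed = max(range(len(means[0])),
                     key=lambda a: sum(m[a] for m in means))
    return (sum(m[best_fixed] for m in means)
            - sum(means[t][choices[t]] for t in range(T)))

def dynamic_regret(means, choices):
    """Regret against the best arm at every round (tracks changes)."""
    return sum(max(means[t]) - means[t][choices[t]]
               for t in range(len(means)))

# Means switch at t = 50: arm 0 is best first, then arm 1 takes over.
T = 100
means = [(0.9, 0.1)] * 50 + [(0.1, 0.9)] * 50
choices = [0] * T  # a learner that never adapts keeps playing arm 0

print(static_regret(means, choices))   # 0: arm 0 ties for best fixed arm
print(dynamic_regret(means, choices))  # ~40: 0.8 lost per round after the switch
```

A learner with small dynamic regret must detect the change at t = 50 and switch to arm 1; doing so without being told how many changes occur is exactly the difficulty the talk addresses.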

Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Quantifying Uncertainty: Stochastic Adversarial and Beyond
Chen-Yu Wei