No-Regret Learning in Time-Varying Zero-Sum Games

Published on ● Video Link: https://www.youtube.com/watch?v=cNyjGj9R2xE



Duration: 35:35
508 views
12


Haipeng Luo (University of Southern California)
https://simons.berkeley.edu/talks/no-regret-learning-time-varying-zero-sum-games
Multi-Agent Reinforcement Learning and Bandit Learning

Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. This talk focuses on a natural yet underexplored variant of this problem where the game payoff matrix changes over time, possibly in an adversarial manner. In the first part of the talk, I will discuss what the appropriate performance measures are for this problem (and argue that some measures from prior works might be unreasonable). In the second part of the talk, I will present a new parameter-free algorithm that simultaneously enjoys favorable guarantees under three different performance measures. These guarantees are adaptive to different non-stationarity measures of the payoff matrices and, importantly, recover the best known results when the payoff matrix is fixed. I will conclude the talk by discussing some future directions on the extensions to multi-player bandits and reinforcement learning.




Other Videos By Simons Institute for the Theory of Computing


2022-05-04Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning
2022-05-04Independent Learning in Stochastic Games
2022-05-04On Rewards in Multi-Agent Systems
2022-05-04Learning Automata as Building Blocks for MARL
2022-05-04Efficient Error Correction in Neutral Atoms via Erasure Conversion | Quantum Colloquium
2022-05-04Multi-Agent Reinforcement Learning in the High Population Regime
2022-05-04A Regret Minimization Approach to Mutli-Agent Control and RL
2022-05-03The Complexity of Markov Equilibrium in Stochastic Games
2022-05-03The Complexity of Infinite-Horizon General-Sum Stochastic Games: Turn-Based and Simultaneous Play
2022-05-03Policy Gradients in General-Sum Dynamic Games: When Do They Even Converge?
2022-05-03No-Regret Learning in Time-Varying Zero-Sum Games
2022-05-03What is the Statistical Complexity of Reinforcement Learning?
2022-05-03V-Learning: Simple, Efficient, Decentralized Algorithm for Multiagent RL
2022-05-02"Calibeating": Beating Forecasters at Their Own Game
2022-04-30Adjudicating Between Different Causal Accounts of Bell Inequality Violations
2022-04-30Why Born Probabilities?
2022-04-30Causal Discovery in the Quantum Context
2022-04-30"Fine-Tuned", "Unfaithful", "Unnatural": Abuse of Terminology in Causal Modeling
2022-04-30Causal Influence in Quantum Theory
2022-04-29A Dynamic-Epistemic Approach to Conditionals
2022-04-29A Bayesian Probability Calculus for Density Matrices



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Multi-Agent Reinforcement Learning and Bandit Learning
Haipeng Luo