Policy Gradients in General-Sum Dynamic Games: When Do They Even Converge?

Video Link: https://www.youtube.com/watch?v=pWVK2DZRJpM



Duration: 28:00
391 views


Eric Mazumdar (Caltech)
https://simons.berkeley.edu/talks/policy-gradients-general-sum-dynamic-games-when-do-they-even-converge
Multi-Agent Reinforcement Learning and Bandit Learning

In this talk I will present work showing that agents using simple policy gradient algorithms have no guarantees of asymptotic convergence in arguably the simplest class of continuous action- and state-space multi-agent control problems, general-sum linear quadratic (LQ) games, and that proximal point and extragradient methods do not resolve these issues. I will then focus on zero-sum LQ games, in which stronger convergence guarantees are possible when agents use independent policy gradients with a finite timescale separation.
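To make "independent policy gradients with a finite timescale separation" concrete, here is a minimal sketch on a scalar zero-sum LQ game. Everything in it is an illustrative assumption and is not taken from the talk: the constants, the function names (`cost`, `gradients`, `step`, `run`), the choice to put the larger step size on the maximizing player, and the ratio `tau`. Each player updates only its own linear feedback gain using the gradient of the shared cost. Whether the iterates settle depends on the constants chosen.

```python
import math

# Hypothetical scalar game: x_{t+1} = A*x + B1*u + B2*v, with x_0 = 1.
# Player 1 (u = -k1*x) minimizes, player 2 (v = -k2*x) maximizes
# the cost sum_t (Q*x^2 + R1*u^2 - R2*v^2). All constants are assumptions.
A, B1, B2 = 0.5, 1.0, 1.0
Q, R1, R2 = 1.0, 1.0, 4.0


def cost(k1, k2):
    """Infinite-horizon cost under the linear policies, or inf if unstable."""
    c = A - B1 * k1 - B2 * k2              # closed loop: x_{t+1} = c * x_t
    if abs(c) >= 1.0:
        return math.inf                    # the cost series diverges
    stage = Q + R1 * k1**2 - R2 * k2**2    # per-step coefficient on x_t^2
    return stage / (1.0 - c**2)            # geometric series in c^2


def gradients(k1, k2):
    """Analytic partial derivatives of cost with respect to k1 and k2."""
    c = A - B1 * k1 - B2 * k2
    d = 1.0 - c**2
    n = Q + R1 * k1**2 - R2 * k2**2
    g1 = (2.0 * R1 * k1 * d - 2.0 * n * c * B1) / d**2
    g2 = (-2.0 * R2 * k2 * d - 2.0 * n * c * B2) / d**2
    return g1, g2


def step(k1, k2, eta=0.01, tau=5.0):
    """One simultaneous update: descent for player 1, ascent for player 2.

    tau is the timescale separation: player 2 uses step size tau * eta.
    """
    g1, g2 = gradients(k1, k2)
    return k1 - eta * g1, k2 + tau * eta * g2


def run(steps=2000, eta=0.01, tau=5.0):
    """Iterate from zero gains, stopping if an update would destabilize."""
    k1, k2 = 0.0, 0.0
    for _ in range(steps):
        nk1, nk2 = step(k1, k2, eta, tau)
        if not math.isfinite(cost(nk1, nk2)):
            break                          # keep the last stabilizing gains
        k1, k2 = nk1, nk2
    return k1, k2


if __name__ == "__main__":
    k1, k2 = run()
    print("gains:", k1, k2, "cost:", cost(k1, k2))
```

Setting `tau=1.0` gives plain simultaneous gradient descent-ascent, so the two regimes can be compared by varying that one parameter.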




Other Videos By Simons Institute for the Theory of Computing


2022-05-04 Multi-Agent Reinforcement Learning Towards Zero-Shot Communication
2022-05-04 Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning
2022-05-04 Independent Learning in Stochastic Games
2022-05-04 On Rewards in Multi-Agent Systems
2022-05-04 Learning Automata as Building Blocks for MARL
2022-05-04 Efficient Error Correction in Neutral Atoms via Erasure Conversion | Quantum Colloquium
2022-05-04 Multi-Agent Reinforcement Learning in the High Population Regime
2022-05-04 A Regret Minimization Approach to Multi-Agent Control and RL
2022-05-03 The Complexity of Markov Equilibrium in Stochastic Games
2022-05-03 The Complexity of Infinite-Horizon General-Sum Stochastic Games: Turn-Based and Simultaneous Play
2022-05-03 Policy Gradients in General-Sum Dynamic Games: When Do They Even Converge?
2022-05-03 No-Regret Learning in Time-Varying Zero-Sum Games
2022-05-03 What is the Statistical Complexity of Reinforcement Learning?
2022-05-03 V-Learning: Simple, Efficient, Decentralized Algorithm for Multiagent RL
2022-05-02 "Calibeating": Beating Forecasters at Their Own Game
2022-04-30 Adjudicating Between Different Causal Accounts of Bell Inequality Violations
2022-04-30 Why Born Probabilities?
2022-04-30 Causal Discovery in the Quantum Context
2022-04-30 "Fine-Tuned", "Unfaithful", "Unnatural": Abuse of Terminology in Causal Modeling
2022-04-30 Causal Influence in Quantum Theory
2022-04-29 A Dynamic-Epistemic Approach to Conditionals



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Multi-Agent Reinforcement Learning and Bandit Learning
Eric Mazumdar