Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games

Channel:

Simons Institute for the Theory of Computing

Subscribers:

68,700

Published on May 6, 2022 5:08:11 AM ● Video Link: https://www.youtube.com/watch?v=56nqcUNhm-g

Duration: 53:35

594 views

Ioannis Panageas (UC Irvine)
https://simons.berkeley.edu/talks/tbd-399
Multi-Agent Reinforcement Learning and Bandit Learning

Potential games are arguably one of the most important and widely studied classes of normal-form games. They define the archetypal setting of multi-agent coordination, as all agent utilities are perfectly aligned with each other via a common potential function. Can this intuitive framework be transplanted to the setting of Markov games? What are the similarities and differences between multi-agent coordination with and without state dependence? We present a novel definition of Markov Potential Games (MPGs) that generalizes prior attempts at capturing complex stateful multi-agent coordination. Counter-intuitively, insights from normal-form potential games do not carry over, since an MPG can contain state-games that are zero-sum. In the opposite direction, Markov games in which every state-game is a potential game are not necessarily MPGs. Nevertheless, MPGs retain standard desirable properties such as the existence of deterministic Nash policies. In our main technical result, we prove fast convergence of independent policy gradient (and its stochastic variant) to Nash policies by adapting recent gradient-dominance arguments developed for single-agent MDPs to multi-agent learning settings.
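
To make "independent policy gradient" concrete, the sketch below runs it on the simplest special case: a single-state, two-player game where both players receive the same payoff matrix, so the shared payoff itself acts as the potential. This is an illustration under assumptions, not the method from the talk: the abstract does not specify the policy parameterization or update rule, so the direct (probability-vector) parameterization, the Euclidean simplex projection, exact gradients, and the names `project_to_simplex`, `independent_pg`, and `nash_gap` are all choices made here for the example.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    n = v.size
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0                  # cumulative sums minus the target mass
    idx = np.arange(1, n + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)              # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def independent_pg(A, x, y, step=0.1, iters=500):
    """Each player ascends its own gradient of the shared payoff x^T A y.

    Hypothetical single-state setting: no player sees the other's update rule,
    only the gradient of its own utility at the current joint strategy.
    """
    for _ in range(iters):
        grad_x = A @ y                        # gradient w.r.t. player 1's strategy
        grad_y = A.T @ x                      # gradient w.r.t. player 2's strategy
        x = project_to_simplex(x + step * grad_x)   # simultaneous updates
        y = project_to_simplex(y + step * grad_y)
    return x, y


def nash_gap(A, x, y):
    """Largest gain either player can get by a unilateral deviation."""
    value = x @ A @ y
    return max((A @ y).max() - value, (A.T @ x).max() - value)


# Coordination game: both players are paid 1 for (a1, a1) and 2 for (a2, a2).
A = np.array([[1.0, 0.0], [0.0, 2.0]])
x0 = np.array([0.5, 0.5])
y0 = np.array([0.5, 0.5])
x, y = independent_pg(A, x0, y0)
gap = nash_gap(A, x, y)
```

Starting from uniform strategies, both players drift toward the second action and the Nash gap shrinks to zero at that pure profile. The Markov setting in the talk adds states and transitions, which this single-state sketch deliberately leaves out.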

Tags:

Simons Institute

theoretical computer science

UC Berkeley

Computer Science

Theory of Computation

Theory of Computing

Multi-Agent Reinforcement Learning and Bandit Learning

Ioannis Panageas
