The Complexity of Markov Equilibrium in Stochastic Games

Published on ● Video Link: https://www.youtube.com/watch?v=NnY7lLUEkqY



Duration: 54:35
898 views
18


Noah Golowich (MIT)
https://simons.berkeley.edu/talks/complexity-markov-equilibrium-stochastic-games
Multi-Agent Reinforcement Learning and Bandit Learning

We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is turn-based, the discount factor is an absolute constant, and the approximation is an absolute constant. Our intractability results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable. A fortiori, our results imply that there are no efficient algorithms for learning stationary Markov CCE policies in multi-agent reinforcement learning (MARL), even when the interaction is two-player and turn-based, and both the discount factor and the desired approximation of the learned policies is an absolute constant. In turn, these results stand in sharp contrast to single-agent reinforcement learning (RL) where near-optimal stationary Markov policies can be efficiently learned. Complementing our intractability results for stationary Markov CCEs, we provide a decentralized algorithm (assuming shared randomness among players) for learning a nonstationary Markov CCE policy with polynomial time and sample complexity in all problem parameters. Previous work for learning Markov CCE policies all required exponential time and sample complexity in the number of players. Joint work with Costis Daskalakis and Kaiqing Zhang.




Other Videos By Simons Institute for the Theory of Computing


2022-05-05General Game-Theoretic Multiagent Reinforcement Learning
2022-05-05Kernelized Multiplicative Weights for 0/1-Polyhedral Games:...
2022-05-04Multi-Agent Reinforcement Learning Towards Zero-Shot Communication
2022-05-04Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning
2022-05-04Independent Learning in Stochastic Games
2022-05-04On Rewards in Multi-Agent Systems
2022-05-04Learning Automata as Building Blocks for MARL
2022-05-04Efficient Error Correction in Neutral Atoms via Erasure Conversion | Quantum Colloquium
2022-05-04Multi-Agent Reinforcement Learning in the High Population Regime
2022-05-04A Regret Minimization Approach to Mutli-Agent Control and RL
2022-05-03The Complexity of Markov Equilibrium in Stochastic Games
2022-05-03The Complexity of Infinite-Horizon General-Sum Stochastic Games: Turn-Based and Simultaneous Play
2022-05-03Policy Gradients in General-Sum Dynamic Games: When Do They Even Converge?
2022-05-03No-Regret Learning in Time-Varying Zero-Sum Games
2022-05-03What is the Statistical Complexity of Reinforcement Learning?
2022-05-03V-Learning: Simple, Efficient, Decentralized Algorithm for Multiagent RL
2022-05-02"Calibeating": Beating Forecasters at Their Own Game
2022-04-30Adjudicating Between Different Causal Accounts of Bell Inequality Violations
2022-04-30Why Born Probabilities?
2022-04-30Causal Discovery in the Quantum Context
2022-04-30"Fine-Tuned", "Unfaithful", "Unnatural": Abuse of Terminology in Causal Modeling



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Multi-Agent Reinforcement Learning and Bandit Learning
Noah Golowich