The Complexity of Infinite-Horizon General-Sum Stochastic Games: Turn-Based and Simultaneous Play

Video Link: https://www.youtube.com/watch?v=HMhex09El-8



Duration: 53:41


Vidya Muthukumar (Georgia Institute of Technology)
https://simons.berkeley.edu/talks/complexity-infinite-horizon-general-sum-stochastic-games-turn-based-and-simultaneous-play
Multi-Agent Reinforcement Learning and Bandit Learning

A common variant of infinite-horizon stochastic games is the turn-based game, in which only one agent can take an action at a given state at a given time. We initiate an algorithmic study of such turn-based stochastic games (TBSG) with both discounted and time-averaged general-sum rewards, and ask whether it is possible to compute a Nash equilibrium (NE) of a TBSG in time polynomial in the number of states (S) and the number of actions per state (A). On the one hand, we show that non-stationary NE are computable in poly(S,A) time and that stationary NE are computable in poly(A) time. On the other hand, we show that computing a multiplayer approximate stationary NE is PPAD-complete in S. Our results imply PPAD-hardness for computing stationary coarse correlated equilibria in general-sum simultaneous stochastic games. This constitutes a non-standard separation in complexity between the zero-sum and general-sum settings for infinite-horizon stochastic games. Beyond these results, we provide a number of additional characterizations of related stochastic games. For example, we show that computing a stationary NE of an infinite-horizon simultaneous stochastic game is in PPAD despite the inherent non-convexity of the utilities. We also identify some special cases of general-sum TBSG for which pure stationary NE always exist and are computable in poly(S,A) time. Underlying all of our results are new structural properties of stochastic games that may be of independent interest.
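To make the setup concrete, here is a minimal sketch (not the talk's algorithm) of a discounted turn-based stochastic game: each state is controlled by exactly one player, and we brute-force search for a pure stationary NE by solving each player's discounted value via a linear system and checking unilateral deviations. All game parameters below (transitions, rewards, discount) are invented for illustration only.

```python
# Toy discounted TBSG: each state controlled by one player; brute-force
# search for a pure stationary Nash equilibrium. Numbers are illustrative
# assumptions, not taken from the talk.
import itertools
import numpy as np

GAMMA = 0.9
# State 0 is controlled by player 0, state 1 by player 1.
CONTROLLER = [0, 1]
# transitions[s][a] = next-state distribution over (state 0, state 1);
# rewards[s][a] = (reward to player 0, reward to player 1).
transitions = {
    0: {0: [1.0, 0.0], 1: [0.0, 1.0]},
    1: {0: [0.5, 0.5], 1: [1.0, 0.0]},
}
rewards = {
    0: {0: (1.0, 0.0), 1: (0.0, 2.0)},
    1: {0: (2.0, 1.0), 1: (0.0, 0.0)},
}

def values(policy):
    """Each player's discounted value vector under a pure stationary policy.

    policy[s] is the action played by the controller of state s. Solves
    (I - GAMMA * P) v = r separately for each player's reward stream.
    """
    n = len(policy)
    P = np.array([transitions[s][policy[s]] for s in range(n)])
    out = []
    for p in range(2):
        r = np.array([rewards[s][policy[s]][p] for s in range(n)])
        out.append(np.linalg.solve(np.eye(n) - GAMMA * P, r))
    return out  # out[p][s] = player p's value starting from state s

def is_pure_ne(policy):
    """Check unilateral deviations. Since each player controls exactly one
    state here, switching that state's action covers all their deviations."""
    base = values(policy)
    for s in range(len(policy)):
        p = CONTROLLER[s]
        for a in transitions[s]:
            if a == policy[s]:
                continue
            dev = list(policy)
            dev[s] = a
            # Deviation is profitable if it improves p's value anywhere.
            if np.any(values(tuple(dev))[p] > base[p] + 1e-9):
                return False
    return True

pure_ne = [pi for pi in itertools.product([0, 1], repeat=2) if is_pure_ne(pi)]
print(pure_ne)  # → [(1, 0)]
```

In this particular toy game a pure stationary NE happens to exist; the talk's hardness results concern the general case, where even approximate stationary NE are PPAD-hard to compute.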




Other Videos By Simons Institute for the Theory of Computing


2022-05-05 Kernelized Multiplicative Weights for 0/1-Polyhedral Games:...
2022-05-04 Multi-Agent Reinforcement Learning Towards Zero-Shot Communication
2022-05-04 Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning
2022-05-04 Independent Learning in Stochastic Games
2022-05-04 On Rewards in Multi-Agent Systems
2022-05-04 Learning Automata as Building Blocks for MARL
2022-05-04 Efficient Error Correction in Neutral Atoms via Erasure Conversion | Quantum Colloquium
2022-05-04 Multi-Agent Reinforcement Learning in the High Population Regime
2022-05-04 A Regret Minimization Approach to Multi-Agent Control and RL
2022-05-03 The Complexity of Markov Equilibrium in Stochastic Games
2022-05-03 The Complexity of Infinite-Horizon General-Sum Stochastic Games: Turn-Based and Simultaneous Play
2022-05-03 Policy Gradients in General-Sum Dynamic Games: When Do They Even Converge?
2022-05-03 No-Regret Learning in Time-Varying Zero-Sum Games
2022-05-03 What is the Statistical Complexity of Reinforcement Learning?
2022-05-03 V-Learning: Simple, Efficient, Decentralized Algorithm for Multiagent RL
2022-05-02 "Calibeating": Beating Forecasters at Their Own Game
2022-04-30 Adjudicating Between Different Causal Accounts of Bell Inequality Violations
2022-04-30 Why Born Probabilities?
2022-04-30 Causal Discovery in the Quantum Context
2022-04-30 "Fine-Tuned", "Unfaithful", "Unnatural": Abuse of Terminology in Causal Modeling
2022-04-30 Causal Influence in Quantum Theory



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Multi-Agent Reinforcement Learning and Bandit Learning
Vidya Muthukumar