Multi-Agent Reinforcement Learning in the High Population Regime

Video Link: https://www.youtube.com/watch?v=aEpPkgftujU



Duration: 52:30


Tamer Başar (University of Illinois Urbana-Champaign)
https://simons.berkeley.edu/talks/multi-agent-reinforcement-learning-high-population-regime
Multi-Agent Reinforcement Learning and Bandit Learning

I will discuss some recent results on learning approximate Nash equilibrium policies in nonzero-sum stochastic dynamic games using the framework of mean-field games (MFGs). Following a general introduction, I will focus, for concrete results, on the structured setting of discrete-time infinite-horizon linear-quadratic-Gaussian dynamic games, in which the players (agents) are partitioned into finitely many populations connected by a network of known structure. Each population contains a large number of agents that are indistinguishable from one another, although agents in different populations are distinguishable. When the number of agents in each population goes to infinity, the Nash equilibrium (NE) of the game can be characterized, yielding the so-called mean-field equilibrium (MFE), in which each agent uses only local state information (so scalability is not an issue). The MFE can then be shown to constitute an approximate NE when the population sizes are finite, with a precise quantification of the approximation error as a function of the population sizes.

The main focus of the talk, however, will be the model-free versions of such games, for which I will introduce a learning algorithm, based on zero-order stochastic optimization, that computes the MFE with guaranteed convergence. The algorithm exploits the affine structure of both the equilibrium controller (for each population) and the equilibrium mean-field trajectory by decomposing the learning task into first learning the linear terms and then the affine terms. One can also obtain a finite-sample bound quantifying the estimation error as a function of the number of samples. The talk will conclude with a discussion of some extensions of the setting and future research directions.
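To make the zero-order (derivative-free) idea concrete, below is a minimal Python sketch of the first stage of such a scheme: two-point zero-order gradient descent on the gain of a linear policy in a scalar linear-quadratic problem. Everything here is an illustrative assumption, not the talk's actual algorithm: the dynamics and cost parameters (a, b, q, r), the smoothing radius, the step size, and the helper rollout_cost are all hypothetical, and the sketch omits the second stage in which the affine terms and the equilibrium mean-field trajectory are learned.

    import numpy as np

    # Illustrative scalar LQ problem; all parameters are assumptions for this sketch.
    a, b = 0.9, 0.5      # dynamics: x_{t+1} = a x_t + b u_t + w_t
    q, r = 1.0, 0.1      # stage cost: q x^2 + r u^2
    sigma_w = 0.1        # process-noise standard deviation
    T = 200              # rollout horizon approximating the infinite-horizon average cost

    dir_rng = np.random.default_rng(123)  # random perturbation directions

    def rollout_cost(k, seed, episodes=10):
        """Monte Carlo estimate of the average cost of the linear policy u = -k x."""
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(episodes):
            x = rng.normal()
            cost = 0.0
            for _ in range(T):
                u = -k * x
                cost += q * x**2 + r * u**2
                x = a * x + b * u + sigma_w * rng.normal()
            total += cost / T
        return total / episodes

    def zero_order_step(k, seed, radius=0.1, lr=0.2):
        """One two-point zero-order gradient step on the policy gain k.

        Both perturbed rollouts reuse the same seed (common random numbers),
        which keeps the finite-difference gradient estimate low-variance.
        """
        d = dir_rng.choice([-1.0, 1.0])
        g = (rollout_cost(k + radius * d, seed)
             - rollout_cost(k - radius * d, seed)) / (2 * radius) * d
        return k - lr * g

    k = 0.0
    for it in range(300):
        k = zero_order_step(k, seed=it)
    # For these parameters the exact (model-based) LQR gain is about 1.36.
    print(f"learned gain k ~= {k:.2f}, estimated cost = {rollout_cost(k, seed=0):.4f}")

In the talk's setting this kind of derivative-free update would be run per population, with the affine part of each controller and the mean-field trajectory learned in a subsequent stage once the linear terms have converged.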




Other Videos By Simons Institute for the Theory of Computing


2022-05-05 Variants and Invariants in No-Regret Algorithms
2022-05-05 When Is Offline Two-Player Zero-Sum Markov Game Solvable?
2022-05-05 General Game-Theoretic Multiagent Reinforcement Learning
2022-05-05 Kernelized Multiplicative Weights for 0/1-Polyhedral Games:...
2022-05-04 Multi-Agent Reinforcement Learning Towards Zero-Shot Communication
2022-05-04 Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning
2022-05-04 Independent Learning in Stochastic Games
2022-05-04 On Rewards in Multi-Agent Systems
2022-05-04 Learning Automata as Building Blocks for MARL
2022-05-04 Efficient Error Correction in Neutral Atoms via Erasure Conversion | Quantum Colloquium
2022-05-04 Multi-Agent Reinforcement Learning in the High Population Regime
2022-05-04 A Regret Minimization Approach to Multi-Agent Control and RL
2022-05-03 The Complexity of Markov Equilibrium in Stochastic Games
2022-05-03 The Complexity of Infinite-Horizon General-Sum Stochastic Games: Turn-Based and Simultaneous Play
2022-05-03 Policy Gradients in General-Sum Dynamic Games: When Do They Even Converge?
2022-05-03 No-Regret Learning in Time-Varying Zero-Sum Games
2022-05-03 What is the Statistical Complexity of Reinforcement Learning?
2022-05-03 V-Learning: Simple, Efficient, Decentralized Algorithm for Multiagent RL
2022-05-02 "Calibeating": Beating Forecasters at Their Own Game
2022-04-30 Adjudicating Between Different Causal Accounts of Bell Inequality Violations
2022-04-30 Why Born Probabilities?



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Multi-Agent Reinforcement Learning and Bandit Learning
Tamer Başar