Player of Games: All the games, one algorithm! (w/ author Martin Schmid)

Subscribers: 286,000
Video Link: https://www.youtube.com/watch?v=U0mxx7AoNz0
Category: Guide
Duration: 54:11
Views: 17,432


#playerofgames #deepmind #alphazero

Special Guest: First author Martin Schmid (https://twitter.com/Lifrordi)
Games have been used throughout research as testbeds for AI algorithms, such as reinforcement learning agents. However, different types of games usually require different solution approaches, such as AlphaZero for Go or Chess, and Counterfactual Regret Minimization (CFR) for Poker. Player of Games bridges this gap between perfect and imperfect information games and delivers a single algorithm that uses tree search over public information states, and is trained via self-play. The resulting algorithm can play Go, Chess, Poker, Scotland Yard, and many more games, as well as non-game environments.
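
To make the CFR reference above a bit more concrete, here is a minimal sketch of regret matching, the per-decision update that CFR is built on. It is not Player of Games and not the authors' code: rock-paper-scissors is chosen only because it is a tiny imperfect information game, and all names (PAYOFF, regret_matching, self_play) as well as the iteration count and the asymmetric starting regrets are assumptions made for illustration. Under those assumptions, the average self-play strategy should drift towards the uniform equilibrium.

```python
# Payoff for the row player in rock-paper-scissors (zero-sum game).
# Rows are the row player's action, columns the opponent's action.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(iterations=20000):
    """Run simultaneous regret matching for both players (illustrative only)."""
    n = 3
    # Arbitrary asymmetric start so the dynamics are not trivially uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        # Expected payoff of each action against the opponent's current strategy.
        values_row = [
            sum(PAYOFF[a][b] * strategies[1][b] for b in range(n))
            for a in range(n)
        ]
        values_col = [
            sum(-PAYOFF[a][b] * strategies[0][a] for a in range(n))
            for b in range(n)
        ]
        for player, values in enumerate((values_row, values_col)):
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(n):
                # Regret: how much better action a would have done than the mix played.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy, not the last one, is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


AVERAGE_STRATEGIES = self_play()
```

Full CFR applies this same update at every information state of a game tree and weights the regrets by counterfactual reach probabilities; the sketch leaves that part out.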

OUTLINE:
0:00 - Introduction
2:50 - What games can Player of Games be trained on?
4:00 - Tree search algorithms (AlphaZero)
8:00 - What is different in imperfect information games?
15:40 - Counterfactual Value- and Policy-Networks
18:50 - The Player of Games search procedure
28:30 - How to train the network?
34:40 - Experimental Results
47:20 - Discussion & Outlook

Paper: https://arxiv.org/abs/2112.03178

Abstract:
Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

Authors: Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, Zach Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher

2022-02-07 - OpenAI Embeddings (and Controversy?!)
2022-02-06 - Unsupervised Brain Models - How does Deep Learning inform Neuroscience? (w/ Patrick Mineault)
2022-02-04 - GPT-NeoX-20B - Open-Source huge language model by EleutherAI (Interview w/ co-founder Connor Leahy)
2022-01-29 - Predicting the rules behind - Deep Symbolic Regression for Recurrent Sequences (w/ author interview)
2022-01-27 - IT ARRIVED! YouTube sent me a package. (also: Limited Time Merch Deal)
2022-01-25 - [ML News] ConvNeXt: Convolutions return | China regulates algorithms | Saliency cropping examined
2022-01-21 - Dynamic Inference with Neural Interpreters (w/ author interview)
2022-01-19 - Noether Networks: Meta-Learning Useful Conserved Quantities (w/ the authors)
2022-01-11 - This Team won the Minecraft RL BASALT Challenge! (Paper Explanation & Interview with the authors)
2022-01-05 - Full Self-Driving is HARD! Analyzing Elon Musk re: Tesla Autopilot on Lex Fridman's Podcast
2022-01-02 - Player of Games: All the games, one algorithm! (w/ author Martin Schmid)
2021-12-30 - ML News Live! (Dec 30, 2021) Anonymous user RIPS Tensorflw | AI prosecutors rising | Penny Challenge
2021-12-28 - GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
2021-12-27 - Machine Learning Holidays Live Stream
2021-12-26 - Machine Learning Holiday Live Stream
2021-12-24 - [ML News] AI learns to search the Internet | Drawings come to life | New ML journal launches
2021-12-21 - [ML News] DeepMind builds Gopher | Google builds GLaM | Suicide capsule uses AI to check access
2021-11-27 - Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions (Paper Explained)
2021-11-25 - Peer Review is still BROKEN! The NeurIPS 2021 Review Experiment (results are in)
2021-11-24 - Parameter Prediction for Unseen Deep Architectures (w/ First Author Boris Knyazev)
2021-11-20 - Learning Rate Grafting: Transferability of Optimizer Tuning (Machine Learning Research Paper Review)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
reinforcement learning
ai for go
ai go
ai chess
chess ai
stockfish
alphazero
alpha zero
muzero
player of games
pog
deepmind
deepmind games
imperfect information games
ai for poker
perfect vs imperfect information
public state
scotland yard
ai for scotland yard
reinforcement learning poker
ai no limit holdem
counterfactual regret minimization
tree search