ReBeL - Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Explained)
#ai #technology #poker
This paper does for Poker what AlphaZero has done for Chess & Go. The combination of self-play reinforcement learning and tree search has had tremendous success in perfect-information games, but transferring such techniques to imperfect-information games is a hard problem. Not only does ReBeL solve this problem, it also provably converges to a Nash equilibrium and delivers a superhuman Heads-Up No-Limit Texas Hold'em bot with very little domain knowledge.
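The video's running example (3:20 in the outline below) is a modified Rock-Paper-Scissors where, as assumed here, any win involving Scissors is worth two points instead of one, which shifts the Nash equilibrium from uniform to (0.4, 0.4, 0.2). As a minimal illustration of that equilibrium concept (not the paper's code), a few lines of self-play regret matching recover it:

import numpy as np

# Row player's payoffs, rows/cols ordered (Rock, Paper, Scissors).
# Outcomes involving Scissors are doubled, so the unique Nash
# equilibrium shifts from uniform to (0.4, 0.4, 0.2).
PAYOFF = np.array([[ 0., -1.,  2.],
                   [ 1.,  0., -2.],
                   [-2.,  2.,  0.]])

def regret_matching(iterations: int = 100_000) -> np.ndarray:
    """Self-play regret matching; the *average* strategy converges
    to a Nash equilibrium in two-player zero-sum games."""
    cum_regret = np.zeros(3)
    cum_strategy = np.zeros(3)
    for _ in range(iterations):
        positive = np.maximum(cum_regret, 0.0)
        strategy = (positive / positive.sum()
                    if positive.sum() > 0 else np.full(3, 1 / 3))
        cum_strategy += strategy
        action_values = PAYOFF @ strategy  # value of each pure action
        cum_regret += action_values - strategy @ action_values
    return cum_strategy / cum_strategy.sum()

print(regret_matching())  # converges to ~[0.4, 0.4, 0.2]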
OUTLINE:
0:00 - Intro & Overview
3:20 - Rock, Paper, and Double Scissor
10:00 - AlphaZero Tree Search
18:30 - Notation Setup: Infostates & Nash Equilibria
31:45 - One Card Poker: Introducing Belief Representations
45:00 - Solving Games in Belief Representation
55:20 - The ReBeL Algorithm
1:04:00 - Theory & Experiment Results
1:07:00 - Broader Impact
1:10:20 - High-Level Summary
Paper: https://arxiv.org/abs/2007.13544
Code: https://github.com/facebookresearch/rebel
Blog: https://ai.facebook.com/blog/rebel-a-general-game-playing-ai-bot-that-excels-at-poker-and-more/
ERRATUM: As a commenter on the last video pointed out: this is not the strongest poker algorithm overall, but the strongest one that uses very little expert knowledge.
Abstract:
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
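To make the abstract concrete, here is a heavily hedged, schematic sketch of the self-play loop the paper describes (roughly its Algorithm 1), not the authors' implementation. `PBS`, `solve`, and `is_terminal` are illustrative placeholders: in the real system, `solve` is depth-limited CFR-based search on the subgame rooted at the current public belief state, with a learned value network evaluating leaf PBSs.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class PBS:
    """Public belief state: the public information plus a probability
    distribution over each player's private states (e.g. hole cards)."""
    public_state: tuple
    beliefs: Tuple[tuple, tuple]

def rebel_selfplay(
    root: PBS,
    solve: Callable[[PBS], Tuple[float, PBS]],
    is_terminal: Callable[[PBS], bool],
) -> List[Tuple[PBS, float]]:
    """One self-play episode. `solve(pbs)` stands in for CFR search on a
    depth-limited subgame rooted at `pbs` (value net at the leaves); it
    returns the root value estimate and a leaf PBS reached by sampling
    from the approximate-equilibrium policy, with beliefs updated by
    Bayes' rule. The collected (PBS, value) pairs are the targets later
    used to retrain the value network."""
    data: List[Tuple[PBS, float]] = []
    pbs = root
    while not is_terminal(pbs):
        value, pbs_next = solve(pbs)
        data.append((pbs, value))
        pbs = pbs_next
    return data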
Authors: Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
If you want to support me, the best thing to do is to share the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n