ReBeL - Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Explained)
#ai #technology #poker
This paper does for Poker what AlphaZero has done for Chess & Go. The combination of self-play reinforcement learning and tree search has had tremendous success in perfect-information games, but transferring such techniques to imperfect-information games is a hard problem. Not only does ReBeL solve this problem, it also provably converges to a Nash equilibrium and delivers a superhuman Heads-Up No-Limit Texas Hold'em bot with very little domain knowledge.
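The video's running example (3:20 in the outline below) is a modified Rock-Paper-Scissors where, as assumed here, any win involving Scissors is worth two points instead of one, which shifts the Nash equilibrium from uniform to (0.4, 0.4, 0.2). As a minimal illustration of that equilibrium concept (not the paper's code), a few lines of self-play regret matching recover it:

import numpy as np

# Row player's payoffs, rows/cols ordered (Rock, Paper, Scissors).
# Outcomes involving Scissors are doubled, so the unique Nash
# equilibrium shifts from uniform to (0.4, 0.4, 0.2).
PAYOFF = np.array([[ 0., -1.,  2.],
                   [ 1.,  0., -2.],
                   [-2.,  2.,  0.]])

def regret_matching(iterations: int = 100_000) -> np.ndarray:
    """Self-play regret matching; the *average* strategy converges
    to a Nash equilibrium in two-player zero-sum games."""
    cum_regret = np.zeros(3)
    cum_strategy = np.zeros(3)
    for _ in range(iterations):
        positive = np.maximum(cum_regret, 0.0)
        strategy = (positive / positive.sum()
                    if positive.sum() > 0 else np.full(3, 1 / 3))
        cum_strategy += strategy
        action_values = PAYOFF @ strategy  # value of each pure action
        cum_regret += action_values - strategy @ action_values
    return cum_strategy / cum_strategy.sum()

print(regret_matching())  # converges to ~[0.4, 0.4, 0.2]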
OUTLINE:
0:00 - Intro & Overview
3:20 - Rock, Paper, and Double Scissor
10:00 - AlphaZero Tree Search
18:30 - Notation Setup: Infostates & Nash Equilibria
31:45 - One Card Poker: Introducing Belief Representations
45:00 - Solving Games in Belief Representation
55:20 - The ReBeL Algorithm
1:04:00 - Theory & Experiment Results
1:07:00 - Broader Impact
1:10:20 - High-Level Summary
Paper: https://arxiv.org/abs/2007.13544
Code: https://github.com/facebookresearch/rebel
Blog: https://ai.facebook.com/blog/rebel-a-general-game-playing-ai-bot-that-excels-at-poker-and-more/
ERRATUM: As a commenter on the last video pointed out: this is not the strongest poker algorithm overall, but the strongest one that uses very little expert knowledge.
Abstract:
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
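To make the abstract concrete, here is a heavily hedged, schematic sketch of the self-play loop the paper describes (roughly its Algorithm 1), not the authors' implementation. `PBS`, `solve`, and `is_terminal` are illustrative placeholders: in the real system, `solve` is depth-limited CFR-based search on the subgame rooted at the current public belief state, with a learned value network evaluating leaf PBSs.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class PBS:
    """Public belief state: the public information plus a probability
    distribution over each player's private states (e.g. hole cards)."""
    public_state: tuple
    beliefs: Tuple[tuple, tuple]

def rebel_selfplay(
    root: PBS,
    solve: Callable[[PBS], Tuple[float, PBS]],
    is_terminal: Callable[[PBS], bool],
) -> List[Tuple[PBS, float]]:
    """One self-play episode. `solve(pbs)` stands in for CFR search on a
    depth-limited subgame rooted at `pbs` (value net at the leaves); it
    returns the root value estimate and a leaf PBS reached by sampling
    from the approximate-equilibrium policy, with beliefs updated by
    Bayes' rule. The collected (PBS, value) pairs are the targets later
    used to retrain the value network."""
    data: List[Tuple[PBS, float]] = []
    pbs = root
    while not is_terminal(pbs):
        value, pbs_next = solve(pbs)
        data.append((pbs, value))
        pbs = pbs_next
    return data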
Authors: Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
If you want to support me, the best thing to do is to share the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n