Dreamer v2: Mastering Atari with Discrete World Models (Machine Learning Research Paper Explained)

Channel: Yannic Kilcher

Subscribers: 291,000

Published on February 19, 2021 4:11:18 PM ● Video Link: https://www.youtube.com/watch?v=o75ybZ-6Uu8

Duration: 54:59

21,810 views

821

#dreamer #deeprl #reinforcementlearning

Model-Based Reinforcement Learning has been lagging behind Model-Free RL on Atari, especially among single-GPU algorithms. This collaboration between Google AI, DeepMind, and the University of Toronto (UofT) pushes world models to the next level. The main contribution is a learned latent state consisting of one deterministic part and one stochastic part, whereby the stochastic part is a set of 32 categorical variables, each with 32 possible values. The world model can freely decide how it wants to use these variables to represent the input, but is tasked with the prediction of future observations and rewards. This procedure gives rise to an informative latent representation and, in a second step, reinforcement learning (A2C Actor-Critic) can be done purely, and very efficiently, on the basis of the world model's latent states. No observations needed! This paper combines this with straight-through estimators, KL balancing, and many other tricks to achieve state-of-the-art single-GPU performance in Atari.
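To make the stochastic latent and the straight-through trick a bit more concrete, here is a minimal NumPy sketch. It is not the authors' code: the function names are made up for illustration, the logits are random stand-ins for what the world model would output, and the backward pass is written out by hand because NumPy has no autodiff. In an autodiff framework the same idea is usually written as `sample + probs - stop_gradient(probs)`.

```python
import numpy as np

NUM_VARS, NUM_CLASSES = 32, 32  # sizes quoted in the description above


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sample_latent(logits, rng):
    """Forward pass: draw one class per variable, return one-hot rows and probs."""
    probs = softmax(logits)
    # Inverse-CDF sampling with one uniform draw per categorical variable.
    cdf = probs.cumsum(axis=-1)
    u = rng.random((logits.shape[0], 1))
    idx = (u > cdf).sum(axis=-1)
    idx = np.minimum(idx, logits.shape[-1] - 1)  # guard against CDF rounding
    one_hot = np.eye(logits.shape[-1])[idx]
    return one_hot, probs


def straight_through_grad(probs, grad_sample):
    """Backward pass (hypothetical helper): treat the discrete sample as if it
    were `probs`, i.e. push the incoming gradient through the softmax Jacobian."""
    inner = (grad_sample * probs).sum(axis=-1, keepdims=True)
    return probs * (grad_sample - inner)


rng = np.random.default_rng(0)
logits = rng.normal(size=(NUM_VARS, NUM_CLASSES))  # stand-in for model output
z, probs = sample_latent(logits, rng)
flat = z.reshape(-1)  # flattened one-hot code fed to downstream networks

upstream = rng.normal(size=z.shape)  # stand-in for dLoss/dz from later layers
grad_logits = straight_through_grad(probs, upstream)
print(z.shape, flat.shape, grad_logits.shape)
```

The forward pass stays fully discrete (each row of `z` is one-hot), while the gradient that reaches the logits is the one the soft probabilities would have received.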

OUTLINE:
0:00 - Intro & Overview
4:50 - Short Recap of Reinforcement Learning
6:05 - Problems with Model-Free Reinforcement Learning
10:40 - How World Models Help
12:05 - World Model Learner Architecture
16:50 - Deterministic & Stochastic Hidden States
18:50 - Latent Categorical Variables
22:00 - Categorical Variables and Multi-Modality
23:20 - Sampling & Stochastic State Prediction
30:55 - Actor-Critic Learning in Dream Space
32:05 - The Incompleteness of Learned World Models
34:15 - How General is this Algorithm?
37:25 - World Model Loss Function
39:20 - KL Balancing
40:35 - Actor-Critic Loss Function
41:45 - Straight-Through Estimators for Sampling Backpropagation
46:25 - Experimental Results
52:00 - Where Does It Fail?
54:25 - Conclusion

Paper: https://arxiv.org/abs/2010.02193
Code: https://github.com/danijar/dreamerv2
Author Blog: https://danijar.com/project/dreamerv2/
Google AI Blog: https://ai.googleblog.com/2021/02/mastering-atari-with-discrete-world.html

ERRATA (from the authors):
- KL balancing (prior vs posterior within the KL) is different from beta VAEs (reconstruction vs KL)
- The vectors of categoricals can in theory represent 32^32 different images so their capacity is quite large
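
Both errata points can be checked with a small sketch. The first part just counts the codes. The second part illustrates KL balancing for a single categorical variable: the same KL is used twice, once with gradients only into the prior and once with gradients only into the posterior, and the two are weighted differently. The weight `alpha = 0.8` is my recollection of the paper's setting and should be treated as an assumption, as are all function names here. Gradients are derived by hand since NumPy has no autodiff.

```python
import numpy as np

# Errata point 2: 32 categorical variables with 32 classes each.
num_codes = 32 ** 32                 # same number as 2 ** 160
bits = num_codes.bit_length() - 1    # 5 bits per variable, 160 in total
print(f"{num_codes:.3e} codes, {bits} bits")


def softmax(x):
    """Numerically stable softmax for a 1-D logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()


def kl(q, p):
    """KL(q || p) between two categorical distributions."""
    return float((q * (np.log(q) - np.log(p))).sum())


def balanced_kl_grads(post_logits, prior_logits, alpha=0.8):
    """Errata point 1 (sketch): gradients of
    alpha * KL(sg(post) || prior) + (1 - alpha) * KL(post || sg(prior)),
    where sg means "no gradient flows through this argument"."""
    q = softmax(post_logits)   # posterior
    p = softmax(prior_logits)  # prior
    value = kl(q, p)           # both terms have this same forward value
    # Term 1 only trains the prior: d KL / d prior_logits = p - q.
    grad_prior = alpha * (p - q)
    # Term 2 only trains the posterior:
    # d KL / d post_logits = q * (log q - log p - KL).
    grad_post = (1.0 - alpha) * q * (np.log(q) - np.log(p) - value)
    return value, grad_post, grad_prior


rng = np.random.default_rng(0)
post = rng.normal(size=32)   # stand-in posterior logits
prior = rng.normal(size=32)  # stand-in prior logits
value, g_post, g_prior = balanced_kl_grads(post, prior)
print(value, np.abs(g_post).sum(), np.abs(g_prior).sum())
```

Unlike a beta-VAE weight, `alpha` does not trade reconstruction against the KL; it only decides how much of the KL gradient moves the prior versus the posterior.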

Abstract:
Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to increase sample-efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model. The world model uses discrete representations and is trained separately from the policy. DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model. With the same computational budget and wall-clock time, DreamerV2 reaches 200M frames and exceeds the final performance of the top single-GPU agents IQN and Rainbow.

Authors: Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Other Videos By Yannic Kilcher

2021-03-30	Machine Learning PhD Survival Guide 2021 | Advice on Topic Selection, Papers, Conferences & more!
2021-03-23	Is Google Translate Sexist? Gender Stereotypes in Statistical Machine Translation
2021-03-22	Perceiver: General Perception with Iterative Attention (Google DeepMind Research Paper Explained)
2021-03-16	Pretrained Transformers as Universal Computation Engines (Machine Learning Research Paper Explained)
2021-03-11	Yann LeCun - Self-Supervised Learning: The Dark Matter of Intelligence (FAIR Blog Post Explained)
2021-03-06	Apple or iPod??? Easy Fix for Adversarial Textual Attacks on OpenAI's CLIP Model! #Shorts
2021-03-05	Multimodal Neurons in Artificial Neural Networks (w/ OpenAI Microscope, Research Paper Explained)
2021-02-27	GLOM: How to represent part-whole hierarchies in a neural network (Geoff Hinton's Paper Explained)
2021-02-26	Linear Transformers Are Secretly Fast Weight Memory Systems (Machine Learning Paper Explained)
2021-02-25	DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Machine Learning Paper Explained)
2021-02-19	Dreamer v2: Mastering Atari with Discrete World Models (Machine Learning Research Paper Explained)
2021-02-17	TransGAN: Two Transformers Can Make One Strong GAN (Machine Learning Research Paper Explained)
2021-02-14	NFNets: High-Performance Large-Scale Image Recognition Without Normalization (ML Paper Explained)
2021-02-11	Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (AI Paper Explained)
2021-02-04	Deep Networks Are Kernel Machines (Paper Explained)
2021-02-02	Feedback Transformers: Addressing Some Limitations of Transformers with Feedback Memory (Explained)
2021-01-29	SingularityNET - A Decentralized, Open Market and Network for AIs (Whitepaper Explained)
2021-01-22	Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
2021-01-17	STOCHASTIC MEME DESCENT - Deep Learning Meme Review - Episode 2 (Part 2 of 2)
2021-01-12	OpenAI CLIP: Connecting Text and Images (Paper Explained)
2021-01-06	OpenAI DALL·E: Creating Images from Text (Blog Post Explained)

Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper, reinforcement learning, deep reinforcement learning, dreamer, dreamer v2, dreamer rl, dreamer reinforcement learning, google reinforcement learning, deepmind reinforcement learning, google ai, world model, world model reinforcement learning, google deepmind world model, google deepmind reinforcement learning, atari reinforcement learning, atari world model, rainbow, muzero
