Planning to Explore via Self-Supervised World Models (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 291,000

Published on May 17, 2020 2:01:30 PM ● Video Link: https://www.youtube.com/watch?v=IiBFqnNu7A8

Duration: 35:22

5,876 views

226

What can an agent do without any reward? Explore the world! While many formulations of intrinsic rewards exist (Curiosity, Novelty, etc.), they all look back in time to learn. Plan2Explore is the first model that uses planning in a learned imaginary latent world model to seek out states where it is uncertain about what will happen.
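To make "seek out states where it is uncertain" concrete, here is a minimal sketch of one common way to score uncertainty: the disagreement (variance) among an ensemble of predictors. Everything in it is an illustrative assumption rather than the authors' code: the function names `make_toy_ensemble`, `predict_all` and `disagreement_reward` are hypothetical, and the ensemble members are random linear maps standing in for learned one-step models.

```python
import numpy as np


def make_toy_ensemble(num_members, in_dim, out_dim, seed=0):
    """Hypothetical stand-in for learned models: one random linear map per member."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(in_dim, out_dim)) for _ in range(num_members)]


def predict_all(ensemble, inputs):
    """Stack every member's prediction: result has shape (members, candidates, out_dim)."""
    return np.stack([inputs @ weights for weights in ensemble])


def disagreement_reward(predictions):
    """Variance across ensemble members, averaged over output dimensions.

    A high value means the members disagree about what happens next,
    which this sketch treats as a signal that the state is worth exploring.
    """
    predictions = np.asarray(predictions, dtype=float)
    return predictions.var(axis=0).mean(axis=-1)


if __name__ == "__main__":
    ensemble = make_toy_ensemble(num_members=5, in_dim=3, out_dim=4)
    # Three candidate inputs (e.g. imagined state-action features), assumed for the demo.
    candidates = np.array([[0.0, 0.0, 0.0],
                           [1.0, 0.0, 0.0],
                           [2.0, 0.0, 0.0]])
    rewards = disagreement_reward(predict_all(ensemble, candidates))
    best = int(np.argmax(rewards))
    print("intrinsic rewards:", rewards)
    print("candidate chosen for exploration:", best)
```

In this toy setup all members agree at the zero input, so its score is zero, and the score grows as the input moves away from it. A planner would evaluate such scores on imagined rollouts before acting, instead of computing novelty only after a state has been visited.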

OUTLINE:
0:00 - Intro & Problem Statement
3:30 - Model
5:10 - Intrinsic Motivation
9:05 - Planning in Latent Space
11:15 - Latent Disagreement
16:30 - Maximizing Information Gain
21:00 - More problems with the model
26:45 - Experiments
32:10 - Final Comments

Paper: https://arxiv.org/abs/2005.05960
Website: https://ramanans1.github.io/plan2explore/
Code: https://github.com/ramanans1/plan2explore

Abstract:
Reinforcement learning allows solving complex tasks; however, the learning tends to be task-specific and sample efficiency remains a challenge. We present Plan2Explore, a self-supervised reinforcement learning agent that tackles both these challenges through a new approach to self-supervised exploration and fast adaptation to new tasks, which need not be known during exploration. During exploration, unlike prior methods which retrospectively compute the novelty of observations after the agent has already reached them, our agent acts efficiently by leveraging planning to seek out expected future novelty. After exploration, the agent quickly adapts to multiple downstream tasks in a zero- or few-shot manner. We evaluate on challenging control tasks from high-dimensional image inputs. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle that has access to rewards. Videos and code at https://ramanans1.github.io/plan2explore/

Authors: Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Other Videos By Yannic Kilcher

2020-05-27	mixup: Beyond Empirical Risk Minimization (Paper Explained)
2020-05-26	A critical analysis of self-supervision, or what we can learn from a single image (Paper Explained)
2020-05-25	Deep image reconstruction from human brain activity (Paper Explained)
2020-05-24	Regularizing Trajectory Optimization with Denoising Autoencoders (Paper Explained)
2020-05-23	[News] The NeurIPS Broader Impact Statement
2020-05-22	When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)
2020-05-21	[News] OpenAI Model Generates Python Code
2020-05-20	Investigating Human Priors for Playing Video Games (Paper & Demo)
2020-05-19	iMAML: Meta-Learning with Implicit Gradients (Paper Explained)
2020-05-18	[Code] PyTorch sentiment classifier from scratch with Huggingface NLP Library (Full Tutorial)
2020-05-17	Planning to Explore via Self-Supervised World Models (Paper Explained)
2020-05-16	[News] Facebook's Real-Time TTS system runs on CPUs only!
2020-05-15	Weight Standardization (Paper Explained)
2020-05-14	[Trash] Automated Inference on Criminality using Face Images
2020-05-13	Faster Neural Network Training with Data Echoing (Paper Explained)
2020-05-12	Group Normalization (Paper Explained)
2020-05-11	Concept Learning with Energy-Based Models (Paper Explained)
2020-05-10	[News] Google’s medical AI was super accurate in a lab. Real life was a different story.
2020-05-09	Big Transfer (BiT): General Visual Representation Learning (Paper Explained)
2020-05-08	Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained)
2020-05-07	WHO ARE YOU? 10k Subscribers Special (w/ Channel Analytics)

Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper, deep rl, deep reinforcement learning, novelty, curiosity, intrinsic reward, dreamer, planet, control, walker, run forward, imaginary, imagination, planning, google, neural network, actor, critic, uncertainty, information gain, mutual information, model
