Planning to Explore via Self-Supervised World Models (Paper Explained)

Published on 2020-05-17 ● Video Link: https://www.youtube.com/watch?v=IiBFqnNu7A8



Duration: 35:22


What can an agent do without any reward? Explore the world! While many formulations of intrinsic rewards exist (Curiosity, Novelty, etc.), they all look back in time to learn: novelty is only scored after the agent has already reached a state. Plan2Explore is the first model that uses planning in a learned imaginary latent world model to seek out states where it is uncertain about what will happen.
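To make "uncertain about what will happen" concrete, here is a minimal sketch of ensemble disagreement used as an intrinsic reward. It is a toy under stated assumptions, not the paper's implementation: the state is a single number instead of a learned latent vector, and the ensemble members are hand-built linear predictors (`make_ensemble` and `disagreement` are hypothetical names) standing in for trained networks.

```python
import random
import statistics

def make_ensemble(n_models, seed=0):
    """Build toy one-step predictors that agree near s = 0 and diverge far from it."""
    rng = random.Random(seed)
    # Assumption: each member is a linear model with a slightly different slope,
    # a stand-in for networks trained on different bootstraps of the data.
    slopes = [1.0 + rng.uniform(-0.2, 0.2) for _ in range(n_models)]
    return [lambda s, a, k=k: k * s + a for k in slopes]

def disagreement(ensemble, state, action):
    """Intrinsic reward: variance of the members' next-state predictions."""
    preds = [f(state, action) for f in ensemble]
    return statistics.pvariance(preds)

ensemble = make_ensemble(5)
near = disagreement(ensemble, 0.1, 0.0)   # members nearly agree here
far = disagreement(ensemble, 10.0, 0.0)   # members disagree strongly here
print(near, far)
```

The reward is high exactly where the members have not been pinned down to the same answer, which is the signal the agent is steered toward.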

OUTLINE:
0:00 - Intro & Problem Statement
3:30 - Model
5:10 - Intrinsic Motivation
9:05 - Planning in Latent Space
11:15 - Latent Disagreement
16:30 - Maximizing Information Gain
21:00 - More problems with the model
26:45 - Experiments
32:10 - Final Comments

Paper: https://arxiv.org/abs/2005.05960
Website: https://ramanans1.github.io/plan2explore/
Code: https://github.com/ramanans1/plan2explore

Abstract:
Reinforcement learning allows solving complex tasks; however, the learning tends to be task-specific and sample efficiency remains a challenge. We present Plan2Explore, a self-supervised reinforcement learning agent that tackles both these challenges through a new approach to self-supervised exploration and fast adaptation to new tasks, which need not be known during exploration. During exploration, unlike prior methods which retrospectively compute the novelty of observations after the agent has already reached them, our agent acts efficiently by leveraging planning to seek out expected future novelty. After exploration, the agent quickly adapts to multiple downstream tasks in a zero or a few-shot manner. We evaluate on challenging control tasks from high-dimensional image inputs. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods, and in fact, almost matches the performance of an oracle that has access to rewards. Videos and code at https://ramanans1.github.io/plan2explore/
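The contrast between retrospective novelty and planning for expected future novelty can be illustrated with a small sketch. Everything here is an assumption made for illustration: a one-dimensional toy model, a fixed five-member ensemble, and an exhaustive search over short action sequences. The exhaustive search is a swapped-in stand-in for the paper's planning procedure, chosen only because it is short and deterministic.

```python
import itertools
import statistics

def imagined_novelty(ensemble, dynamics, state, actions):
    """Roll an action sequence out inside the model and sum ensemble disagreement."""
    total = 0.0
    for a in actions:
        preds = [f(state, a) for f in ensemble]
        total += statistics.pvariance(preds)
        state = dynamics(state, a)  # imagined step only, no environment interaction
    return total

def plan(ensemble, dynamics, state, horizon=5):
    """Pick the action sequence with the highest imagined novelty (exhaustive search)."""
    best_actions, best_score = None, float("-inf")
    for actions in itertools.product([-1.0, 1.0], repeat=horizon):
        score = imagined_novelty(ensemble, dynamics, state, actions)
        if score > best_score:
            best_actions, best_score = list(actions), score
    return best_actions, best_score

# Hypothetical toy model: members disagree more the further the state is from 0.
slopes = [0.8, 0.9, 1.0, 1.1, 1.2]
ensemble = [lambda s, a, k=k: k * (s + a) for k in slopes]
dynamics = lambda s, a: s + a  # mean prediction used to advance the rollout

best_actions, best_score = plan(ensemble, dynamics, state=0.0)
print(best_actions, best_score)
```

In this toy, the planner commits to one direction because the model predicts growing disagreement further out, before any of those states has been visited. A retrospective method would only register that novelty after arriving there.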

Authors: Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher







Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
rl
deep rl
deep reinforcement learning
novelty
curiosity
intrinsic reward
dreamer
planet
control
walker
run forward
imaginary
imagination
planning
google
neural network
actor
critic
uncertainty
information gain
mutual information
model