Dynamics-Aware Unsupervised Discovery of Skills (Paper Explained)

Published on 2020-06-01 ● Video Link: https://www.youtube.com/watch?v=HYEzHX6-fIA
Duration: 50:02

This RL framework can discover low-level skills all by itself without any reward. Even better, at test time it can compose its learned skills and reach a specified goal without any additional learning! Warning: Math-heavy!

OUTLINE:
0:00 - Motivation
2:15 - High-Level Overview
3:20 - Model-Based vs Model-Free Reinforcement Learning
9:00 - Skills
12:10 - Mutual Information Objective
18:40 - Decomposition of the Objective
27:10 - Unsupervised Skill Discovery Algorithm
42:20 - Planning in Skill Space
48:10 - Conclusion
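
For readers who want the math from the "Mutual Information Objective" and "Decomposition of the Objective" chapters in written form, here is a short sketch of the objective as it is usually stated for DADS. The notation (state s, next state s', skill z, learned skill dynamics q_phi, number of prior samples L) is assumed here for illustration and may differ from the symbols used in the video.

```latex
% Objective: mutual information between next state and skill, given the current state
\begin{align}
I(s'; z \mid s)
  &= \mathcal{H}(z \mid s) - \mathcal{H}(z \mid s', s) \\
  &= \mathcal{H}(s' \mid s) - \mathcal{H}(s' \mid s, z)
\end{align}

% Variational lower bound using learned skill dynamics q_\phi(s' | s, z).
% The gap is an expected KL divergence between p(s'|s,z) and q_\phi(s'|s,z), hence non-negative.
\begin{align}
I(s'; z \mid s)
  = \mathbb{E}_{z, s, s'}\!\left[ \log \frac{p(s' \mid s, z)}{p(s' \mid s)} \right]
  \ge \mathbb{E}_{z, s, s'}\!\left[ \log \frac{q_\phi(s' \mid s, z)}{p(s' \mid s)} \right]
\end{align}

% Approximating the marginal p(s'|s) with L skills sampled from the prior gives the intrinsic reward
\begin{align}
r_z(s, a, s')
  = \log \frac{q_\phi(s' \mid s, z)}{\frac{1}{L}\sum_{i=1}^{L} q_\phi(s' \mid s, z_i)},
  \qquad z_i \sim p(z)
\end{align}
```

The second form of the decomposition is the one that matches the abstract: the first term rewards diverse transitions overall, the second rewards transitions that are predictable once the skill is known.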

Paper: https://arxiv.org/abs/1907.01657
Website: https://sites.google.com/view/dads-skill
Code: https://github.com/google-research/dads

Abstract:
Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment. A good model can potentially enable planning algorithms to generate a large variety of behaviors and solve diverse tasks. However, learning an accurate model for complex dynamical systems is difficult, and even then, the model might not generalize well outside the distribution of states on which it was trained. In this work, we combine model-based learning with model-free learning of primitives that make model-based planning easy. To that end, we aim to answer the question: how can we discover skills whose outcomes are easy to predict? We propose an unsupervised learning algorithm, Dynamics-Aware Discovery of Skills (DADS), which simultaneously discovers predictable behaviors and learns their dynamics. Our method can leverage continuous skill spaces, theoretically, allowing us to learn infinitely many behaviors even for high-dimensional state-spaces. We demonstrate that zero-shot planning in the learned latent space significantly outperforms standard MBRL and model-free goal-conditioned RL, can handle sparse-reward tasks, and substantially improves over prior hierarchical RL methods for unsupervised skill discovery.
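
To make "skills whose outcomes are easy to predict" concrete, here is a minimal Python sketch of a DADS-style intrinsic reward. It is not taken from the official code. The Gaussian `log_q` is a hypothetical stand-in for the learned skill-dynamics network, and the function names, the noise scale `sigma`, and the uniform skill prior are all assumptions made for illustration.

```python
import numpy as np

def log_q(s, z, s_next, sigma=0.1):
    """Toy stand-in (assumption) for learned skill dynamics q(s' | s, z).

    Models s' ~ N(s + z, sigma^2 I). A real implementation would use a
    trained network instead. z may be a single skill (d,) or a batch (L, d).
    """
    diff = s_next - (s + z)
    d = diff.shape[-1]
    return (-0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2
            - d * np.log(sigma) - 0.5 * d * np.log(2.0 * np.pi))

def intrinsic_reward(s, z, s_next, prior_samples):
    """log q(s'|s,z) minus the log of the average q(s'|s,z_i) over prior skills z_i."""
    log_num = log_q(s, z, s_next)
    log_alts = log_q(s, prior_samples, s_next)  # shape (L,)
    # Numerically stable log of the mean of exp(log_alts).
    m = np.max(log_alts)
    log_mean = m + np.log(np.mean(np.exp(log_alts - m)))
    return log_num - log_mean

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prior = rng.uniform(-1.0, 1.0, size=(100, 2))  # assumed uniform skill prior
    s = np.zeros(2)
    z = np.array([1.0, 0.0])
    # Transition that matches what skill z predicts vs. one that contradicts it.
    print(intrinsic_reward(s, z, s + z, prior))
    print(intrinsic_reward(s, z, s - z, prior))
```

A transition that the chosen skill predicts well, and that other skills predict poorly, gets a high reward. A transition that looks like it came from a different skill gets a low one, which is what pushes skills to be both predictable and distinguishable.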

Authors: Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
rl
deep rl
control
planning
world model
dads
skills
latent
high level
unsupervised
tree search
deep reinforcement learning
mujoco
ant
google