Improving Intrinsic Exploration with Language Abstractions (Machine Learning Paper Explained)

Video Link: https://www.youtube.com/watch?v=NeGJAUSQEJI
Duration: 42:26

#reinforcementlearning #ai #explained

Exploration is one of the oldest challenges for Reinforcement Learning algorithms, with no clear solution to date. Especially in environments with sparse rewards, agents struggle to decide which parts of the environment to explore further. Providing intrinsic motivation in the form of a pseudo-reward is sometimes used to overcome this challenge, but it often relies on hand-crafted heuristics and can lead to deceptive dead-ends. This paper proposes using language descriptions of encountered states as a method of assessing novelty. In two procedurally generated environments, the authors demonstrate the usefulness of language, which is inherently concise and abstract and therefore lends itself well to this task.
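The core idea, rewarding states whose language descriptions are novel, can be sketched as a simple count-based bonus. This is only an illustrative sketch, not the paper's exact formulation: the `describe` oracle and the 1/sqrt(count) decay are assumptions chosen for clarity.

```python
import math
from collections import Counter

class LanguageNoveltyBonus:
    """Intrinsic reward based on how rarely a state's language
    description has been seen so far during training."""

    def __init__(self):
        self.counts = Counter()

    def reward(self, description: str) -> float:
        # Increment the visit count for this description and return
        # a bonus that decays as the description becomes familiar.
        self.counts[description] += 1
        return 1.0 / math.sqrt(self.counts[description])

bonus = LanguageNoveltyBonus()
r1 = bonus.reward("you see a locked door")  # first visit: 1.0
r2 = bonus.reward("you see a locked door")  # second visit: smaller bonus
```

Because language abstracts away low-level detail, many distinct raw states ("door at (3,4)", "door at (7,2)") can map to the same description, so the bonus rewards semantically new situations rather than pixel-level novelty.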

OUTLINE:
0:00 - Intro
1:10 - Paper Overview: Language for exploration
5:40 - The MiniGrid & MiniHack environments
7:00 - Annotating states with language
9:05 - Baseline algorithm: AMIGo
12:20 - Adding language to AMIGo
22:55 - Baseline algorithm: NovelD and Random Network Distillation
29:45 - Adding language to NovelD
31:50 - Aren't we just using extra data?
34:55 - Investigating the experimental results
40:45 - Final comments

Paper: https://arxiv.org/abs/2202.08938

Abstract:
Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 45-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites.

Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2022-04-26 Author Interview - ACCEL: Evolving Curricula with Regret-Based Environment Design
2022-04-25 ACCEL: Evolving Curricula with Regret-Based Environment Design (Paper Review)
2022-04-22 LAION-5B: 5 billion image-text-pairs dataset (with the authors)
2022-04-21 Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors)
2022-04-17 Author Interview - Transformer Memory as a Differentiable Search Index
2022-04-16 Transformer Memory as a Differentiable Search Index (Machine Learning Research Paper Explained)
2022-04-10 [ML News] Google's 540B PaLM Language Model & OpenAI's DALL-E 2 Text-to-Image Revolution
2022-04-06 DALL-E 2 by OpenAI is out! Live Reaction
2022-04-04 The Weird and Wonderful World of AI Art (w/ Author Jack Morris)
2022-04-02 Author Interview - Improving Intrinsic Exploration with Language Abstractions
2022-04-01 Improving Intrinsic Exploration with Language Abstractions (Machine Learning Paper Explained)
2022-03-30 [ML News] GPT-3 learns to edit | Google Pathways | Make-A-Scene | CLIP meets GamePhysics | DouBlind
2022-03-29 Author Interview - Memory-assisted prompt editing to improve GPT-3 after deployment
2022-03-28 Memory-assisted prompt editing to improve GPT-3 after deployment (Machine Learning Paper Explained)
2022-03-26 Author Interview - Typical Decoding for Natural Language Generation
2022-03-25 Typical Decoding for Natural Language Generation (Get more human-like outputs from language models!)
2022-03-24 One Model For All The Tasks - BLIP (Author Interview)
2022-03-23 BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding&Generation
2022-03-21 [ML News] AI Threatens Biological Arms Race
2022-03-20 Active Dendrites avoid catastrophic forgetting - Interview with the Authors
2022-03-18 Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments (Review)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
machine learning news
ml paper
machine learning paper
language
nlp
natural language processing
stanford
reinforcement learning
data science
deep learning tutorial
deep learning paper
language in reinforcement learning
rl nlp
nlp rl
nlp reinforcement learning
exploration exploitation
rl exploration