Can Wikipedia Help Offline Reinforcement Learning? (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 300,000

Published on February 26, 2022 2:02:37 PM ● Video Link: https://www.youtube.com/watch?v=XHGh19Hbx48

Category: Let's Play

Duration: 38:35

11,213 views

339

#wikipedia #reinforcementlearning #languagemodels

Transformers have come to overtake many domain-targeted custom models in a wide variety of fields, such as Natural Language Processing, Computer Vision, Generative Modelling, and recently also Reinforcement Learning. This paper looks at the Decision Transformer and shows that, surprisingly, pre-training the model on a language-modelling task significantly boosts its performance on Offline Reinforcement Learning. The resulting model achieves higher scores, can get away with fewer parameters, and exhibits superior scaling properties. This raises many questions about the fundamental connection between the domains of language and RL.

OUTLINE:
0:00 - Intro
1:35 - Paper Overview
7:35 - Offline Reinforcement Learning as Sequence Modelling
12:00 - Input Embedding Alignment & other additions
16:50 - Main experimental results
20:45 - Analysis of the attention patterns across models
32:25 - More experimental results (scaling properties, ablations, etc.)
37:30 - Final thoughts

Paper: https://arxiv.org/abs/2201.12122
Code: https://github.com/machelreid/can-wikipedia-help-offline-rl
My Video on Decision Transformer: https://youtu.be/-buULmf7dec

Abstract:
Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling with improved results as a result of the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence speeds. In this paper, we look to take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of pre-trained sequence models on other domains (vision, language) when finetuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on a variety of environments, accelerating training by 3-6x and achieving state-of-the-art performance in a variety of tasks using Wikipedia-pretrained and GPT2 language models. We hope that this work not only brings light to the potentials of leveraging generic sequence modeling techniques and pre-trained models for RL, but also inspires future work on sharing knowledge between generative modeling tasks of completely different domains.
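
To make the "offline RL as sequence modeling" formulation a bit more concrete, here is a minimal sketch of how a Decision-Transformer-style setup can flatten one logged trajectory into a token sequence of (return-to-go, state, action) triples. This is an illustration only, not the paper's code: the function names, the string placeholders for states and actions, and the toy rewards are all assumptions made for this example.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Suffix sums of the rewards: rtg[t] = r[t] + r[t+1] + ... + r[T-1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_sequence(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
) -> List[Tuple[str, Any]]:
    """Interleave one trajectory as (return-to-go, state, action) per timestep.

    Each element is tagged with its modality; a real model would embed each
    modality with its own input layer before feeding the sequence to a
    Transformer. That embedding step is omitted here.
    """
    sequence: List[Tuple[str, Any]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("return_to_go", g), ("state", s), ("action", a)])
    return sequence


# Toy trajectory (hypothetical values, not from any dataset).
toy_states = ["s0", "s1", "s2"]
toy_actions = ["a0", "a1", "a2"]
toy_rewards = [1.0, 0.0, 2.0]

sequence = to_sequence(toy_states, toy_actions, toy_rewards)
for token in sequence:
    print(token)
```

A sequence model trained on such data predicts each action from the tokens that precede it, which is what lets a pre-trained language model be fine-tuned on the task.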

Authors: Machel Reid, Yutaro Yamada, Shixiang Shane Gu

Links:
Merch: http://store.ykilcher.com
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Other Videos By Yannic Kilcher

2022-03-18	Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments (Review)
2022-03-14	Author Interview - VOS: Learning What You Don't Know by Virtual Outlier Synthesis
2022-03-13	VOS: Learning What You Don't Know by Virtual Outlier Synthesis (Paper Explained)
2022-03-08	Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents
2022-03-06	First Author Interview: AI & formal math (Formal Mathematics Statement Curriculum Learning)
2022-03-05	OpenAI tackles Math - Formal Mathematics Statement Curriculum Learning (Paper Explained)
2022-03-04	[ML News] DeepMind controls fusion | Yann LeCun's JEPA architecture | US: AI can't copyright its art
2022-03-02	AlphaCode - with the authors!
2022-03-01	Competition-Level Code Generation with AlphaCode (Paper Review)
2022-02-28	Can Wikipedia Help Offline Reinforcement Learning? (Author Interview)
2022-02-26	Can Wikipedia Help Offline Reinforcement Learning? (Paper Explained)
2022-02-23	[ML Olds] Meta Research Supercluster | OpenAI GPT-Instruct | Google LaMDA | Drones fight Pigeons
2022-02-21	Listening to You! - Channel Update (Author Interviews)
2022-02-20	All about AI Accelerators: GPU, TPU, Dataflow, Near-Memory, Optical, Neuromorphic & more (w/ Author)
2022-02-18	[ML News] Uber: Deep Learning for ETA | MuZero Video Compression | Block-NeRF | EfficientNet-X
2022-02-17	CM3: A Causal Masked Multimodal Model of the Internet (Paper Explained w/ Author Interview)
2022-02-16	AI against Censorship: Genetic Algorithms, The Geneva Project, ML in Security, and more!
2022-02-15	HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning (w/ Author)
2022-02-10	[ML News] DeepMind AlphaCode | OpenAI math prover | Meta battles harmful content with AI
2022-02-08	Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (+Author)
2022-02-07	OpenAI Embeddings (and Controversy?!)

Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper
