Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 300,000

Published: June 26, 2022 9:58:34 PM
Video Link: https://www.youtube.com/watch?v=oz5yZc9ULAc

Duration: 32:34

38,887 views

#openai #vpt #minecraft

Minecraft is one of the harder challenges any RL agent could face. Episodes are long, and the world is procedurally generated, complex, and huge. Further, the action space is a keyboard and a mouse, which must be operated given only the game's video input. OpenAI tackles this challenge with Video PreTraining, using a small set of labeled contractor data to pseudo-label a giant corpus of scraped gameplay footage. The pre-trained model is highly capable in basic game mechanics and fine-tunes much better than a blank-slate model. This is the first Minecraft agent that achieves the elusive goal of crafting a diamond pickaxe all by itself.

OUTLINE:
0:00 - Intro
3:50 - How to spend money most effectively?
8:20 - Getting a large dataset with labels
14:40 - Model architecture
19:20 - Experimental results and fine-tuning
25:40 - Reinforcement Learning to the Diamond Pickaxe
30:00 - Final comments and hardware

Blog: https://openai.com/blog/vpt/
Paper: https://arxiv.org/abs/2206.11795
Code & Model weights: https://github.com/openai/Video-Pre-Training

Abstract:
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games, and computer use, publicly available data does not contain the labels required to train behavioral priors in the same way. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning wherein agents learn to act by watching online unlabeled videos. Specifically, we show that with a small amount of labeled data we can train an inverse dynamics model accurate enough to label a huge unlabeled source of online data -- here, online videos of people playing Minecraft -- from which we can then train a general behavioral prior. Despite using the native human interface (mouse and keyboard at 20Hz), we show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish.
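The pipeline in the abstract (small labeled set, inverse dynamics model, pseudo-labeled videos, behavioral prior) can be sketched in a few lines. This is a toy illustration only, not the paper's method or code: the 1-D world, `GOAL`, `expert_action`, `rollout`, and the lookup-table "models" are all hypothetical stand-ins for Minecraft frames and neural networks. The point is the data flow: the inverse dynamics model gets to look at the frame after a step, while the cloned policy only sees the current frame.

```python
from collections import Counter, defaultdict

GOAL = 5  # hypothetical target position for the toy "expert" player


def expert_action(pos):
    """Toy expert: step toward GOAL (+1 or -1), then stay (0)."""
    if pos < GOAL:
        return 1
    if pos > GOAL:
        return -1
    return 0


def rollout(start, length=10):
    """Play one expert episode; return (observations, actions)."""
    pos, obs, acts = start, [start], []
    for _ in range(length):
        action = expert_action(pos)
        pos += action
        obs.append(pos)
        acts.append(action)
    return obs, acts


def majority(table):
    """Collapse {key: Counter} into {key: most frequent value}."""
    return {key: votes.most_common(1)[0][0] for key, votes in table.items()}


def train_idm(labeled_episodes):
    """Inverse dynamics model: change between two frames -> action."""
    table = defaultdict(Counter)
    for obs, acts in labeled_episodes:
        for t, action in enumerate(acts):
            table[obs[t + 1] - obs[t]][action] += 1
    return majority(table)


def train_policy(videos, idm):
    """Behavioral cloning on pseudo-labels: current frame -> action."""
    table = defaultdict(Counter)
    for obs in videos:
        for t in range(len(obs) - 1):
            pseudo_action = idm[obs[t + 1] - obs[t]]  # label from the IDM
            table[obs[t]][pseudo_action] += 1
    return majority(table)


# Small labeled set (stand-in for contractor data): two episodes with actions.
labeled = [rollout(0), rollout(10)]
idm = train_idm(labeled)

# Larger unlabeled set (stand-in for online videos): observations only.
videos = [rollout(start)[0] for start in range(11)]
policy = train_policy(videos, idm)

print(policy)
```

In this sketch two labeled episodes are enough to label eleven unlabeled ones, because inferring an action from the frames on both sides of a step is much easier than choosing an action from the current frame alone. As a side note on the abstract's numbers, 20 minutes at 20 actions per second is 20 × 60 × 20 = 24,000 environment actions.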

Authors: Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper, minerl, minecraft ai, diamond pickaxe, ai diamond pickaxe, openai minecraft, deep learning projects, what is deep learning, deep learning tutorial, introduction to deep learning, gpt 3, gpt-3, vpt, video pretraining, video pre-training, openai vpt, vpt minecraft, minecarft
