ACCEL: Evolving Curricula with Regret-Based Environment Design (Paper Review)

Channel: Yannic Kilcher
Subscribers: 300,000

Published on April 25, 2022 6:40:16 PM ● Video Link: https://www.youtube.com/watch?v=povBDxUn1VQ

Category: Review

Duration: 44:06

10,049 views

#ai #accel #evolution

Automatic curriculum generation is one of the most promising avenues for Reinforcement Learning today. Multiple approaches have been proposed, each with its own advantages and drawbacks. This paper presents ACCEL, which takes the next step toward constructing curricula for multi-capable agents. ACCEL combines the adversarial adaptiveness of regret-based sampling methods with the level-editing capabilities usually found in evolutionary methods.

OUTLINE:
0:00 - Intro & Demonstration
3:50 - Paper overview
5:20 - The ACCEL algorithm
15:25 - Looking at the pseudocode
23:10 - Approximating regret
33:45 - Experimental results
40:00 - Discussion & Comments

Website: https://accelagent.github.io
Paper: https://arxiv.org/abs/2203.01302

Abstract:
It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from their generality, with theoretical guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces. By contrast, evolutionary approaches seek to incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. In this paper we propose to harness the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of the paper is available at https://accelagent.github.io.

Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
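
To make the idea in the abstract a bit more concrete, here is a minimal toy sketch of a regret-driven curriculum that edits levels. It is not the authors' implementation. Everything in it is an assumption made for illustration: levels are plain integers standing for difficulty, the agent is a single "skill" number, and toy_regret, edit_level and run_curriculum are hypothetical names. The only point is the loop structure: replay the level with the highest estimated regret, train on it, make a small edit, and keep the edited level if it still has positive regret.

```python
import random


def toy_regret(level: int, skill: float) -> float:
    """Hypothetical regret proxy: peaks for levels one step beyond current skill."""
    return max(0.0, 1.0 - abs(level - (skill + 1.0)))


def edit_level(level: int, rng: random.Random) -> int:
    """Small random edit: move difficulty one step up or down, never below 0."""
    return max(0, level + rng.choice([-1, 1]))


def run_curriculum(steps: int = 200, buffer_size: int = 8, seed: int = 0):
    rng = random.Random(seed)
    skill = 0.0
    buffer = [0]  # the curriculum starts from the simplest level
    for _ in range(steps):
        # Replay the buffered level with the highest estimated regret.
        parent = max(buffer, key=lambda lv: toy_regret(lv, skill))
        # Stand-in for an RL update: skill grows with regret on that level.
        skill += 0.1 * toy_regret(parent, skill)
        # Edit the replayed level; keep the child only if it has positive regret.
        child = edit_level(parent, rng)
        if child not in buffer and toy_regret(child, skill) > 0.0:
            buffer.append(child)
        # Retain only the highest-regret levels.
        buffer.sort(key=lambda lv: toy_regret(lv, skill), reverse=True)
        del buffer[buffer_size:]
    return skill, buffer


if __name__ == "__main__":
    final_skill, final_buffer = run_curriculum()
    print(f"final skill: {final_skill:.2f}, buffered levels: {sorted(final_buffer)}")
```

In this toy, levels that are already solved or far too hard score zero regret and drop out of the buffer over time, so the replayed levels drift upward in difficulty together with the skill value. The regret function here is made up; in the paper regret has to be approximated from the agent's own experience, which is the topic of the "Approximating regret" chapter in the outline.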

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Other Videos By Yannic Kilcher

2022-06-23	Parti - Scaling Autoregressive Models for Content-Rich Text-to-Image Generation (Paper Explained)
2022-06-15	Did Google's LaMDA chatbot just become sentient?
2022-06-03	GPT-4chan: This is the worst AI ever
2022-06-01	Did I crash the NFT market?
2022-05-13	[ML News] DeepMind's Flamingo Image-Text model | Locked-Image Tuning | Jurassic X & MRKL
2022-05-10	[ML News] Meta's OPT 175B language model | DALL-E Mega is training | TorToiSe TTS fakes my voice
2022-05-05	This A.I. creates infinite NFTs
2022-05-02	Author Interview: SayCan - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
2022-04-30	Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan - Paper Explained)
2022-04-26	Author Interview - ACCEL: Evolving Curricula with Regret-Based Environment Design
2022-04-25	ACCEL: Evolving Curricula with Regret-Based Environment Design (Paper Review)
2022-04-22	LAION-5B: 5 billion image-text-pairs dataset (with the authors)
2022-04-21	Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors)
2022-04-17	Author Interview - Transformer Memory as a Differentiable Search Index
2022-04-16	Transformer Memory as a Differentiable Search Index (Machine Learning Research Paper Explained)
2022-04-10	[ML News] Google's 540B PaLM Language Model & OpenAI's DALL-E 2 Text-to-Image Revolution
2022-04-06	DALL-E 2 by OpenAI is out! Live Reaction
2022-04-04	The Weird and Wonderful World of AI Art (w/ Author Jack Morris)
2022-04-02	Author Interview - Improving Intrinsic Exploration with Language Abstractions
2022-04-01	Improving Intrinsic Exploration with Language Abstractions (Machine Learning Paper Explained)
2022-03-30	[ML News] GPT-3 learns to edit | Google Pathways | Make-A-Scene | CLIP meets GamePhysics | DouBlind
