What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 300,000

Published on August 20, 2020 9:46:35 AM ● Video Link: https://www.youtube.com/watch?v=a4VvcmqnkhY

Duration: 38:29

8,645 views

364

#ai #research #machinelearning

On-policy reinforcement learning is a flourishing field with countless methods for practitioners to choose from. However, each of those methods comes with a plethora of hyperparameter choices. This paper builds a unified framework, applies it to five continuous control tasks, and investigates the effects of these choices in a large-scale study. As a result, the authors come up with a set of recommendations for future research and applications.

OUTLINE:
0:00 - Intro & Overview
3:55 - Parameterized Agents
7:00 - Unified Online RL and Parameter Choices
14:10 - Policy Loss
16:40 - Network Architecture
20:25 - Initial Policy
24:20 - Normalization & Clipping
26:30 - Advantage Estimation
28:55 - Training Setup
33:05 - Timestep Handling
34:10 - Optimizers
35:05 - Regularization
36:10 - Conclusion & Comments

Paper: https://arxiv.org/abs/2006.05990

Abstract:
In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress (Engstrom'20). As a step towards filling that gap, we implement over 50 such "choices" in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250'000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.

Authors: Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Other Videos By Yannic Kilcher

2020-10-11	Descending through a Crowded Valley -- Benchmarking Deep Learning Optimizers (Paper Explained)
2020-10-04	An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained)
2020-10-03	Training more effective learned optimizers, and using them to train themselves (Paper Explained)
2020-09-18	The Hardware Lottery (Paper Explained)
2020-09-13	Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess (Paper Explained)
2020-09-07	Learning to summarize from human feedback (Paper Explained)
2020-09-02	Self-classifying MNIST Digits (Paper Explained)
2020-08-28	Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation (Paper Explained)
2020-08-26	Radioactive data: tracing through training (Paper Explained)
2020-08-23	Fast reinforcement learning with generalized policy updates (Paper Explained)
2020-08-20	What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study (Paper Explained)
2020-08-18	[Rant] REVIEWER #2: How Peer Review is FAILING in Machine Learning
2020-08-14	REALM: Retrieval-Augmented Language Model Pre-Training (Paper Explained)
2020-08-12	Meta-Learning through Hebbian Plasticity in Random Networks (Paper Explained)
2020-08-09	Hopfield Networks is All You Need (Paper Explained)
2020-08-06	I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)
2020-08-04	PCGRL: Procedural Content Generation via Reinforcement Learning (Paper Explained)
2020-08-02	Big Bird: Transformers for Longer Sequences (Paper Explained)
2020-07-29	Self-training with Noisy Student improves ImageNet classification (Paper Explained)
2020-07-26	[Classic] Playing Atari with Deep Reinforcement Learning (Paper Explained)
2020-07-23	[Classic] ImageNet Classification with Deep Convolutional Neural Networks (Paper Explained)

Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper, google, deep rl, deep reinforcement learning, on-policy, on policy, off policy, replay buffer, normalization, initialization, control, continuous control, deep neural networks, agent, environment, mujoco, hyperparameters, learning rate, optimizer, adam, entropy, regularization, grid search
