Training more effective learned optimizers, and using them to train themselves (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 300,000

Published: October 3, 2020, 4:06:32 PM
Video Link: https://www.youtube.com/watch?v=3baFTP0uYOc

Duration: 53:36

18,487 views

799

#ai #research #optimization

Optimization is still the domain of simple, hand-crafted algorithms. An ML engineer not only has to pick a suitable one for their problem, but often also has to grid-search over various hyperparameters. This paper proposes to learn a single, unified optimization algorithm, given not by an equation but by an LSTM-based neural network, that acts as an optimizer for any deep learning problem and can ultimately optimize itself.
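To make the contrast between "given by an equation" and "given by a neural network" concrete, here is a minimal sketch in plain Python. It is not the paper's architecture (the description above says that is LSTM-based). The names `learned_step`, `theta`, and `minimize_quadratic`, the choice of input features, and the tiny one-hidden-layer network are all assumptions made for illustration.

```python
import math

def sgd_momentum_step(param, grad, velocity, lr=0.1, beta=0.9):
    """Hand-crafted rule: a fixed equation whose hyperparameters the user must pick."""
    velocity = beta * velocity + grad
    return param - lr * velocity, velocity

def learned_step(param, grad, velocity, theta, beta=0.9):
    """Learned rule (illustrative): the update is the output of a tiny network.

    theta is a hypothetical weight dict: 'w1' (rows of 2 weights), 'b1',
    'w2' (one weight per hidden unit) and a scalar 'b2'.
    """
    velocity = beta * velocity + grad
    features = (grad, velocity)  # assumed per-parameter input features
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(theta["w1"], theta["b1"])
    ]
    update = sum(w * h for w, h in zip(theta["w2"], hidden)) + theta["b2"]
    return param - update, velocity

def minimize_quadratic(step_fn, steps=300, x0=5.0, **kwargs):
    """Minimize f(x) = x**2 (gradient 2 * x) with the given step function."""
    x, v = x0, 0.0
    for _ in range(steps):
        x, v = step_fn(x, 2.0 * x, v, **kwargs)
    return x
```

Both functions share the same signature, so the training loop does not change when the equation is swapped for the network; only `theta` has to come from somewhere. As a rough illustration, a `theta` such as `{"w1": [[0.0, 0.01]], "b1": [0.0], "w2": [10.0], "b2": 0.0}` approximately mimics momentum SGD with `lr=0.1` for small velocities, because tanh(z) is close to z near zero.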

OUTLINE:
0:00 - Intro & Outline
2:20 - From Hand-Crafted to Learned Features
4:25 - Current Optimization Algorithm
9:40 - Learned Optimization
15:50 - Optimizer Architecture
22:50 - Optimizing the Optimizer using Evolution Strategies
30:30 - Task Dataset
34:00 - Main Results
36:50 - Implicit Regularization in the Learned Optimizer
41:05 - Generalization across Tasks
41:40 - Scaling Up
45:30 - The Learned Optimizer Trains Itself
47:20 - Pseudocode
49:45 - Broader Impact Statement
52:55 - Conclusion & Comments

Paper: https://arxiv.org/abs/2009.11243

Abstract:
Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from existing first order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out of distribution tasks such as training themselves from scratch.
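The outline lists evolution strategies as the way the optimizer itself is optimized. The following toy sketch shows the general idea of such an outer loop using antithetic evolution strategies, under heavy simplifying assumptions: the "learned optimizer" is reduced to a single meta-parameter (a log learning rate), and the task distribution is a set of 1-D quadratics. None of these choices, nor the function names, come from the paper.

```python
import math
import random

def inner_train(log_lr, curvature, steps=10):
    """Train the toy task f(x) = 0.5 * curvature * x**2; return the final loss."""
    lr = math.exp(log_lr)
    x = 1.0
    for _ in range(steps):
        x -= lr * curvature * x  # the gradient of f at x is curvature * x
    return 0.5 * curvature * x * x

def meta_loss(log_lr, tasks):
    """Average final training loss over all tasks (the outer objective)."""
    return sum(inner_train(log_lr, c) for c in tasks) / len(tasks)

def es_gradient(log_lr, tasks, rng, sigma=0.1, pairs=8):
    """Antithetic ES estimate of d(meta_loss)/d(log_lr); no backprop through training."""
    total = 0.0
    for _ in range(pairs):
        eps = rng.gauss(0.0, 1.0)
        plus = meta_loss(log_lr + sigma * eps, tasks)
        minus = meta_loss(log_lr - sigma * eps, tasks)
        total += (plus - minus) / (2.0 * sigma) * eps
    return total / pairs

def meta_train(outer_steps=100, meta_lr=0.5, seed=0):
    """Outer loop: improve the optimizer's meta-parameter across a set of tasks."""
    rng = random.Random(seed)
    tasks = [rng.uniform(0.5, 2.0) for _ in range(16)]  # toy task distribution
    log_lr = math.log(0.01)  # deliberately poor starting point
    history = [meta_loss(log_lr, tasks)]
    for _ in range(outer_steps):
        log_lr -= meta_lr * es_gradient(log_lr, tasks, rng)
        history.append(meta_loss(log_lr, tasks))
    return log_lr, history
```

Calling `meta_train()` returns the tuned meta-parameter and the meta-loss history. The ES estimator only needs final losses from perturbed copies of the optimizer, which is why this kind of outer loop can be applied even when differentiating through the whole inner training run is impractical.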

Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Other Videos By Yannic Kilcher

2020-12-01	DeepMind's AlphaFold 2 Explained! AI Breakthrough in Protein Folding! What we know (& what we don't)
2020-11-29	Predictive Coding Approximates Backprop along Arbitrary Computation Graphs (Paper Explained)
2020-11-22	Fourier Neural Operator for Parametric Partial Differential Equations (Paper Explained)
2020-11-15	[News] Soccer AI FAILS and mixes up ball and referee's bald head.
2020-11-10	Underspecification Presents Challenges for Credibility in Modern Machine Learning (Paper Explained)
2020-11-02	Language Models are Open Knowledge Graphs (Paper Explained)
2020-10-26	Rethinking Attention with Performers (Paper Explained)
2020-10-17	LambdaNetworks: Modeling long-range Interactions without Attention (Paper Explained)
2020-10-11	Descending through a Crowded Valley -- Benchmarking Deep Learning Optimizers (Paper Explained)
2020-10-04	An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained)
2020-10-03	Training more effective learned optimizers, and using them to train themselves (Paper Explained)
2020-09-18	The Hardware Lottery (Paper Explained)
2020-09-13	Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess (Paper Explained)
2020-09-07	Learning to summarize from human feedback (Paper Explained)
2020-09-02	Self-classifying MNIST Digits (Paper Explained)
2020-08-28	Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation (Paper Explained)
2020-08-26	Radioactive data: tracing through training (Paper Explained)
2020-08-23	Fast reinforcement learning with generalized policy updates (Paper Explained)
2020-08-20	What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study (Paper Explained)
2020-08-18	[Rant] REVIEWER #2: How Peer Review is FAILING in Machine Learning
2020-08-14	REALM: Retrieval-Augmented Language Model Pre-Training (Paper Explained)

Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper, optimization, lstm, taskset, google, google research, compute, outer optimization, adam, adamw, sgd, momentum, learning rate, gradient, learned optimizer, second moment, cnn, rnn, paper explained, neural network, gradient descent, hyper parameters, grid search, mnist, cifar10, imagenet
