Learning Rate Grafting: Transferability of Optimizer Tuning (Machine Learning Research Paper Review)

Subscribers: 284,000
Published on 2021-11-20 ● Video Link: https://www.youtube.com/watch?v=vVRC-0VKPrg
Duration: 39:15
15,446 views

#grafting #adam #sgd

Recent years of deep learning research have given rise to a plethora of optimization algorithms, such as SGD, AdaGrad, Adam, LARS, LAMB, etc., each of which claims its own peculiarities and advantages. In general, all of these algorithms modify two major things: the (implicit) learning rate schedule, and a correction to the gradient direction. This paper introduces grafting, which makes it possible to transfer the induced learning rate schedule of one optimizer to another. In doing so, the paper shows that much of the benefit of adaptive methods (e.g. Adam) is actually due to this schedule, and not necessarily to the gradient direction correction. Grafting enables more fundamental research into the differences and commonalities between optimizers, and a derived version of it makes it possible to compute static learning rate corrections for SGD, which potentially allows for large savings of GPU memory.
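To make the core idea concrete, here is a minimal sketch (in PyTorch, with illustrative names like grafted_delta, not the authors' implementation) of a layer-wise grafted update: take the step a "magnitude" optimizer M (e.g. Adam) would make and the step a "direction" optimizer D (e.g. SGD) would make, then move in D's direction but by M's step length. The Adam-like delta in the toy usage is only a crude stand-in.

```python
import torch

def grafted_delta(delta_m: torch.Tensor, delta_d: torch.Tensor,
                  eps: float = 1e-16) -> torch.Tensor:
    """Layer-wise grafting: direction from D, step length from M.

    delta_m: parameter update proposed by the magnitude optimizer M (e.g. Adam)
    delta_d: parameter update proposed by the direction optimizer D (e.g. SGD)
    """
    norm_m = delta_m.norm()          # how far M would move this layer
    norm_d = delta_d.norm()          # length of D's proposed step
    # Rescale D's step so it has exactly M's length.
    return delta_d * (norm_m / (norm_d + eps))


# Toy usage on a single parameter tensor.
w = torch.randn(10, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
g = w.grad

delta_adam = -1e-3 * g / (g.abs() + 1e-8)   # crude stand-in for an Adam-like first step
delta_sgd = -1e-1 * g                        # plain SGD step
with torch.no_grad():
    w += grafted_delta(delta_adam, delta_sgd)
```

The ratio ||ΔM|| / ||ΔD|| is exactly the per-layer step-size correction that the grafted schedule imposes on D.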

OUTLINE
0:00 - Rant about Reviewer #2
6:25 - Intro & Overview
12:25 - Adaptive Optimization Methods
20:15 - Grafting Algorithm
26:45 - Experimental Results
31:35 - Static Transfer of Learning Rate Ratios
35:25 - Conclusion & Discussion

Paper (OpenReview): https://openreview.net/forum?id=FpKgG31Z_i9
Old Paper (arXiv): https://arxiv.org/abs/2002.11803

Our Discord: https://discord.gg/4H8xxDF

Abstract:
In the empirical science of training large neural networks, the learning rate schedule is a notoriously challenging-to-tune hyperparameter, which can depend on all other properties (architecture, optimizer, batch size, dataset, regularization, ...) of the problem. In this work, we probe the entanglements between the optimizer and the learning rate schedule. We propose the technique of optimizer grafting, which allows for the transfer of the overall implicit step size schedule from a tuned optimizer to a new optimizer, preserving empirical performance. This provides a robust plug-and-play baseline for optimizer comparisons, leading to reductions to the computational cost of optimizer hyperparameter search. Using grafting, we discover a non-adaptive learning rate correction to SGD which allows it to train a BERT model to state-of-the-art performance. Besides providing a resource-saving tool for practitioners, the invariances discovered via grafting shed light on the successes and failure modes of optimizers in deep learning.
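On the "non-adaptive learning rate correction" for SGD mentioned in the abstract, one way to read it (a sketch under my own assumptions, not the paper's code) is: log the per-layer step-norm ratios during a grafted run, average them into fixed multipliers, and then train with plain SGD scaled by those multipliers, so no second-moment statistics need to be kept in GPU memory. All names below (log_ratio, static_corrections, apply_sgd_step) are hypothetical.

```python
import torch
from collections import defaultdict

# Phase 1: during a (short) grafted run, record how much larger/smaller the
# magnitude optimizer's step is than SGD's step, per parameter tensor.
ratio_log = defaultdict(list)

def log_ratio(name: str, delta_m: torch.Tensor, delta_d: torch.Tensor,
              eps: float = 1e-16) -> None:
    ratio_log[name].append((delta_m.norm() / (delta_d.norm() + eps)).item())

# Phase 2: collapse the logged ratios into one static multiplier per layer.
def static_corrections() -> dict:
    return {name: sum(r) / len(r) for name, r in ratio_log.items()}

# Phase 3: train with plain SGD, scaling each layer's learning rate by its
# static correction -- no adaptive optimizer state needs to be stored.
def apply_sgd_step(named_params, corrections: dict, lr: float = 0.1) -> None:
    with torch.no_grad():
        for name, p in named_params:
            if p.grad is not None:
                p -= lr * corrections.get(name, 1.0) * p.grad
```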

Authors: Anonymous (Under Review)

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2022-01-02Player of Games: All the games, one algorithm! (w/ author Martin Schmid)
2021-12-30ML News Live! (Dec 30, 2021) Anonymous user RIPS Tensorflow | AI prosecutors rising | Penny Challenge
2021-12-28GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
2021-12-27Machine Learning Holidays Live Stream
2021-12-26Machine Learning Holiday Live Stream
2021-12-24[ML News] AI learns to search the Internet | Drawings come to life | New ML journal launches
2021-12-21[ML News] DeepMind builds Gopher | Google builds GLaM | Suicide capsule uses AI to check access
2021-11-27Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions (Paper Explained)
2021-11-25Peer Review is still BROKEN! The NeurIPS 2021 Review Experiment (results are in)
2021-11-24Parameter Prediction for Unseen Deep Architectures (w/ First Author Boris Knyazev)
2021-11-20Learning Rate Grafting: Transferability of Optimizer Tuning (Machine Learning Research Paper Review)
2021-11-18[ML News] Cedille French Language Model | YOU Search Engine | AI Finds Profitable MEME TOKENS
2021-11-15Gradients are Not All You Need (Machine Learning Research Paper Explained)
2021-11-12[ML News] Microsoft combines Images & Text | Meta makes artificial skin | Russians replicate DALL-E
2021-11-10Autoregressive Diffusion Models (Machine Learning Research Paper Explained)
2021-11-05[ML News] Google introduces Pathways | OpenAI solves Math Problems | Meta goes First Person
2021-11-03EfficientZero: Mastering Atari Games with Limited Data (Machine Learning Research Paper Explained)
2021-10-31[YTalks] Siraj Raval - Stories about YouTube, Plagiarism, and the Dangers of Fame (Interview)
2021-10-29[ML News] NVIDIA GTC'21 | DeepMind buys MuJoCo | Google predicts spreadsheet formulas
2021-10-29[ML News GERMAN] NVIDIA GTC'21 | DeepMind kauft MuJoCo | Google Lernt Spreadsheet Formeln
2021-10-27I went to an AI Art Festival in Geneva (AiiA Festival Trip Report)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
grafting
learning rate
deep learning learning rate
neural network learning rate
adaptive learning rate
adaptive optimizer
learning rate grafting
optimizer grafting
adam
sgd
adagrad
lars
lamb
openreview
reviewer
automatic learning rate
learning rate decay
learning rate warmup