[T-Fixup] Improving Transformer Optimization Through Better Initialization | AISC

Channel: LLMs Explained - Aggregate Intellect - AI.SCIENCE

Published on August 26, 2020 4:35:42 AM ● Video Link: https://www.youtube.com/watch?v=EpxilvBvAeQ

Duration: 34:47


Speaker(s): Gary Huang
Facilitator(s): Royal Sequiera, Nour Fahmy

Find the recording, slides, and more info at https://ai.science/e/t-fixup-improving-transformer-optimization-through-better-initialization--7SFFcJpCk07bPJ3tKdMP

Motivation / Abstract
The Transformer architecture has achieved considerable success recently; its key component is the attention layer, which enables the model to focus on important regions within an input sequence. Gradient optimization with attention layers can be notoriously difficult, requiring tricks such as learning rate warmup to prevent divergence. As Transformer models become larger and more expensive to train, recent research has focused on understanding and improving optimization in these architectures. In this work our contributions are two-fold: we first investigate and empirically validate the source of optimization problems in the encoder-decoder Transformer architecture; we then propose a new weight initialization scheme, with theoretical justification, that enables training without warmup or layer normalization. Empirical results on public machine translation benchmarks show that our approach achieves leading accuracy and makes it possible to train deep Transformer models with 200 layers in both the encoder and decoder (over 1000 attention/MLP blocks) without difficulty.
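The abstract notes that attention models usually need learning rate warmup to avoid divergence, and that the proposed initialization removes that need. For reference, here is a minimal Python sketch of one common warmup schedule, the inverse-square-root schedule from the original Transformer paper (Vaswani et al., 2017). The defaults `d_model=512` and `warmup_steps=4000` are that paper's base settings and are used here only for illustration; the function name is hypothetical. This is the kind of schedule T-Fixup aims to make unnecessary, not part of T-Fixup itself.

```python
def inverse_sqrt_warmup_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate at a 1-indexed training step (hypothetical helper).

    Follows the Vaswani et al. (2017) schedule: the rate grows linearly for
    `warmup_steps` steps, peaks at step == warmup_steps, and then decays in
    proportion to 1 / sqrt(step).
    """
    if step < 1:
        raise ValueError("step is 1-indexed and must be >= 1")
    # min() selects the linear-warmup branch before the peak and the
    # inverse-square-root branch after it; the two meet at warmup_steps.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the rate at a few points: early warmup, mid warmup, peak, and decay.
    for s in (1, 1000, 4000, 16000):
        print(s, inverse_sqrt_warmup_lr(s))
```

Without warmup, the optimizer would take full-size steps from the very first update, which is exactly the regime the abstract says standard Transformer training struggles with and the new initialization is designed to handle.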

------
#AISC hosts 3-5 live sessions like this on various AI research, engineering, and product topics every week! Visit https://ai.science for more details.
