Why do large batch sized trainings perform poorly in SGD? - Generalization Gap Explained | AISC

Published on 2019-04-18 ● Video Link: https://www.youtube.com/watch?v=crag6bMM-0k



Duration: 5:15
1,609 views


5-min ML Paper Challenge
Presenter: https://www.linkedin.com/in/xiyangchen/

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
https://arxiv.org/abs/1609.04836

The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say 32-512 data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the model, as measured by its ability to generalize. We investigate the cause for this generalization drop in the large-batch regime and present numerical evidence that supports the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions - and as is well known, sharp minima lead to poorer generalization. In contrast, small-batch methods consistently converge to flat minimizers, and our experiments support a commonly held view that this is due to the inherent noise in the gradient estimation. We discuss several strategies to attempt to help large-batch methods eliminate this generalization gap.
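As a rough illustration of the batch-size knob the abstract refers to, here is a minimal PyTorch-style sketch (not from the paper or the talk): the only thing that changes between the small-batch and large-batch regimes is how many samples are used to estimate each gradient step. The model, synthetic data, and hyperparameters below are assumptions for illustration only.

# Minimal sketch, assuming PyTorch; model/data/hyperparameters are illustrative only.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train(batch_size, steps=200, lr=0.1):
    torch.manual_seed(0)
    X = torch.randn(4096, 20)                      # synthetic inputs (assumption)
    y = (X.sum(dim=1, keepdim=True) > 0).float()   # synthetic binary labels
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()

    step = 0
    while step < steps:
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)  # gradient estimated from this mini-batch only
            loss.backward()
            opt.step()
            step += 1
            if step >= steps:
                break
    return loss.item()

# Small-batch regime (e.g. 32-512 samples per step) vs. a large-batch run (here the full dataset).
print("small-batch final loss:", train(batch_size=64))
print("large-batch final loss:", train(batch_size=4096))

This toy script only shows where the batch size enters the training loop; the paper's observation is that the large-batch runs can reach comparable training loss yet converge to sharper minimizers and generalize worse.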




Other Videos By LLMs Explained - Aggregate Intellect - AI.SCIENCE


2019-05-02 A Framework for Developing Deep Learning Classification Models
2019-05-02 Revolutionizing Diet and Health with CNN's and the Microbiome
2019-05-02 Efficient implementation of a neural network on hardware using compression techniques
2019-05-02 Supercharging AI with high performance distributed computing
2019-05-02 Combining Satellite Imagery and machine learning to predict poverty
2019-05-02 Revolutionary Deep Learning Method to Denoise EEG Brainwaves
2019-04-25 [LISA] Linguistically-Informed Self-Attention for Semantic Role Labeling | AISC
2019-04-23 How goodness metrics lead to undesired recommendations
2019-04-22 Deep Neural Networks for YouTube Recommendation | AISC Foundational
2019-04-18 [Phoenics] A Bayesian Optimizer for Chemistry | AISC Author Speaking
2019-04-18 Why do large batch sized trainings perform poorly in SGD? - Generalization Gap Explained | AISC
2019-04-16 Structured Neural Summarization | AISC Lunch & Learn
2019-04-11 Deep InfoMax: Learning deep representations by mutual information estimation and maximization | AISC
2019-04-08 ACT: Adaptive Computation Time for Recurrent Neural Networks | AISC
2019-04-04 [FFJORD] Free-form Continuous Dynamics for Scalable Reversible Generative Models (Part 1) | AISC
2019-04-01 [DOM-Q-NET] Grounded RL on Structured Language | AISC Author Speaking
2019-03-31 5-min [machine learning] paper challenge | AISC
2019-03-28 [Variational Autoencoder] Auto-Encoding Variational Bayes | AISC Foundational
2019-03-25 [GQN] Neural Scene Representation and Rendering | AISC
2019-03-21 Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples | AISC
2019-03-18 Understanding the Origins of Bias in Word Embeddings



Tags:
deep learning
machine learning
SGD
large batch training
generalization gap
stochastic gradient descent