[WeightWatcher] Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory

Published on 2019-11-06 ● Video Link: https://www.youtube.com/watch?v=DymfJGOOK_4



Duration: 1:08:16
707 views


For slides and more information on the paper, visit https://aisc.ai.science/events/2019-11-06

Discussion lead & author: Charles Martin

Abstract:

Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of Self-Regularization. The empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. These phases can be observed during the training process as well as in the final learned DNNs. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This results from correlations arising at all size scales, which arise implicitly due to the training process itself. This implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization gap phenomenon, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. This demonstrates that, all else being equal, DNN optimization with larger batch sizes leads to less-well implicitly-regularized models, and it provides an explanation for the generalization gap phenomenon.
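The core diagnostic behind the talk can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code or the weightwatcher package itself: it forms the empirical spectral density of a single layer weight matrix W as the eigenvalues of the correlation matrix X = W^T W / N, then estimates a heavy-tail exponent from the largest eigenvalues with a simple Hill estimator (the paper fits a power law to the ESD, so the estimator here is only a stand-in, and the function names layer_esd / tail_alpha are just illustrative). The random Gaussian W is a placeholder for a real trained layer.

import numpy as np

def layer_esd(W):
    # Eigenvalues of the layer correlation matrix X = W^T W / N;
    # their distribution is the empirical spectral density (ESD).
    N, M = W.shape
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)   # ascending, all >= 0

def tail_alpha(eigs, k=50):
    # Hill-style estimate of the power-law exponent of the ESD tail,
    # using the k largest eigenvalues. A random (Marchenko-Pastur-like)
    # layer has no heavy tail and yields a large alpha; in the paper's
    # taxonomy, exponents roughly between 2 and 4 correspond to the
    # Heavy-Tailed Self-Regularization regime.
    eigs = np.sort(eigs)
    tail, x_min = eigs[-k:], eigs[-k - 1]
    gamma = np.mean(np.log(tail / x_min))
    return 1.0 + 1.0 / gamma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(512, 256)) / np.sqrt(512)   # stand-in for a trained layer
    eigs = layer_esd(W)
    print(f"lambda_max = {eigs.max():.3f}, tail alpha = {tail_alpha(eigs):.1f}")

Repeating this per layer on a pre-trained model (e.g. AlexNet) versus a freshly initialized one, or across runs with different batch sizes, is essentially the comparison described in the abstract; the authors' open-source weightwatcher tool packages this kind of per-layer ESD analysis.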




Other Videos By LLMs Explained - Aggregate Intellect - AI.SCIENCE


2020-01-08 Overview of Modern Anomaly and Novelty Detection | AISC
2020-01-06 Annotating Object Instances With a Polygon RNN | AISC
2019-12-11 Predicting translational progress in biomedical research | AISC
2019-12-09 AlphaStar explained: Grandmaster level in StarCraft II with multi-agent RL
2019-12-04 How Can We Be So Dense? The Benefits of Using Highly Sparse Representations | AISC
2019-12-02 [RoBERT & ToBERT] Hierarchical Transformers for Long Document Classification | AISC
2019-11-25 [OpenAI] Solving Rubik's Cube with a Robot Hand | AISC
2019-11-18 Top-K Off-Policy Correction for a REINFORCE Recommender System | AISC
2019-11-13 Overview of Unsupervised & Semi-supervised learning | AISC
2019-11-11 Building products for Continuous Delivery in Machine Learning | AISC
2019-11-06 [WeightWatcher] Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory
2019-11-04 Defending Against Fake Neural News | AISC
2019-10-28 [RecSys Challenge 2019 2nd Place] Robust Contextual Models for In-Session Personalization | AISC
2019-10-23 Deep learning enables rapid identification of potent DDR1 kinase inhibitors | AISC
2019-10-22 Restricted Boltzmann Machines for Collaborative Filtering | AISC
2019-10-15 Location Intelligence Products: Goals and Challenges | AISC
2019-10-15 RecSys, Reverse Engineering User's Needs and Desires | AISC
2019-10-08 Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities | AISC
2019-10-07 DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker | AISC
2019-09-30 AISC Abstract Night September Edition | AISC
2019-09-26 EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis



Tags:
weightwatcher
deep learning
random matrix theory