Weight Standardization (Paper Explained)

Subscribers: 286,000
Published on: 2020-05-15
Video Link: https://www.youtube.com/watch?v=p-zOeQCoG9c
Duration: 19:16
Views: 9,329
Likes: 333


It's common for neural networks to normalize their activations with layers such as BatchNorm or GroupNorm. This paper extends normalization to the weights of the network as well. This surprisingly simple change leads to a boost in performance and, combined with GroupNorm, new state-of-the-art results.

https://arxiv.org/abs/1903.10520

Abstract:
In this paper, we propose Weight Standardization (WS) to accelerate deep network training. WS is targeted at the micro-batch training setting where each GPU typically has only 1-2 images for training. The micro-batch training setting is hard because small batch sizes are not enough for training networks with Batch Normalization (BN), while other normalization methods that do not rely on batch knowledge still have difficulty matching the performance of BN in large-batch training. WS solves this problem because, when used with Group Normalization and trained with 1 image/GPU, WS is able to match or outperform the performance of BN trained with large batch sizes, with only 2 more lines of code. In micro-batch training, WS significantly outperforms other normalization methods. WS achieves these superior results by standardizing the weights in the convolutional layers, which we show is able to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients. The effectiveness of WS is verified on many tasks, including image classification, object detection, instance segmentation, video recognition, semantic segmentation, and point cloud recognition. The code is available online.
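For readers curious what the "2 more lines of code" look like in practice, here is a minimal PyTorch-style sketch of Weight Standardization applied to a convolutional layer: before each forward pass, every output filter is re-centered to zero mean and rescaled to unit standard deviation over its (in_channels, kH, kW) entries. The class name WSConv2d and the epsilon value are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    # Drop-in replacement for nn.Conv2d that standardizes its weights
    # before every convolution (a sketch of Weight Standardization).
    def forward(self, x):
        w = self.weight
        # Mean and std of each output filter over (in_channels, kH, kW).
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5  # eps avoids division by zero
        w = (w - mean) / std
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Typical pairing suggested by the abstract: WS together with GroupNorm.
block = nn.Sequential(
    WSConv2d(64, 128, kernel_size=3, padding=1),
    nn.GroupNorm(num_groups=32, num_channels=128),
    nn.ReLU(),
)

The paper's actual implementation may differ in details, e.g. where the epsilon is applied and whether a biased or unbiased standard deviation is used.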

Authors: Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-05-25  Deep image reconstruction from human brain activity (Paper Explained)
2020-05-24  Regularizing Trajectory Optimization with Denoising Autoencoders (Paper Explained)
2020-05-23  [News] The NeurIPS Broader Impact Statement
2020-05-22  When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)
2020-05-21  [News] OpenAI Model Generates Python Code
2020-05-20  Investigating Human Priors for Playing Video Games (Paper & Demo)
2020-05-19  iMAML: Meta-Learning with Implicit Gradients (Paper Explained)
2020-05-18  [Code] PyTorch sentiment classifier from scratch with Huggingface NLP Library (Full Tutorial)
2020-05-17  Planning to Explore via Self-Supervised World Models (Paper Explained)
2020-05-16  [News] Facebook's Real-Time TTS system runs on CPUs only!
2020-05-15  Weight Standardization (Paper Explained)
2020-05-14  [Trash] Automated Inference on Criminality using Face Images
2020-05-13  Faster Neural Network Training with Data Echoing (Paper Explained)
2020-05-12  Group Normalization (Paper Explained)
2020-05-11  Concept Learning with Energy-Based Models (Paper Explained)
2020-05-10  [News] Google’s medical AI was super accurate in a lab. Real life was a different story.
2020-05-09  Big Transfer (BiT): General Visual Representation Learning (Paper Explained)
2020-05-08  Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained)
2020-05-07  WHO ARE YOU? 10k Subscribers Special (w/ Channel Analytics)
2020-05-06  Reinforcement Learning with Augmented Data (Paper Explained)
2020-05-05  TAPAS: Weakly Supervised Table Parsing via Pre-training (Paper Explained)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
normalize
batchnorm
groupnorm
layernorm
mean
center
std
standardize
backpropagation
convergence
gradients
norm
convolution
cnn
convolutional neural networks
filters
kernel
channel
architecture