Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 291,000

Published on June 4, 2020 8:06:40 PM ● Video Link: https://www.youtube.com/watch?v=nxEr4VNgYOE

Duration: 30:11

4,514 views

191

Deep neural networks are large models, and pruning has become an important part of ML product pipelines because it makes models small while keeping their performance high. However, the classic pruning method, Magnitude Pruning, is suboptimal for models that are obtained by transfer learning. This paper proposes a solution called Movement Pruning and shows its superior performance.
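
For readers unfamiliar with the classic method, here is a minimal NumPy sketch of magnitude pruning: keep the weights with the largest absolute values and zero out the rest. The function name `magnitude_prune`, the `keep_fraction` parameter, and the example matrix are assumptions made for illustration; they are not taken from the paper or from the linked code.

```python
import numpy as np

def magnitude_prune(weights, keep_fraction):
    """Zero out all but the largest-magnitude entries of `weights`.

    Illustrative helper, not the paper's implementation.
    """
    # Number of weights that survive pruning (at least one).
    k = max(1, int(round(keep_fraction * weights.size)))
    # The k-th largest absolute value acts as the pruning threshold.
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    # Ties at the threshold are all kept, so slightly more than k may survive.
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Made-up 2x3 weight matrix for demonstration.
w = np.array([[0.9, -0.05, 0.3],
              [-0.7, 0.01, 0.2]])
pruned, mask = magnitude_prune(w, keep_fraction=0.5)
print(pruned)
```

With `keep_fraction=0.5`, the three entries with the largest absolute values (0.9, -0.7, 0.3) are kept. The decision depends only on the current values of the weights, not on how they changed during fine-tuning, which is the limitation the video discusses.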

OUTLINE:
0:00 - Intro & High-Level Overview
0:55 - Magnitude Pruning
4:25 - Transfer Learning
7:25 - The Problem with Magnitude Pruning in Transfer Learning
9:20 - Movement Pruning
22:20 - Experiments
24:20 - Improvements via Distillation
26:40 - Analysis of the Learned Weights

Paper: https://arxiv.org/abs/2005.07683
Code: https://github.com/huggingface/transformers/tree/master/examples/movement-pruning

Abstract:
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.

Authors: Victor Sanh, Thomas Wolf, Alexander M. Rush
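
The first-order idea in the abstract can be sketched on a toy problem. In the paper, each weight gets an importance score that accumulates the negative product of the weight and its gradient, so weights moving away from zero during fine-tuning score high. The sketch below is a simplification under stated assumptions: it uses a hypothetical quadratic fine-tuning loss, trains densely, and only accumulates the scores on the side, whereas the paper applies the mask in the forward pass and learns the scores with a straight-through estimator. All names and numbers are made up for illustration.

```python
import numpy as np

# Hypothetical "pretrained" weights and the optimum of a toy fine-tuning loss.
pretrained = np.array([1.0, 0.1, -0.8, -0.05])
target = np.array([0.2, 0.6, -0.9, 0.0])

def finetune_with_scores(w0, target, steps=5, lr=0.1):
    """Gradient descent on 0.5 * ||w - target||^2 while accumulating
    movement-style scores S -= lr * (dL/dw) * w (simplified sketch)."""
    w = w0.copy()
    scores = np.zeros_like(w)
    for _ in range(steps):
        grad = w - target          # gradient of the toy quadratic loss
        scores -= lr * grad * w    # grows when w moves away from zero
        w -= lr * grad             # ordinary fine-tuning step
    return w, scores

def top_k_indices(values, k):
    """Indices of the k largest entries, as a set of plain ints."""
    return set(int(i) for i in np.argsort(values)[-k:])

w, scores = finetune_with_scores(pretrained, target)
magnitude_kept = top_k_indices(np.abs(w), k=2)  # largest |w| after fine-tuning
movement_kept = top_k_indices(scores, k=2)      # largest accumulated score
print("magnitude keeps:", magnitude_kept)
print("movement keeps:", movement_kept)
```

In this toy run, magnitude selection keeps indices 0 and 2 (the weights that are still large), while the movement scores select indices 1 and 2 (the weights that fine-tuning pushed away from zero). Index 0 is large but shrinking, so it gets a negative score.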

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper, prune, pruning, transfer learning, weights, magnitude, gradient, moving, small, importance, huggingface, nlp, natural language processing, squad, mnli, bert, transformer, attention, cnn, distillation, teacher, sparse, sparsity, question answering, mobile, edge, tune, fine-tune
