When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 291,000

Published on May 22, 2020 5:45:44 PM ● Video Link: https://www.youtube.com/watch?v=IIebBjbBevs

Category: Let's Play

Duration: 53:35

29,581 views

308

BERT is a giant model. Turns out you can prune away many of its components and it still works. This paper analyzes BERT pruning in light of the Lottery Ticket Hypothesis and finds that even the "bad" lottery tickets can be fine-tuned to good accuracy.

OUTLINE:
0:00 - Overview
1:20 - BERT
3:20 - Lottery Ticket Hypothesis
13:00 - Paper Abstract
18:00 - Pruning BERT
23:00 - Experiments
50:00 - Conclusion

https://arxiv.org/abs/2005.00561

ML Street Talk Channel: https://www.youtube.com/channel/UCMLtBahI5DMrt0NPvDSoIRQ

Abstract:
Much of the recent success in NLP is due to the large Transformer-based models such as BERT (Devlin et al., 2019). However, these models have been shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis. For fine-tuned BERT, we show that (a) it is possible to find a subnetwork of elements that achieves performance comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. However, the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful. We also show that the "good" subnetworks vary considerably across GLUE tasks, opening up the possibilities to learn what knowledge BERT actually uses at inference time.

Authors: Sai Prasanna, Anna Rogers, Anna Rumshisky
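
To make the "good" versus "bad" subnetwork idea from the abstract concrete, here is a minimal sketch of how two such head masks could be built. It is not the authors' code: the importance scores are random placeholders, the function names `good_mask` and `bad_mask` are hypothetical, and the choice to keep 48 heads is arbitrary. The only fixed fact used is that BERT-base has 12 layers with 12 attention heads each.

```python
import random

def good_mask(scores, keep):
    """Keep the `keep` highest-scoring heads (1 = kept, 0 = pruned)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:keep])
    return [1 if i in kept else 0 for i in range(len(scores))]

def bad_mask(good, keep, rng):
    """Sample a same-sized subnetwork from the heads the good mask pruned."""
    rest = [i for i, m in enumerate(good) if m == 0]
    chosen = set(rng.sample(rest, keep))
    return [1 if i in chosen else 0 for i in range(len(good))]

rng = random.Random(0)

# Placeholder importance scores for 12 layers x 12 heads = 144 heads.
# In practice these would come from some importance estimate (assumption).
scores = [rng.random() for _ in range(12 * 12)]

good = good_mask(scores, keep=48)
bad = bad_mask(good, keep=48, rng=rng)

print(sum(good), sum(bad))  # number of heads kept in each subnetwork
```

Each mask would then be applied to the model's attention heads and the resulting subnetwork fine-tuned on its own, which is the comparison the abstract describes. That step is left out here because it depends on the training setup.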

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Other Videos By Yannic Kilcher

2020-06-01	Dynamics-Aware Unsupervised Discovery of Skills (Paper Explained)
2020-05-31	Synthesizer: Rethinking Self-Attention in Transformer Models (Paper Explained)
2020-05-30	[Code] How to use Facebook's DETR object detection algorithm in Python (Full Tutorial)
2020-05-29	GPT-3: Language Models are Few-Shot Learners (Paper Explained)
2020-05-28	DETR: End-to-End Object Detection with Transformers (Paper Explained)
2020-05-27	mixup: Beyond Empirical Risk Minimization (Paper Explained)
2020-05-26	A critical analysis of self-supervision, or what we can learn from a single image (Paper Explained)
2020-05-25	Deep image reconstruction from human brain activity (Paper Explained)
2020-05-24	Regularizing Trajectory Optimization with Denoising Autoencoders (Paper Explained)
2020-05-23	[News] The NeurIPS Broader Impact Statement
2020-05-22	When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)
2020-05-21	[News] OpenAI Model Generates Python Code
2020-05-20	Investigating Human Priors for Playing Video Games (Paper & Demo)
2020-05-19	iMAML: Meta-Learning with Implicit Gradients (Paper Explained)
2020-05-18	[Code] PyTorch sentiment classifier from scratch with Huggingface NLP Library (Full Tutorial)
2020-05-17	Planning to Explore via Self-Supervised World Models (Paper Explained)
2020-05-16	[News] Facebook's Real-Time TTS system runs on CPUs only!
2020-05-15	Weight Standardization (Paper Explained)
2020-05-14	[Trash] Automated Inference on Criminality using Face Images
2020-05-13	Faster Neural Network Training with Data Echoing (Paper Explained)
2020-05-12	Group Normalization (Paper Explained)

Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper, bert, nlp, lottery ticket, good, bad, winning, pruning, weights, attention, transformer, heads, multi-head, fine-tuning, glue, benchmark
