Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained)

Subscribers: 284,000
Published on: 2020-06-20
Video Link: https://www.youtube.com/watch?v=2lkUNDZld-4
Duration: 37:31
Views: 30,826
Likes: 1,210


This paper proposes SimCLRv2 and shows that semi-supervised learning benefits a lot from self-supervised pre-training. And stunningly, that effect gets larger the fewer labels are available and the more parameters the model has.
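The self-supervised pre-training step is SimCLR's contrastive objective, the NT-Xent loss (covered at 5:45 in the outline below): two augmented views of the same image should map to similar projections, while all other images in the batch act as negatives. Here is a minimal PyTorch sketch of that loss; the function name nt_xent and the temperature value are illustrative, not taken from the official repository.

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """Normalized temperature-scaled cross-entropy (NT-Xent) over a batch.

    z1, z2: [N, D] projections of two augmented views of the same N images.
    Each row's positive is the other view of the same image; the remaining
    2N - 2 rows in the batch serve as negatives.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # [2N, D], unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # a view is not its own positive
    n = z1.size(0)
    # Row i's positive sits n rows away (view 1 <-> view 2 of the same image).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage with random "projections" for a batch of 8 images:
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())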

OUTLINE:
0:00 - Intro & Overview
1:40 - Semi-Supervised Learning
3:50 - Pre-Training via Self-Supervision
5:45 - Contrastive Loss
10:50 - Retaining Projection Heads (see the sketch after this outline)
13:10 - Supervised Fine-Tuning
13:45 - Unsupervised Distillation & Self-Training
18:45 - Architecture Recap
22:25 - Experiments
34:15 - Broader Impact
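
The projection-head trick at 10:50 is the main architectural change in SimCLRv2: pre-train with a three-layer MLP head on top of the encoder, then fine-tune from the head's first layer instead of discarding the whole head as SimCLR v1 did. A minimal PyTorch sketch under that assumption, with a stand-in encoder rather than the paper's big ResNets:

import torch.nn as nn

d, h, num_classes = 2048, 2048, 1000  # feature width, head width, ImageNet classes

# Stand-in backbone producing [N, d] features (the paper uses deep/wide ResNets).
encoder = nn.Sequential(nn.Conv2d(3, d, kernel_size=7, stride=4),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())

projection_head = nn.Sequential(
    nn.Linear(d, h), nn.ReLU(),  # layer 1: retained for fine-tuning
    nn.Linear(h, h), nn.ReLU(),  # layer 2
    nn.Linear(h, h),             # layer 3: feeds the contrastive loss
)

# Pre-training runs encoder -> full projection_head -> contrastive loss.
# Fine-tuning keeps the first head layer and adds a fresh classifier:
fine_tune_model = nn.Sequential(
    encoder,
    projection_head[0], projection_head[1],  # first MLP layer + its ReLU
    nn.Linear(h, num_classes),               # new task-specific classifier
)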

Paper: https://arxiv.org/abs/2006.10029
Code: https://github.com/google-research/simclr

Abstract:
One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to most previous approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of a big (deep and wide) network during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLR), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels (≤13 labeled images per class) using ResNet-50, a 10× improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
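
Step three of that pipeline, distillation on unlabeled data, reduces to a cross-entropy between the temperature-scaled outputs of the frozen fine-tuned teacher and the smaller student; no ground-truth labels enter the loss. A minimal sketch of that objective (names like distillation_loss and tau are mine, not the repository's):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=1.0):
    """Cross-entropy from the frozen teacher's softened predictions to the
    student's, matching the paper's distillation objective."""
    teacher_probs = F.softmax(teacher_logits / tau, dim=1)
    student_log_probs = F.log_softmax(student_logits / tau, dim=1)
    return -(teacher_probs * student_log_probs).sum(dim=1).mean()

# Toy check: random logits for a batch of 4 unlabeled images over 10 classes.
teacher_logits = torch.randn(4, 10)                      # from the big fine-tuned net
student_logits = torch.randn(4, 10, requires_grad=True)  # from the small student
distillation_loss(student_logits, teacher_logits).backward()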

Authors: Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-06-30 Object-Centric Learning with Slot Attention (Paper Explained)
2020-06-29 Set Distribution Networks: a Generative Model for Sets of Images (Paper Explained)
2020-06-28 Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection (Paper Explained)
2020-06-27 Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures (Paper Explained)
2020-06-26 On the Measure of Intelligence by François Chollet - Part 3: The Math (Paper Explained)
2020-06-25 Discovering Symbolic Models from Deep Learning with Inductive Biases (Paper Explained)
2020-06-24 How I Read a Paper: Facebook's DETR (Video Tutorial)
2020-06-23 RepNet: Counting Out Time - Class Agnostic Video Repetition Counting in the Wild (Paper Explained)
2020-06-22 [Drama] Yann LeCun against Twitter on Dataset Bias
2020-06-21 SIREN: Implicit Neural Representations with Periodic Activation Functions (Paper Explained)
2020-06-20 Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained)
2020-06-19 On the Measure of Intelligence by François Chollet - Part 2: Human Priors (Paper Explained)
2020-06-18 Image GPT: Generative Pretraining from Pixels (Paper Explained)
2020-06-17 BYOL: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (Paper Explained)
2020-06-16 TUNIT: Rethinking the Truly Unsupervised Image-to-Image Translation (Paper Explained)
2020-06-15 A bio-inspired bistable recurrent cell allows for long-lasting memory (Paper Explained)
2020-06-14 SynFlow: Pruning neural networks without any data by iteratively conserving synaptic flow
2020-06-13 Deep Differential System Stability - Learning advanced computations from examples (Paper Explained)
2020-06-12 VirTex: Learning Visual Representations from Textual Annotations (Paper Explained)
2020-06-11 Linformer: Self-Attention with Linear Complexity (Paper Explained)
2020-06-10 End-to-End Adversarial Text-to-Speech (Paper Explained)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
cnn
resnet
simclr
simclr2
simclrv2
simclr v2
v2
hinton
geoff
brain
wide
deep
convolutional
convolutions
self-supervised
contrastive
moco
momentum
projection
semi-supervised
unsupervised
distillation
teacher
student