VirTex: Learning Visual Representations from Textual Annotations (Paper Explained)

Video Link: https://www.youtube.com/watch?v=ZfDZRX3WiJg
Duration: 29:42


Pre-training a CNN backbone for visual transfer learning has recently seen a big push in the direction of incorporating more data at the cost of less supervision. This paper investigates the opposite: visual transfer learning by pre-training on very few, but very high-quality, samples via an image captioning task.

OUTLINE:
0:00 - Intro & Overview
1:00 - Pre-Training for Visual Tasks
3:40 - Quality-Quantity Tradeoff
5:50 - Image Captioning
8:35 - VirTex Method
14:30 - Linear Classification
20:30 - Ablations
22:05 - Fine-Tuning
25:45 - Attention Visualization
27:30 - Conclusion & Remarks

Paper: https://arxiv.org/abs/2006.06666
Code: https://github.com/kdexd/virtex

Abstract:
The de-facto approach to many vision tasks is to start from pretrained visual representations, typically learned via supervised training on ImageNet. Recent methods have explored unsupervised pretraining to scale to vast quantities of unlabeled images. In contrast, we aim to learn high-quality visual representations from fewer images. To this end, we revisit supervised pretraining, and seek data-efficient alternatives to classification-based pretraining. We propose VirTex -- a pretraining approach using semantically dense captions to learn visual representations. We train convolutional networks from scratch on COCO Captions, and transfer them to downstream recognition tasks including image classification, object detection, and instance segmentation. On all tasks, VirTex yields features that match or exceed those learned on ImageNet -- supervised or unsupervised -- despite using up to ten times fewer images.

Authors: Karan Desai, Justin Johnson
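
For intuition, below is a minimal PyTorch sketch of the pretraining setup the abstract describes: a ResNet-50 trained from scratch feeds its spatial feature map into a transformer decoder that predicts caption tokens, and only the backbone is transferred afterwards. This is an illustrative sketch under assumed names and hyperparameters, not the official implementation (the paper's model additionally decodes each caption backwards; see the code link above for the real thing).

# Illustrative sketch of the VirTex pretraining idea, not the official code.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class VirTexSketch(nn.Module):  # hypothetical name
    def __init__(self, vocab_size=10000, d_model=512):
        super().__init__()
        cnn = resnet50()  # no pretrained weights: trained from scratch, as in the paper
        # Drop avgpool and fc to keep the 7x7 spatial feature map.
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])
        self.project = nn.Conv2d(2048, d_model, kernel_size=1)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        feats = self.project(self.backbone(images))    # (B, d_model, 7, 7)
        memory = feats.flatten(2).transpose(1, 2)      # (B, 49, d_model)
        tgt = self.embed(tokens)                       # (B, T, d_model)
        T = tokens.size(1)
        # Causal mask so each position only attends to earlier caption tokens.
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)                          # next-token logits

model = VirTexSketch()
images = torch.randn(2, 3, 224, 224)
tokens = torch.randint(0, 10000, (2, 12))
logits = model(images, tokens)                         # (2, 12, 10000)
# Teacher forcing; real training would shift targets by one token.
loss = nn.functional.cross_entropy(logits.transpose(1, 2), tokens)

After pretraining, only model.backbone would be kept and transferred to the downstream tasks evaluated in the paper (linear classification, object detection, instance segmentation).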

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-06-22 [Drama] Yann LeCun against Twitter on Dataset Bias
2020-06-21 SIREN: Implicit Neural Representations with Periodic Activation Functions (Paper Explained)
2020-06-20 Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained)
2020-06-19 On the Measure of Intelligence by François Chollet - Part 2: Human Priors (Paper Explained)
2020-06-18 Image GPT: Generative Pretraining from Pixels (Paper Explained)
2020-06-17 BYOL: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (Paper Explained)
2020-06-16 TUNIT: Rethinking the Truly Unsupervised Image-to-Image Translation (Paper Explained)
2020-06-15 A bio-inspired bistable recurrent cell allows for long-lasting memory (Paper Explained)
2020-06-14 SynFlow: Pruning neural networks without any data by iteratively conserving synaptic flow
2020-06-13 Deep Differential System Stability - Learning advanced computations from examples (Paper Explained)
2020-06-12 VirTex: Learning Visual Representations from Textual Annotations (Paper Explained)
2020-06-11 Linformer: Self-Attention with Linear Complexity (Paper Explained)
2020-06-10 End-to-End Adversarial Text-to-Speech (Paper Explained)
2020-06-09 TransCoder: Unsupervised Translation of Programming Languages (Paper Explained)
2020-06-08 JOIN ME for the NeurIPS 2020 Flatland Multi-Agent RL Challenge!
2020-06-07 BLEURT: Learning Robust Metrics for Text Generation (Paper Explained)
2020-06-06 Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search (Paper Explained)
2020-06-05 CornerNet: Detecting Objects as Paired Keypoints (Paper Explained)
2020-06-04 Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper Explained)
2020-06-03 Learning To Classify Images Without Labels (Paper Explained)
2020-06-02 On the Measure of Intelligence by François Chollet - Part 1: Foundations (Paper Explained)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
cnn
visual
resnet
caption
nlp
transformer
vaswani
attention
text
coco
imagenet
convolutional neural network
adaptation
transfer learning
quality
unsupervised
self-supervised