Image GPT: Generative Pretraining from Pixels (Paper Explained)

Subscribers: 284,000
Video link: https://www.youtube.com/watch?v=YBlNQK0Ao6g
Duration: 31:47
Views: 28,664
Likes: 938

BERT and GPT-2/3 have shown the enormous power of generative pre-training for classification tasks in NLP. For images, however, pre-training is usually done with supervised or self-supervised objectives. This paper investigates how far you can get by applying the principles from the world of NLP to the world of images.

OUTLINE:
0:00 - Intro & Overview
2:50 - Generative Models for Pretraining
4:50 - Pretraining for Visual Tasks
7:40 - Model Architecture
15:15 - Linear Probe Experiments
24:15 - Fine-Tuning Experiments
30:25 - Conclusion & Comments
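The linear-probe evaluation covered in the video (15:15) trains only a linear classifier on top of frozen pretrained features. A minimal sketch of that idea, using a toy perceptron and made-up 2D "features" (not the paper's code or data):

```python
# Linear probe sketch: features from a frozen pretrained model are
# fixed; only the linear classifier on top of them is trained.
# Here a simple perceptron for two classes stands in for the probe.

def linear_probe(features, labels, epochs=20, lr=0.1):
    """Fit weights and bias of a linear classifier on frozen features."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):      # y in {-1, +1}
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                  # misclassified: perceptron update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Toy, linearly separable "features": class +1 has a larger first coordinate.
feats = [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.0, -0.5]]
labels = [1, 1, -1, -1]
w, b = linear_probe(feats, labels)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
         for x in feats]
```

The point of the probe is that the classifier is linear on purpose: if it reaches high accuracy, the frozen features themselves must already be linearly separable by class, which is the paper's measure of representation quality.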

Paper:
https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf
Blog: https://openai.com/blog/image-gpt/
Code: https://github.com/openai/image-gpt

Abstract:
Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full finetuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features.
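The abstract's key move is treating an image as a plain 1D sequence of pixels, with no built-in 2D structure, and training on next-pixel prediction exactly as GPT trains on next-token prediction. A minimal illustration of that data framing (function names are mine, not from the paper's code):

```python
# Sketch of the iGPT data framing: flatten an image in raster order
# and turn it into (context, target) pairs for autoregressive
# next-pixel prediction. The model only ever sees the 1D sequence.

def raster_flatten(image):
    """Flatten a 2D grid of pixel values row by row (raster order)."""
    return [px for row in image for px in row]

def next_pixel_pairs(seq):
    """(context, target) pairs: predict pixel t from all pixels < t."""
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]

image = [[0, 1],
         [2, 3]]                     # a 2x2 "image" of pixel intensities
seq = raster_flatten(image)          # [0, 1, 2, 3]
pairs = next_pixel_pairs(seq)        # e.g. ([0, 1], 2): predict 2 from [0, 1]
```

A sequence Transformer trained on these pairs with a cross-entropy loss over pixel values is the pretraining task; the learned activations are then evaluated with linear probes or fine-tuning as described above.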

Authors: Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-06-28 Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection (Paper Explained)
2020-06-27 Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures (Paper Explained)
2020-06-26 On the Measure of Intelligence by François Chollet - Part 3: The Math (Paper Explained)
2020-06-25 Discovering Symbolic Models from Deep Learning with Inductive Biases (Paper Explained)
2020-06-24 How I Read a Paper: Facebook's DETR (Video Tutorial)
2020-06-23 RepNet: Counting Out Time - Class Agnostic Video Repetition Counting in the Wild (Paper Explained)
2020-06-22 [Drama] Yann LeCun against Twitter on Dataset Bias
2020-06-21 SIREN: Implicit Neural Representations with Periodic Activation Functions (Paper Explained)
2020-06-20 Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained)
2020-06-19 On the Measure of Intelligence by François Chollet - Part 2: Human Priors (Paper Explained)
2020-06-18 Image GPT: Generative Pretraining from Pixels (Paper Explained)
2020-06-17 BYOL: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (Paper Explained)
2020-06-16 TUNIT: Rethinking the Truly Unsupervised Image-to-Image Translation (Paper Explained)
2020-06-15 A bio-inspired bistable recurrent cell allows for long-lasting memory (Paper Explained)
2020-06-14 SynFlow: Pruning neural networks without any data by iteratively conserving synaptic flow
2020-06-13 Deep Differential System Stability - Learning advanced computations from examples (Paper Explained)
2020-06-12 VirTex: Learning Visual Representations from Textual Annotations (Paper Explained)
2020-06-11 Linformer: Self-Attention with Linear Complexity (Paper Explained)
2020-06-10 End-to-End Adversarial Text-to-Speech (Paper Explained)
2020-06-09 TransCoder: Unsupervised Translation of Programming Languages (Paper Explained)
2020-06-08 JOIN ME for the NeurIPS 2020 Flatland Multi-Agent RL Challenge!



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
openai
gpt2
gpt3
bert
transformer
attention is all you need
attention mechanism
multi-head attention
pixel rnn
pixel cnn
pretraining
representation
linear probe
fine-tuning
cifar10
cifar100
imagenet
cnn
convolutional neural network
autoregressive