Image GPT: Generative Pretraining from Pixels (Paper Explained)

Subscribers: 284,000
Video link: https://www.youtube.com/watch?v=YBlNQK0Ao6g
Duration: 31:47
Views: 28,664
Likes: 938

BERT and GPT-2/3 have shown the enormous power of generative pre-training for classification tasks in NLP. For images, however, pre-training is usually done with supervised or self-supervised objectives. This paper investigates how far you can get by applying the principles from the world of NLP to the world of images.

OUTLINE:
0:00 - Intro & Overview
2:50 - Generative Models for Pretraining
4:50 - Pretraining for Visual Tasks
7:40 - Model Architecture
15:15 - Linear Probe Experiments
24:15 - Fine-Tuning Experiments
30:25 - Conclusion & Comments
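The linear-probe evaluation covered in the video (15:15) trains only a linear classifier on top of frozen pretrained features. A minimal sketch of that idea, using a toy perceptron and made-up 2D "features" (not the paper's code or data):

```python
# Linear probe sketch: features from a frozen pretrained model are
# fixed; only the linear classifier on top of them is trained.
# Here a simple perceptron for two classes stands in for the probe.

def linear_probe(features, labels, epochs=20, lr=0.1):
    """Fit weights and bias of a linear classifier on frozen features."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):      # y in {-1, +1}
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                  # misclassified: perceptron update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Toy, linearly separable "features": class +1 has a larger first coordinate.
feats = [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.0, -0.5]]
labels = [1, 1, -1, -1]
w, b = linear_probe(feats, labels)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
         for x in feats]
```

The point of the probe is that the classifier is linear on purpose: if it reaches high accuracy, the frozen features themselves must already be linearly separable by class, which is the paper's measure of representation quality.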

Paper:
https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf
Blog: https://openai.com/blog/image-gpt/
Code: https://github.com/openai/image-gpt

Abstract:
Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full finetuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features.
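The abstract's key move is treating an image as a plain 1D sequence of pixels, with no built-in 2D structure, and training on next-pixel prediction exactly as GPT trains on next-token prediction. A minimal illustration of that data framing (function names are mine, not from the paper's code):

```python
# Sketch of the iGPT data framing: flatten an image in raster order
# and turn it into (context, target) pairs for autoregressive
# next-pixel prediction. The model only ever sees the 1D sequence.

def raster_flatten(image):
    """Flatten a 2D grid of pixel values row by row (raster order)."""
    return [px for row in image for px in row]

def next_pixel_pairs(seq):
    """(context, target) pairs: predict pixel t from all pixels < t."""
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]

image = [[0, 1],
         [2, 3]]                     # a 2x2 "image" of pixel intensities
seq = raster_flatten(image)          # [0, 1, 2, 3]
pairs = next_pixel_pairs(seq)        # e.g. ([0, 1], 2): predict 2 from [0, 1]
```

A sequence Transformer trained on these pairs with a cross-entropy loss over pixel values is the pretraining task; the learned activations are then evaluated with linear probes or fine-tuning as described above.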

Authors: Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-06-28 Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection (Paper Explained)
2020-06-27 Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures (Paper Explained)
2020-06-26 On the Measure of Intelligence by François Chollet - Part 3: The Math (Paper Explained)
2020-06-25 Discovering Symbolic Models from Deep Learning with Inductive Biases (Paper Explained)
2020-06-24 How I Read a Paper: Facebook's DETR (Video Tutorial)
2020-06-23 RepNet: Counting Out Time - Class Agnostic Video Repetition Counting in the Wild (Paper Explained)
2020-06-22 [Drama] Yann LeCun against Twitter on Dataset Bias
2020-06-21 SIREN: Implicit Neural Representations with Periodic Activation Functions (Paper Explained)
2020-06-20 Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained)
2020-06-19 On the Measure of Intelligence by François Chollet - Part 2: Human Priors (Paper Explained)
2020-06-18 Image GPT: Generative Pretraining from Pixels (Paper Explained)
2020-06-17 BYOL: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (Paper Explained)
2020-06-16 TUNIT: Rethinking the Truly Unsupervised Image-to-Image Translation (Paper Explained)
2020-06-15 A bio-inspired bistable recurrent cell allows for long-lasting memory (Paper Explained)
2020-06-14 SynFlow: Pruning neural networks without any data by iteratively conserving synaptic flow
2020-06-13 Deep Differential System Stability - Learning advanced computations from examples (Paper Explained)
2020-06-12 VirTex: Learning Visual Representations from Textual Annotations (Paper Explained)
2020-06-11 Linformer: Self-Attention with Linear Complexity (Paper Explained)
2020-06-10 End-to-End Adversarial Text-to-Speech (Paper Explained)
2020-06-09 TransCoder: Unsupervised Translation of Programming Languages (Paper Explained)
2020-06-08 JOIN ME for the NeurIPS 2020 Flatland Multi-Agent RL Challenge!



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
openai
gpt2
gpt3
bert
transformer
attention is all you need
attention mechanism
multi-head attention
pixel rnn
pixel cnn
pretraining
representation
linear probe
fine-tuning
cifar10
cifar100
imagenet
cnn
convolutional neural network
autoregressive