OpenAI DALL·E: Creating Images from Text (Blog Post Explained)

Subscribers: 284,000
Video Link: https://www.youtube.com/watch?v=j4xgkjWlfL4
Category: Vlog
Duration: 55:46
Views: 102,036
Likes: 2,261


#openai #science #gpt3

OpenAI's newest model, DALL·E, shows absolutely amazing abilities in generating high-quality images from arbitrary text descriptions. As with GPT-3, the range of applications and the diversity of outputs are astonishing, given that this is a single model trained on a purely autoregressive task. This model is a significant step toward combining text and images in future AI applications.
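To make "purely autoregressive" concrete: per the DALL·E blog, a text caption (BPE tokens) and an image (a 32×32 grid of discrete-VAE codebook indices) are concatenated into one token stream, and a transformer is trained to predict every next token. Here is a minimal sketch of that sequence construction; the vocabulary sizes and helper names are illustrative assumptions, not the actual implementation:

```python
import random

# Illustrative sizes, loosely following the blog post's description:
# a short caption in BPE tokens, followed by 32x32 = 1024 image-grid
# tokens drawn from a discrete-VAE codebook of 8192 entries.
TEXT_VOCAB = 16384   # assumed text vocabulary size
IMAGE_VOCAB = 8192   # codebook size from the blog post
MAX_TEXT_LEN = 256
IMAGE_LEN = 32 * 32  # 1024 image tokens

def build_sequence(text_tokens, image_tokens):
    """Concatenate text and image tokens into one stream.

    Image token ids are offset by TEXT_VOCAB so both modalities
    share a single vocabulary; the model then simply predicts the
    next token at every position, whether text or image.
    """
    assert len(text_tokens) <= MAX_TEXT_LEN
    assert len(image_tokens) == IMAGE_LEN
    offset_image = [TEXT_VOCAB + t for t in image_tokens]
    return list(text_tokens) + offset_image

# Toy example: a 3-token "caption" plus random codebook indices.
random.seed(0)
caption = [5, 17, 250]  # pretend BPE ids
image = [random.randrange(IMAGE_VOCAB) for _ in range(IMAGE_LEN)]
seq = build_sequence(caption, image)
```

At generation time the caption tokens are given as a prefix and the model samples the 1024 image tokens one by one, which the decoder then turns back into pixels.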

OUTLINE:
0:00 - Introduction
2:45 - Overview
4:20 - Dataset
5:35 - Comparison to GPT-3
7:00 - Model Architecture
13:20 - VQ-VAE
21:00 - Combining VQ-VAE with GPT-3
27:30 - Pre-Training with Relaxation
32:15 - Experimental Results
33:00 - My Hypothesis about DALL·E's inner workings
36:15 - Sparse Attention Patterns
38:00 - DALL·E can't count
39:35 - DALL·E can't handle global order
40:10 - DALL·E renders different views
41:10 - DALL·E is very good at texture
41:40 - DALL·E can complete a bust
43:30 - DALL·E can do some reflections, but not others
44:15 - DALL·E can do cross-sections of some objects
45:50 - DALL·E is amazing at style
46:30 - DALL·E can generate logos
47:40 - DALL·E can generate bedrooms
48:35 - DALL·E can combine unusual concepts
49:25 - DALL·E can generate illustrations
50:15 - DALL·E sometimes understands complicated prompts
50:55 - DALL·E can pass part of an IQ test
51:40 - DALL·E probably does not have geographical / temporal knowledge
53:10 - Reranking dramatically improves quality
53:50 - Conclusions & Comments
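The "Pre-Training with Relaxation" segment refers to training the discrete VAE with a Gumbel-softmax-style relaxation, which makes sampling from a categorical codebook distribution differentiable. A minimal pure-Python sketch of that relaxation (toy logits; all names here are illustrative, not from the DALL·E code):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0):
    """Relaxed sample from a categorical distribution.

    Adds Gumbel noise to the logits and applies a temperature-scaled
    softmax; as tau -> 0 the output approaches a one-hot sample,
    while for tau > 0 it stays differentiable with respect to logits.
    """
    gumbel = [-math.log(-math.log(random.random())) for _ in logits]
    y = [(l + g) / tau for l, g in zip(logits, gumbel)]
    m = max(y)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
soft = gumbel_softmax([1.0, 2.0, 0.5], tau=0.5)
```

During pre-training, the temperature is typically annealed toward zero so the relaxed codebook assignments gradually approach the hard, discrete ones used at inference.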

Blog: https://openai.com/blog/dall-e/

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n






Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
gpt
gpt-3
visual transformer
transformer
transformers
attention mechanism
vqvae
vq vae
vq-vae
codebook
relaxation
gumbel
text
images
nlp
natural language processing
autoregressive
grid
encoder
decoder
gpt3
avocado chair
porcupine sphere
animations
fisheye
text to image
image captioning
openai
sutskever
dali
dalle
walle
vector quantized
hierarchical
gan
generative
likelihood