GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Channel:

Yannic Kilcher

Subscribers:

300,000

Published on December 28, 2021 10:31:56 PM ● Video Link: https://www.youtube.com/watch?v=gwI6g1pBD84

37,292 views

902

#glide #openai #diffusion

Diffusion models learn to iteratively reverse a noising process that is applied repeatedly during training. The resulting model can be used for conditional generation as well as for other tasks such as inpainting. OpenAI's GLIDE builds on recent advances in diffusion models and combines text-conditional diffusion with classifier-free guidance and upsampling to achieve unprecedented quality in text-to-image samples.
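The noising process mentioned above has a convenient closed form, described in the DDPM paper linked further down (arXiv 2006.11239): instead of adding noise step by step, a training example can jump straight to noise level t via x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, and the network is trained to predict eps. The sketch below is a minimal illustration under stated assumptions: it uses DDPM's linear beta schedule (GLIDE itself may use a different schedule), treats an image as a flat list of floats, and the names `linear_beta_schedule`, `alpha_bars`, `q_sample` and `training_loss` are hypothetical, not part of the glide-text2im code.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, spaced linearly (assumed DDPM-style schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s); they shrink towards 0."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def q_sample(x0, t, abar, rng):
    """Noise a clean sample directly to level t.

    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I).
    Returns (x_t, eps); a network would be trained to recover eps from (x_t, t).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(abar[t])
    noise_scale = math.sqrt(1.0 - abar[t])
    x_t = [signal_scale * xi + noise_scale * ei for xi, ei in zip(x0, eps)]
    return x_t, eps


def training_loss(predicted_eps, true_eps):
    """Simplified DDPM objective: mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)


if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule())
    x0 = [0.5, -0.25, 1.0, 0.0]  # toy "image" of four pixel values in [-1, 1]
    x_early, _ = q_sample(x0, 10, abar, rng)  # still close to x0
    x_late, _ = q_sample(x0, 999, abar, rng)  # nearly pure Gaussian noise
    print(x_early)
    print(x_late)
```

Because x_t can be sampled directly for any t, training can pick a random timestep per example instead of simulating the whole chain; generation then runs the learned reverse process from t = T back down to 0.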

Try it yourself: https://huggingface.co/spaces/valhalla/glide-text2im

OUTLINE:
0:00 - Intro & Overview
6:10 - What is a Diffusion Model?
18:20 - Conditional Generation and Guided Diffusion
31:30 - Architecture Recap
34:05 - Training & Result metrics
36:55 - Failure cases & my own results
39:45 - Safety considerations

Paper: https://arxiv.org/abs/2112.10741
Code & Model: https://github.com/openai/glide-text2im

More diffusion papers:
https://arxiv.org/pdf/2006.11239.pdf
https://arxiv.org/pdf/2102.09672.pdf

Abstract:
Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5 billion parameter text-conditional diffusion model using classifier-free guidance are favored by human evaluators to those from DALL-E, even when the latter uses expensive CLIP reranking. Additionally, we find that our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and weights at this https URL.

Authors: Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
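The classifier-free guidance that the abstract compares against CLIP guidance needs no separate classifier: one network predicts the noise both with the caption and with an empty caption, and sampling extrapolates from the unconditional prediction towards the conditional one, eps_hat = eps(x_t | empty) + s * (eps(x_t | c) - eps(x_t | empty)). The sketch below assumes that form; `guided_eps`, `maybe_drop_caption`, `toy_eps_model`, the use of `None` as the empty caption, and the 0.2 drop rate are hypothetical illustrations, not the glide-text2im API.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical noise predictor: eps_model(x_t, t, caption) -> predicted noise.
# caption=None stands in for the empty caption used by the unconditional branch.
EpsModel = Callable[[Sequence[float], int, Optional[str]], List[float]]


def guided_eps(eps_model: EpsModel, x_t: Sequence[float], t: int,
               caption: Optional[str], guidance_scale: float) -> List[float]:
    """Classifier-free guidance: push the conditional prediction away from the unconditional one.

    guidance_scale = 1 gives plain conditional sampling; values > 1 trade diversity for fidelity.
    """
    eps_uncond = eps_model(x_t, t, None)
    eps_cond = eps_model(x_t, t, caption)
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def maybe_drop_caption(caption: Optional[str], rng: random.Random,
                       p_uncond: float = 0.2) -> Optional[str]:
    """Training-time trick: replace the caption with the empty one with probability p_uncond,
    so a single model learns both the conditional and the unconditional prediction.
    The 0.2 default is an illustrative assumption."""
    return None if rng.random() < p_uncond else caption


def toy_eps_model(x_t: Sequence[float], t: int, caption: Optional[str]) -> List[float]:
    """Stand-in for a real network: having a caption simply shifts every prediction by 1.0."""
    shift = 1.0 if caption is not None else 0.0
    return [0.1 * x + shift for x in x_t]


if __name__ == "__main__":
    # Unconditional ≈ [0.1, 0.2], conditional ≈ [1.1, 1.2]; with scale 3 the result ≈ [3.1, 3.2].
    print(guided_eps(toy_eps_model, [1.0, 2.0], 500, "a corgi", 3.0))
```

Both predictions come from the same network, which is why captions must occasionally be dropped during training; raising `guidance_scale` above 1 is the diversity-for-fidelity knob the abstract refers to.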

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Other Videos By Yannic Kilcher

2022-02-04	GPT-NeoX-20B - Open-Source huge language model by EleutherAI (Interview w/ co-founder Connor Leahy)
2022-01-29	Predicting the rules behind - Deep Symbolic Regression for Recurrent Sequences (w/ author interview)
2022-01-27	IT ARRIVED! YouTube sent me a package. (also: Limited Time Merch Deal)
2022-01-25	[ML News] ConvNeXt: Convolutions return | China regulates algorithms | Saliency cropping examined
2022-01-21	Dynamic Inference with Neural Interpreters (w/ author interview)
2022-01-19	Noether Networks: Meta-Learning Useful Conserved Quantities (w/ the authors)
2022-01-11	This Team won the Minecraft RL BASALT Challenge! (Paper Explanation & Interview with the authors)
2022-01-05	Full Self-Driving is HARD! Analyzing Elon Musk re: Tesla Autopilot on Lex Fridman's Podcast
2022-01-02	Player of Games: All the games, one algorithm! (w/ author Martin Schmid)
2021-12-30	ML News Live! (Dec 30, 2021) Anonymous user RIPS Tensorflow | AI prosecutors rising | Penny Challenge
2021-12-28	GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
2021-12-27	Machine Learning Holidays Live Stream
2021-12-26	Machine Learning Holiday Live Stream
2021-12-24	[ML News] AI learns to search the Internet | Drawings come to life | New ML journal launches
2021-12-21	[ML News] DeepMind builds Gopher | Google builds GLaM | Suicide capsule uses AI to check access
2021-11-27	Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions (Paper Explained)
2021-11-25	Peer Review is still BROKEN! The NeurIPS 2021 Review Experiment (results are in)
2021-11-24	Parameter Prediction for Unseen Deep Architectures (w/ First Author Boris Knyazev)
2021-11-20	Learning Rate Grafting: Transferability of Optimizer Tuning (Machine Learning Research Paper Review)
2021-11-18	[ML News] Cedille French Language Model | YOU Search Engine | AI Finds Profitable MEME TOKENS
2021-11-15	Gradients are Not All You Need (Machine Learning Research Paper Explained)
