Lumiere: A Space-Time Diffusion Model for Video Generation (Paper Explained)

Video Link: https://www.youtube.com/watch?v=Pl8BET_K1mc
Duration: 54:24


#lumiere #texttovideoai #google

LUMIERE by Google Research tackles globally consistent text-to-video generation by extending the U-Net downsampling concept to the temporal axis of videos.

OUTLINE:
0:00 - Introduction
8:20 - Problems with keyframes
16:55 - Space-Time U-Net (STUNet)
21:20 - Extending U-Nets to video
37:20 - Multidiffusion for SSR prediction fusing
44:00 - Stylized generation by swapping weights
49:15 - Training & Evaluation
53:20 - Societal Impact & Conclusion
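As a rough illustration of the MultiDiffusion-style fusion discussed at 37:20 (this is a toy sketch, not the authors' code): the spatial super-resolution (SSR) model is run on overlapping temporal windows, and overlapping per-frame predictions are averaged so that window boundaries stay globally consistent. The window size, stride, and scalar "predictions" below are illustrative assumptions.

```python
# Toy sketch of fusing overlapping temporal-window predictions
# (MultiDiffusion-style averaging); not the paper's implementation.

def fuse_windows(window_preds, starts, num_frames):
    """Average overlapping per-window predictions into one per-frame sequence."""
    totals = [0.0] * num_frames
    counts = [0] * num_frames
    for pred, start in zip(window_preds, starts):
        for i, value in enumerate(pred):
            totals[start + i] += value
            counts[start + i] += 1
    # Each frame's value is the mean over every window that covered it.
    return [t / c for t, c in zip(totals, counts)]

# Two 4-frame windows with stride 2 over a 6-frame clip.
preds = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
fused = fuse_windows(preds, starts=[0, 2], num_frames=6)
print(fused)  # frames 2-3 are averaged across both windows
```

Frames covered by only one window keep that window's prediction; frames in the overlap get the mean, which is what smooths the seams between independently denoised windows.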


Paper: https://arxiv.org/abs/2401.12945
Website: https://lumiere-video.github.io/

Abstract:
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
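The abstract's key architectural point — downsampling along time as well as space, so the coarsest level of the U-Net sees the whole clip in one pass — can be sketched with a toy temporal pyramid (a minimal stdlib illustration of the idea, not the STUNet itself; the frame count and pooling factor are assumptions):

```python
# Toy sketch of temporal downsampling in a space-time pyramid;
# illustrates the idea only, not Lumiere's actual architecture.

def temporal_downsample(frames, factor=2):
    """Average-pool a list of frames (each a list of floats) along time."""
    pooled = []
    for t in range(0, len(frames) - factor + 1, factor):
        group = frames[t:t + factor]
        pooled.append([sum(vals) / factor for vals in zip(*group)])
    return pooled

# A toy 16-frame "video", one feature value per frame for clarity.
video = [[float(t)] for t in range(16)]

# Each level halves the temporal resolution, so the bottleneck works on
# a global summary of the entire clip at once — in contrast to keyframe
# models, which never process all frames together.
level1 = temporal_downsample(video)   # 8 frames
level2 = temporal_downsample(level1)  # 4 frames
print(len(video), len(level1), len(level2))  # 16 8 4
```

The matching temporal upsampling path would then restore full frame rate, which is why the model can emit the whole video in a single pass instead of interpolating between distant keyframes.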

Authors: Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n


Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper