Synchronized Audio-Visual Generation with a Joint Generative Diffusion Model and Contrastive Loss

Video link: https://www.youtube.com/watch?v=fpWH0JZJvsU



Duration: 46:11


Speaker: Ruihan Yang
Host: Sebastian Braun

The rapid development of deep learning techniques has led to significant advancements in the fields of multimedia generation and synthesis. However, generating coherent and temporally aligned audio and video content remains a challenging task due to the complex relationships between visual and auditory information. In this work, we propose a joint generative diffusion model that addresses this challenge by simultaneously generating video and audio content, thus enabling better synchronization and temporal alignment. Our approach is based on guided sampling, which allows for more flexibility in conditional generation and improves the overall quality of the generated content. Furthermore, we introduce a joint contrastive loss, inspired by previous work that has successfully employed contrastive loss in conditional diffusion models. By incorporating this joint contrastive loss, our model achieves better performance in terms of quality and temporal alignment. Through extensive evaluations using both subjective and objective metrics, we demonstrate the effectiveness of our proposed joint generative diffusion model in generating high-quality, temporally aligned audio and video content.
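The joint contrastive loss described in the abstract can be illustrated with a minimal sketch. The snippet below implements a symmetric InfoNCE-style contrastive objective between per-clip video and audio embeddings, where embeddings from the same clip form positive pairs and all other pairings in the batch act as negatives. This is a common formulation for audio-visual contrastive learning; the exact loss used in the talk may differ, and the function name and temperature value here are illustrative assumptions.

```python
import numpy as np

def joint_contrastive_loss(video_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    video_emb, audio_emb: (batch, dim) arrays; row i of each comes from
    the same clip, so (i, i) pairs are positives and every other pairing
    in the batch is a negative.
    """
    # L2-normalize so dot products become cosine similarities.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)

    logits = v @ a.T / temperature   # (batch, batch) similarity matrix
    n = logits.shape[0]

    def cross_entropy_on_diagonal(lg):
        # Softmax cross-entropy with the matching clip as the target class.
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average both retrieval directions: video->audio and audio->video.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

Minimizing this loss pulls the two modalities' embeddings for the same clip together while pushing mismatched clips apart, which is one way such an objective can encourage temporal and semantic alignment between generated audio and video.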

Learn more: https://www.microsoft.com/en-us/research/video/synchronized-audio-visual-generation-with-a-joint-generative-diffusion-model-and-contrastive-loss/




Other Videos By Microsoft Research


2023-12-05 AI Forum 2023 | Towards Responsible AI Deployment
2023-12-05 AI Forum 2023 | AI4Science: Accelerating Scientific Discovery with Artificial Intelligence
2023-12-05 AI Forum 2023 | Harnessing AI for a Greener Tomorrow
2023-12-05 AI Forum 2023 | Panel Discussion “AI Synergy: Science and Society”
2023-12-05 AI Forum 2023 | Future of Foundation Models
2023-11-30 PwR: Using representations for AI-powered software development
2023-11-10 Binaural spatial audio positioning in video calls
2023-11-10 Semi-supervised Multi-task learning for acoustic parameter estimation
2023-11-10 Research intern talk: Real-time single-channel speech separation in noisy & reverberant environments
2023-11-10 Research intern talk: Unified speech enhancement approach for speech degradation & noise suppression
2023-11-10 Synchronized Audio-Visual Generation with a Joint Generative Diffusion Model and Contrastive Loss
2023-11-09 Supporting the Responsible AI Red-Teaming Human Infrastructure | Jina Suh
2023-11-08 Project Mosaic
2023-11-02 Supporting the Responsible AI Red-Teaming Human Infrastructure | Jina Suh
2023-11-02 Sociotechnical Approaches to Measuring Harms Caused by AI Systems | Hanna Wallach
2023-11-02 Storytelling and futurism | Matt Corwine
2023-11-02 Regulatory Innovation to Enable Use of Generative AI in Drug Development | Stephanie Simmons
2023-11-02 AI Powered Community Micro-Grid for Resiliency and Equitability | Peeyush Kumar
2023-11-02 Generative AI & Plural Governance: Mitigating Challenges & Surfacing Opportunities | Madeleine Daepp
2023-11-02 AI in Organizational Settings | danah boyd
2023-11-02 Announcing New Microsoft Research AI & Society Fellows program