Synchronized Audio-Visual Generation with a Joint Generative Diffusion Model and Contrastive Loss

Video link: https://www.youtube.com/watch?v=fpWH0JZJvsU



Duration: 46:11


Speaker: Ruihan Yang
Host: Sebastian Braun

The rapid development of deep learning techniques has led to significant advancements in the fields of multimedia generation and synthesis. However, generating coherent and temporally aligned audio and video content remains a challenging task due to the complex relationships between visual and auditory information. In this work, we propose a joint generative diffusion model that addresses this challenge by simultaneously generating video and audio content, thus enabling better synchronization and temporal alignment. Our approach is based on guided sampling, which allows for more flexibility in conditional generation and improves the overall quality of the generated content. Furthermore, we introduce a joint contrastive loss, inspired by previous work that has successfully employed contrastive loss in conditional diffusion models. By incorporating this joint contrastive loss, our model achieves better performance in terms of quality and temporal alignment. Through extensive evaluations using both subjective and objective metrics, we demonstrate the effectiveness of our proposed joint generative diffusion model in generating high-quality, temporally aligned audio and video content.
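The joint contrastive loss described in the abstract can be illustrated with a minimal sketch. The snippet below implements a symmetric InfoNCE-style contrastive objective between per-clip video and audio embeddings, where embeddings from the same clip form positive pairs and all other pairings in the batch act as negatives. This is a common formulation for audio-visual contrastive learning; the exact loss used in the talk may differ, and the function name and temperature value here are illustrative assumptions.

```python
import numpy as np

def joint_contrastive_loss(video_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    video_emb, audio_emb: (batch, dim) arrays; row i of each comes from
    the same clip, so (i, i) pairs are positives and every other pairing
    in the batch is a negative.
    """
    # L2-normalize so dot products become cosine similarities.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)

    logits = v @ a.T / temperature   # (batch, batch) similarity matrix
    n = logits.shape[0]

    def cross_entropy_on_diagonal(lg):
        # Softmax cross-entropy with the matching clip as the target class.
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average both retrieval directions: video->audio and audio->video.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

Minimizing this loss pulls the two modalities' embeddings for the same clip together while pushing mismatched clips apart, which is one way such an objective can encourage temporal and semantic alignment between generated audio and video.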

Learn more: https://www.microsoft.com/en-us/research/video/synchronized-audio-visual-generation-with-a-joint-generative-diffusion-model-and-contrastive-loss/




Other Videos By Microsoft Research


2023-12-05 AI Forum 2023 | Towards Responsible AI Deployment
2023-12-05 AI Forum 2023 | AI4Science: Accelerating Scientific Discovery with Artificial Intelligence
2023-12-05 AI Forum 2023 | Harnessing AI for a Greener Tomorrow
2023-12-05 AI Forum 2023 | Panel Discussion “AI Synergy: Science and Society”
2023-12-05 AI Forum 2023 | Future of Foundation Models
2023-11-30 PwR: Using representations for AI-powered software development
2023-11-10 Binaural spatial audio positioning in video calls
2023-11-10 Semi-supervised Multi-task learning for acoustic parameter estimation
2023-11-10 Research intern talk: Real-time single-channel speech separation in noisy & reverberant environments
2023-11-10 Research intern talk: Unified speech enhancement approach for speech degradation & noise suppression
2023-11-10 Synchronized Audio-Visual Generation with a Joint Generative Diffusion Model and Contrastive Loss
2023-11-09 Supporting the Responsible AI Red-Teaming Human Infrastructure | Jina Suh
2023-11-08 Project Mosaic
2023-11-02 Supporting the Responsible AI Red-Teaming Human Infrastructure | Jina Suh
2023-11-02 Sociotechnical Approaches to Measuring Harms Caused by AI Systems | Hanna Wallach
2023-11-02 Storytelling and futurism | Matt Corwine
2023-11-02 Regulatory Innovation to Enable Use of Generative AI in Drug Development | Stephanie Simmons
2023-11-02 AI Powered Community Micro-Grid for Resiliency and Equitability | Peeyush Kumar
2023-11-02 Generative AI & Plural Governance: Mitigating Challenges & Surfacing Opportunities | Madeleine Daepp
2023-11-02 AI in Organizational Settings | danah boyd
2023-11-02 Announcing New Microsoft Research AI & Society Fellows program