Research intern talk: Real-time single-channel speech separation in noisy & reverberant environments

Video Link: https://www.youtube.com/watch?v=-X2mdbmKEM8



Duration: 53:29


Speaker: Julian Neri
Host: Sebastian Braun

Real-time single-channel speech separation aims to unmix an audio stream, captured from a single microphone, that contains multiple people talking at once, environmental noise, and reverberation, into multiple dereverberated, noise-free speech tracks, each containing only one talker. While large state-of-the-art DNNs can achieve excellent separation on anechoic mixtures of speech, the main challenge is to create compact, causal models that can separate reverberant mixtures at inference time. In this research project, we explore low-complexity, resource-efficient, causal DNN architectures for real-time separation of two or more simultaneous speakers. A cascade of three CRUSE models was trained to sequentially perform noise suppression, separation, and dereverberation. For comparison, a larger end-to-end CRUSE model was trained to output two anechoic speech signals directly from noisy, reverberant speech mixtures. We propose an efficient single-decoder architecture with "best-and-rest" training for real-time recursive separation of two or more speakers. Evaluations on WHAMR! and on real monophonic recordings of speech mixtures from the REAL-M and DNS Challenge datasets, using speech separation and perceptual measures such as SI-SDR and DNSMOS, show that these compact causal models can separate speech mixtures with low latency.
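The single-decoder recursive scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration of the inference loop, not the actual CRUSE implementation: the `toy_separator` stub (which simply splits the signal in half) stands in for a trained network that would output the "best" separated speaker plus the residual "rest".

```python
import numpy as np

def recursive_separate(mixture, separator, num_speakers):
    """Peel off one speaker ('best') at a time; feed the residual
    ('rest') back through the same single-decoder model until all
    speakers have been extracted."""
    tracks = []
    rest = mixture
    for _ in range(num_speakers - 1):
        best, rest = separator(rest)
        tracks.append(best)
    tracks.append(rest)  # the final residual is the last remaining speaker
    return tracks

# Toy stand-in for the trained model, purely for illustration:
# it just splits the input signal into two equal halves.
def toy_separator(x):
    return x / 2, x / 2

mixture = np.random.randn(16000)  # 1 s of audio at 16 kHz
tracks = recursive_separate(mixture, toy_separator, num_speakers=3)
```

Because a single decoder is reused at every step, the same model handles two, three, or more speakers without growing in size; only the number of recursive passes changes.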

Learn more: https://www.microsoft.com/en-us/research/video/research-intern-talk-real-time-single-channel-speech-separation-in-noisy-reverberant-environments/
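For reference, SI-SDR (scale-invariant signal-to-distortion ratio), one of the metrics cited in the evaluations, can be computed with NumPy as follows. This is a standard textbook implementation of the metric, not code from the talk.

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (Le Roux et al., 2019 definition)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    distortion = estimate - target
    return 10 * np.log10(np.dot(target, target) / np.dot(distortion, distortion))
```

Higher is better, and the value is unchanged if the estimate is rescaled by any nonzero factor, which makes it robust to gain differences between a model's output and the ground-truth signal.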



