Geometry-constrained Beamforming Network for end-to-end Farfield Sound Source Separation

Video Link: https://www.youtube.com/watch?v=wY7hn6ZMj6M



Duration: 1:01:32


Environmental noise, reverberation, and interfering speakers degrade the quality of the speech signal and therefore the performance of many speech communication systems, including automatic speech recognition systems, hearing assistive devices, and mobile devices. Many deep learning solutions are available to perform source separation and reduce background noise. However, when a physical interpretation of a signal is possible or multi-channel inputs are available, conventional acoustic signal processing, e.g., beamforming and direction-of-arrival (DOA) estimation, tends to be more interpretable and yields reasonably good solutions in many cases. This motivates integrating deep learning with conventional acoustic signal processing so that each can profit from the other, as several works have proposed. However, the integration is typically performed in a modular way, where each component is optimized individually, which may lead to a suboptimal overall solution.

In this talk, we propose a DOA-driven beamforming network (DBnet) for end-to-end source separation, i.e., the gradient is propagated end to end from the time-domain separated speech signals of the speakers back to the time-domain microphone signals. For the DBnet structure, we consider either a recurrent neural network (RNN) or a mixture of convolutional and recurrent neural networks. We analyze the performance of DBnet in challenging noisy and reverberant conditions and benchmark it against state-of-the-art source separation methods.
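
To make the idea of DOA-driven beamforming more concrete, here is a minimal sketch of a conventional delay-and-sum beamformer steered by a known DOA. It is not the DBnet architecture from the talk, only an illustration of the kind of geometry-based component such a network could build on. The uniform linear array, the 4 cm microphone spacing, the 2 kHz test frequency, the sign convention of the phase term, and all function names are assumptions made for this example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(doa_deg, mic_positions_m, freq_hz):
    """Far-field (plane-wave) steering vector for a linear array.

    doa_deg is measured from the array axis (0 = endfire, 90 = broadside).
    Each entry is the phase shift caused by the propagation delay to one mic.
    """
    cos_theta = math.cos(math.radians(doa_deg))
    return [
        cmath.exp(-2j * math.pi * freq_hz * (x * cos_theta / SPEED_OF_SOUND))
        for x in mic_positions_m
    ]


def delay_and_sum_weights(doa_deg, mic_positions_m, freq_hz):
    """Delay-and-sum weights w = d(theta) / M for the look direction."""
    d = steering_vector(doa_deg, mic_positions_m, freq_hz)
    return [d_m / len(d) for d_m in d]


def beam_response(weights, doa_deg, mic_positions_m, freq_hz):
    """Magnitude of w^H d(theta): array gain toward a plane wave from doa_deg."""
    d = steering_vector(doa_deg, mic_positions_m, freq_hz)
    return abs(sum(w.conjugate() * d_m for w, d_m in zip(weights, d)))


# Hypothetical 4-microphone linear array with 4 cm spacing.
mics = [0.0, 0.04, 0.08, 0.12]
freq = 2000.0

# Steer toward a target at 60 degrees, then compare the gain toward the
# target with the gain toward an interferer at 150 degrees.
w = delay_and_sum_weights(60.0, mics, freq)
on_target = beam_response(w, 60.0, mics, freq)
off_target = beam_response(w, 150.0, mics, freq)

print(f"gain toward target:     {on_target:.3f}")
print(f"gain toward interferer: {off_target:.3f}")
```

The weights pass a plane wave from the look direction with unit gain and attenuate waves from other directions. Because every step is a smooth function of the DOA, this kind of computation can in principle sit inside a network that is trained end to end, which is the setting the abstract describes.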

Learn more about this and other talks at Microsoft Research: https://www.microsoft.com/en-us/research/video/geometry-constrained-beamforming-network-for-end-to-end-farfield-sound-source-separation/




Other Videos By Microsoft Research


2020-12-09 Physical computing for computer science education
2020-12-09 Accessible CS Education Fall Workshop: Microsoft Chief Accessibility Officer Jenny Lay-Flurrie
2020-12-09 Students with disabilities in the U.S.
2020-12-09 Welcome & Introduction to Microsoft's Accessible Computer Science Education Fall Workshop
2020-12-08 De-Identifying Healthcare Data for Research
2020-12-05 Task-Oriented Dialogue as Dataflow Synthesis
2020-12-03 The opportunities with AI and machine learning
2020-12-02 Demonstration of Lumiere (1995)
2020-12-02 Demonstration of Priorities & Notification Platform (2001)
2020-12-01 Recent Efforts Towards Efficient And Scalable Neural Waveform Coding
2020-12-01 Geometry-constrained Beamforming Network for end-to-end Farfield Sound Source Separation
2020-11-24 Directions in ML: Automating Dataset Comparison and Manipulation with Optimal Transport
2020-11-13 Audio-based Toxic Language Detection
2020-11-05 CDO roundtable: Generating business value through data quality
2020-11-04 Unlocking IoT Data for Research in Healthcare
2020-11-03 MSR Twitter Local Events
2020-11-02 Spotlight on advancements in AI, HCI, Computing, VR, Systems Networking & more at Microsoft Research
2020-10-30 Distinct population of sudden unexpected infant death based on age
2020-10-28 Enabling interaction between mixed reality and robots via cloud-based localization
2020-10-26 Directions in ML: AutoML & Interpretability: Powering the machine learning revolution in healthcare
2020-10-23 Evaluating and validating research that aspires to societal impact in real world scenarios with Tanu



Tags:
Beamforming Network
speech signal
DBnet
recurrent neural network
RNN
automatic speech recognition systems
direction-of-arrival estimators
Microsoft Research
Farfield Sound Source Separation