Binaural spatial audio positioning in video calls

Video Link: https://www.youtube.com/watch?v=MjLoVCkyRbY



Duration: 1:03:56

Speaker: Jeremy Hyrkas
Host: Hannes Gamper

Spatially separating voices plays a crucial role in speech intelligibility, speaker identification, and cognitive load in conversations. Voices are naturally separated in in-person conversations, but most video conferencing software mixes them down to one channel. Spatial audio has been shown to improve user experience in audio teleconferencing. However, while voice streams in audio-only calls can be positioned anywhere in three dimensions around a listener’s head without concern for visual stimuli, video calls display the corresponding speakers in a narrow 2D plane on the listener’s screen. Audio/visual mismatch may cause discomfort or disorientation for the listener, while tightly coupling audio positions to video positions may leave too little spacing between audio streams to deliver the benefits found in earlier work. Therefore, the ideal placement of audio streams in video call software remains an open question.

To better understand the remote user experience with spatial audio in video calls, we conducted a user study focused on user preference and stream identification as a function of the width of the spatial audio stage. Participants used their laptops and headphones to watch videos simulating video calls between either two or four speakers, with four levels of horizontal spread per video set: no spread (i.e., diotic playback without spatialization), narrow, medium, and wide. Participants rated increased spatial spread higher for audio/visual correspondence, and for their own ability and confidence in identifying specific audio streams. However, spatialization benefits plateaued for the wider spreads tested, with the four-speaker condition benefiting from a wider audio stage than the two-speaker condition. The results indicate that spreading audio streams spatially in video calls benefits listeners. Feedback from an open-ended post-study questionnaire suggests that some listeners prefer a narrower audio stage that corresponds more strongly with the visuals when there are only two active speakers, while for four speakers some listeners prefer a wider audio stage that may increase intelligibility.
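To make the idea of an audio stage width concrete, the following is a minimal Python sketch of how speakers might be spread across a stage of a given width. It is an illustration only, not the method used in the study: the stage widths in degrees, the function names, and the even left-to-right spacing are all assumptions, since the abstract does not give the actual angles. The constant-power pan gains are a simple stereo stand-in; true binaural rendering would instead filter each stream with head-related transfer functions for its azimuth.

```python
import math

# Hypothetical stage widths in degrees. The abstract names the four
# conditions but does not state their angles; these values are placeholders.
STAGE_WIDTHS_DEG = {"none": 0.0, "narrow": 20.0, "medium": 60.0, "wide": 120.0}


def speaker_azimuths(n_speakers, stage_width_deg):
    """Spread speakers evenly from left to right, centered on 0 degrees.

    Negative azimuths are to the listener's left, positive to the right.
    A width of 0 places every speaker at the center (no spread).
    """
    if n_speakers < 1:
        raise ValueError("n_speakers must be at least 1")
    if n_speakers == 1:
        return [0.0]
    step = stage_width_deg / (n_speakers - 1)
    return [-stage_width_deg / 2 + i * step for i in range(n_speakers)]


def constant_power_gains(azimuth_deg, max_azimuth_deg=90.0):
    """Return (left, right) gains whose squares sum to 1.

    This is plain stereo panning, used here only as a stand-in for
    binaural (HRTF-based) rendering.
    """
    clamped = max(-max_azimuth_deg, min(max_azimuth_deg, azimuth_deg))
    theta = (clamped / max_azimuth_deg + 1.0) * math.pi / 4  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    for name, width in STAGE_WIDTHS_DEG.items():
        print(name, speaker_azimuths(4, width))
```

With these placeholder widths, the four-speaker "medium" condition would place voices at -30, -10, 10, and 30 degrees, while the "none" condition collapses all voices to the center.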

Learn more: https://www.microsoft.com/en-us/research/video/binaural-spatial-audio-positioning-in-video-calls/



