VoluMe: Authentic 3D Video Calls from Live Gaussian Splat Prediction

Published on 2025-07-30 ● Video Link: https://www.youtube.com/watch?v=xgh46cf4_fw





Virtual 3D meetings offer the potential to enhance copresence, increase engagement and thus improve effectiveness of remote meetings compared to standard 2D video calls. However, representing people in 3D meetings remains a challenge; existing solutions achieve high quality by using complex hardware, making use of fixed appearance via enrolment, or by inverting a pre-trained generative model. These approaches lead to constraints that are unwelcome and ill-fitting for videoconferencing applications.

We present the first method to predict 3D Gaussian reconstructions in real time from a single 2D webcam feed, where the 3D representation is not only live and realistic, but also authentic to the input video. By conditioning the 3D representation on each video frame independently, our reconstruction faithfully recreates the input video from the captured viewpoint (a property we call authenticity), while generalizing realistically to novel viewpoints. Additionally, we introduce a stability loss to obtain reconstructions that are temporally stable on video sequences.
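The two properties described above, authenticity and temporal stability, can be made concrete with a small numerical sketch. The following is not the authors' formulation: the function names, the use of PSNR as an authenticity measure, and the naive frame-to-frame difference as a stability proxy are all assumptions made for illustration. Frames are assumed to be float arrays in [0, 1] with shape (H, W, 3), and the rendering step itself is out of scope.

```python
import numpy as np


def authenticity_psnr(input_frame, rendered_input_view, max_val=1.0):
    """PSNR (dB) between the webcam frame and a render from the captured viewpoint.

    Hypothetical authenticity measure: higher means the reconstruction
    reproduces the input frame more faithfully from the input camera pose.
    """
    a = np.asarray(input_frame, dtype=np.float64)
    b = np.asarray(rendered_input_view, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def temporal_instability(novel_view_renders):
    """Mean squared change between consecutive novel-view renders.

    Hypothetical stability proxy over a (T, H, W, 3) stack. It also
    penalizes genuine motion, so it is only meaningful when comparing
    methods on the same input sequence.
    """
    renders = np.asarray(novel_view_renders, dtype=np.float64)
    if renders.shape[0] < 2:
        return 0.0
    diffs = np.diff(renders, axis=0)  # frame t+1 minus frame t
    return float(np.mean(diffs ** 2))
```

Under these assumptions, a per-frame method would aim for a high `authenticity_psnr` on every frame while keeping `temporal_instability` low on renders from viewpoints other than the captured one.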

We show that our method delivers state-of-the-art accuracy in visual quality and stability metrics compared to existing methods, and demonstrate our approach in live one-to-one 3D meetings using only a standard 2D camera and display. This demonstrates that our approach can allow anyone to communicate volumetrically, via a method for 3D videoconferencing that is not only highly accessible, but also realistic and authentic.

Project page: https://aka.ms/VoluMe



