Differentially Private Synthetic Data without Training

Video link: https://www.youtube.com/watch?v=cRQxo8MZMLI




Speakers: Zinan Lin
Host: Kim Laine

Generating differentially private (DP) synthetic data that closely resembles the original data offers a scalable way to address privacy concerns in today's data-driven world.

In this talk, I will introduce Private Evolution (PE), a new training-free framework for DP synthetic data generation, which contrasts with existing approaches that rely on training DP generative models. PE treats foundation models as black boxes and uses only their inference APIs. We demonstrate that across both images and text, PE: (1) matches or even outperforms prior state-of-the-art (SoTA) methods in the fidelity-privacy trade-off without any model training; (2) enables the use of advanced open-source models (e.g., Mixtral) and API-based models (e.g., GPT-3.5) where previous SoTA approaches are inapplicable; and (3) is more computationally efficient than prior SoTA methods.
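The evolutionary loop that the abstract describes can be sketched in a few lines. The code below is an illustrative reconstruction, not the authors' implementation: `random_api` and `variation_api` are hypothetical stand-ins for real foundation-model inference APIs, private records vote for their nearest synthetic candidate, Gaussian noise on the vote histogram provides the differential privacy, and the top-voted candidates are varied for the next round.

```python
import numpy as np

def private_evolution(private_data, random_api, variation_api,
                      num_iterations=5, num_samples=100,
                      noise_multiplier=1.0, seed=0):
    """Illustrative PE loop: API-generated candidates evolve toward the
    private distribution via a DP nearest-neighbor voting histogram."""
    rng = np.random.default_rng(seed)
    synthetic = random_api(num_samples)  # initial candidates from the model API
    for _ in range(num_iterations):
        # Each private record votes for its nearest synthetic candidate.
        dists = np.linalg.norm(
            private_data[:, None, :] - synthetic[None, :, :], axis=-1)
        votes = np.bincount(dists.argmin(axis=1), minlength=num_samples)
        # Gaussian noise on the histogram is what makes the step DP.
        noisy = np.clip(
            votes + rng.normal(0.0, noise_multiplier, num_samples), 0, None)
        probs = noisy / noisy.sum()
        # Resample the (noisy) winners and ask the API for nearby variations.
        parents = synthetic[rng.choice(num_samples, size=num_samples, p=probs)]
        synthetic = variation_api(parents)
    return synthetic

# Toy demo with mock APIs standing in for a real foundation model.
demo_rng = np.random.default_rng(1)
private = demo_rng.normal(5.0, 1.0, size=(200, 2))            # sensitive data
random_api = lambda n: demo_rng.normal(0.0, 3.0, size=(n, 2))
variation_api = lambda x: x + demo_rng.normal(0.0, 0.3, size=x.shape)
result = private_evolution(private, random_api, variation_api)
```

In the toy run, candidates drawn far from the private distribution lose the noisy vote and die out, so the synthetic set drifts toward the private data without the algorithm ever training on it; a real deployment would replace the mock callables with model inference APIs and calibrate `noise_multiplier` to a target privacy budget.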

Additionally, I will discuss recent extensions of PE, both from our work and from the broader community, including the integration of data simulators, the fusion of knowledge from multiple models for DP data synthesis, and applications in federated learning. We hope that PE unlocks the full potential of foundation models in privacy-preserving machine learning and accelerates the adoption of DP synthetic data across industries.



