Differentially Private Synthetic Data without Training

Video link: https://www.youtube.com/watch?v=cRQxo8MZMLI




Speakers: Zinan Lin
Host: Kim Laine

Generating differentially private (DP) synthetic data that closely resembles the original data offers a scalable way to address privacy concerns in today's data-driven world.

In this talk, I will introduce Private Evolution (PE), a new training-free framework for DP synthetic data generation, which contrasts with existing approaches that rely on training DP generative models. PE treats foundation models as black boxes and uses only their inference APIs. We demonstrate that across both images and text, PE: (1) matches or even outperforms prior state-of-the-art (SoTA) methods in the fidelity-privacy trade-off without any model training; (2) enables the use of advanced open-source models (e.g., Mixtral) and API-based models (e.g., GPT-3.5) where previous SoTA approaches are inapplicable; and (3) is more computationally efficient than prior SoTA methods.
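The evolutionary loop that the abstract describes can be sketched in a few lines. The code below is an illustrative reconstruction, not the authors' implementation: `random_api` and `variation_api` are hypothetical stand-ins for real foundation-model inference APIs, private records vote for their nearest synthetic candidate, Gaussian noise on the vote histogram provides the differential privacy, and the top-voted candidates are varied for the next round.

```python
import numpy as np

def private_evolution(private_data, random_api, variation_api,
                      num_iterations=5, num_samples=100,
                      noise_multiplier=1.0, seed=0):
    """Illustrative PE loop: API-generated candidates evolve toward the
    private distribution via a DP nearest-neighbor voting histogram."""
    rng = np.random.default_rng(seed)
    synthetic = random_api(num_samples)  # initial candidates from the model API
    for _ in range(num_iterations):
        # Each private record votes for its nearest synthetic candidate.
        dists = np.linalg.norm(
            private_data[:, None, :] - synthetic[None, :, :], axis=-1)
        votes = np.bincount(dists.argmin(axis=1), minlength=num_samples)
        # Gaussian noise on the histogram is what makes the step DP.
        noisy = np.clip(
            votes + rng.normal(0.0, noise_multiplier, num_samples), 0, None)
        probs = noisy / noisy.sum()
        # Resample the (noisy) winners and ask the API for nearby variations.
        parents = synthetic[rng.choice(num_samples, size=num_samples, p=probs)]
        synthetic = variation_api(parents)
    return synthetic

# Toy demo with mock APIs standing in for a real foundation model.
demo_rng = np.random.default_rng(1)
private = demo_rng.normal(5.0, 1.0, size=(200, 2))            # sensitive data
random_api = lambda n: demo_rng.normal(0.0, 3.0, size=(n, 2))
variation_api = lambda x: x + demo_rng.normal(0.0, 0.3, size=x.shape)
result = private_evolution(private, random_api, variation_api)
```

In the toy run, candidates drawn far from the private distribution lose the noisy vote and die out, so the synthetic set drifts toward the private data without the algorithm ever training on it; a real deployment would replace the mock callables with model inference APIs and calibrate `noise_multiplier` to a target privacy budget.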

Additionally, I will discuss recent extensions of PE, both from our work and from the broader community, including the integration of data simulators, the fusion of knowledge from multiple models for DP data synthesis, and applications in federated learning. We hope that PE unlocks the full potential of foundation models in privacy-preserving machine learning and accelerates the adoption of DP synthetic data across industries.



