AI Testing and Evaluation: Reflections

Channel: Microsoft Research

Subscribers: 351,000

Published on: July 21, 2025 4:02:22 PM

Video Link: https://www.youtube.com/watch?v=7q3BN24qgxg

372 views

In the series finale, Amanda Craig Deckard returns to examine what Microsoft has learned about testing as a governance tool. She also explores the roles of rigor, standardization, and interpretability in testing and what’s next for Microsoft’s AI governance work.

Show notes: https://www.microsoft.com/en-us/research/podcast/ai-testing-and-evaluation-reflections/
Listen to the AI Testing and Evaluation: Learnings from Science and Industry series: https://www.microsoft.com/en-us/research/story/ai-testing-and-evaluation-learnings-from-science-and-industry/
