AI Testing and Evaluation: Reflections

Video Link: https://www.youtube.com/watch?v=7q3BN24qgxg





In the series finale, Amanda Craig Deckard returns to examine what Microsoft has learned about testing as a governance tool. She also explores the roles of rigor, standardization, and interpretability in testing and what’s next for Microsoft’s AI governance work.

Show notes: https://www.microsoft.com/en-us/research/podcast/ai-testing-and-evaluation-reflections/
Listen to the AI Testing and Evaluation: Learnings from Science and Industry series: https://www.microsoft.com/en-us/research/story/ai-testing-and-evaluation-learnings-from-science-and-industry/




Other Videos By Microsoft Research


2025-08-11 Medical Bayesian Kiosk (2010)
2025-08-07 Reimagining healthcare delivery and public health with AI
2025-08-05 VeriTrail: Detect hallucination and trace provenance in AI workflows
2025-07-31 Computational models for brain science
2025-07-30 VoluMe: Authentic 3D Video Calls from Live Gaussian Splat Prediction
2025-07-28 How I became a StoryTeller (and how YOU can too)
2025-07-28 Make some noise: Teaching the language of audio to an LLM using sound tokens
2025-07-28 Building Better Language Models Through Global Understanding
2025-07-24 Navigating medical education in the era of generative AI
2025-07-22 DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
2025-07-21 AI Testing and Evaluation: Reflections
2025-07-20 Intern talk: Distilling Self-Supervised-Learning-Based Speech Quality Assessment into Compact Models
2025-07-15 AI Testing and Evaluation: Learnings from cybersecurity
2025-07-10 Scalable emulation of protein equilibrium ensembles with BioEmu
2025-07-10 How AI will accelerate biomedical research and discovery
2025-07-09 Introducing Microsoft AI Economy Institute
2025-07-07 AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices
2025-07-03 Against Softmaxing Culture: Understanding Relational Practices in Expert and Ordinary Forms of Work
2025-06-30 AI Testing and Evaluation: Learnings from genome editing
2025-06-23 AI Testing and Evaluation: Learnings from Science and Industry
2025-06-18 Precio: Private Aggregate Measurement via Oblivious Shuffling