Perspectives on Cross-Validation

Published on 2020-02-03
Video Link: https://www.youtube.com/watch?v=AU6OS_uq0mU



Duration: 54:36


Cross-validation is probably the most widely used method for risk estimation in machine learning and statistics. However, analyzing it and comparing it to the data-splitting estimator has proved difficult. In the first part of the talk, I will present a new analysis which characterizes the exact asymptotics of cross-validation in the form of a central limit theorem for estimators that satisfy certain stability conditions. In particular, parametric estimators automatically satisfy these conditions, and the theorems fully characterize the cross-validated risk for such estimators. I will demonstrate that they exhibit a wide variety of behaviours: in the case of a parametric empirical risk minimizer, the folds behave as if independent when the evaluation loss is the same as the training loss; however, if a surrogate loss is used, different behaviours may occur.

In the second part, I will move on to discuss issues that arise when using cross-validation for high-dimensional estimators. In the regime where the number of parameters is comparable to the number of observations, cross-validation (and data splitting) may introduce serious bias in the risk estimate when the amount of data left out is large (i.e. the number of folds is small). A natural way to alleviate this problem is to leave out as little data as possible: a single observation, leading to leave-one-out cross-validation (LOOCV). I will show that such a result indeed holds: the LOOCV estimator is consistent in the high-dimensional asymptotic regime. Unfortunately, the LOOCV estimator is computationally prohibitive and cannot be used in practice. Finally, I will discuss a general framework, approximate LOOCV, from which closed-form approximate estimators can be derived for penalized GLMs, including non-smooth ones such as the LASSO or SVMs.
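The idea that LOOCV can be evaluated in closed form, without n refits, has a classical exact instance for linear smoothers such as ridge regression. The sketch below (illustrative only, not code from the talk) compares brute-force LOOCV with the shortcut that computes the leave-one-out residual as (y_i − ŷ_i) / (1 − H_ii), where H = X (XᵀX + λI)⁻¹Xᵀ is the hat matrix; all names and the simulated data are assumptions for the demo.

```python
import numpy as np

# Illustrative sketch: for ridge regression, LOOCV has an exact closed form
# via the hat-matrix diagonal, avoiding n separate model refits.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + 0.5 * rng.standard_normal(n)
lam = 1.0  # ridge penalty (hypothetical choice for the demo)

# Closed-form shortcut: H = X (X'X + lam*I)^{-1} X',
# LOO residual_i = (y_i - yhat_i) / (1 - H_ii)
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
yhat = H @ y
loo_shortcut = np.mean(((y - yhat) / (1 - np.diag(H))) ** 2)

# Brute-force LOOCV: refit n times, score the held-out observation
errs = []
for i in range(n):
    mask = np.arange(n) != i
    Xi, yi = X[mask], y[mask]
    b = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
    errs.append((y[i] - X[i] @ b) ** 2)
loo_brute = np.mean(errs)

print(np.isclose(loo_shortcut, loo_brute))  # the two estimates agree
```

For penalized GLMs with non-quadratic or non-smooth losses (e.g. the LASSO, SVMs) no such exact identity exists, which is where the approximate-LOOCV framework discussed in the talk comes in.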

See more at https://www.microsoft.com/en-us/research/video/perspectives-on-cross-validation/




Other Videos By Microsoft Research


2020-03-04  AI, Azure and the future of healthcare with Dr. Peter Lee | Podcast
2020-02-27  Towards Mainstream Brain-Computer Interfaces (BCIs)
2020-02-27  Exploring Massively Multilingual, Massive Neural Machine Translation
2020-02-27  Fireside Chat with Maarten de Rijke
2020-02-26  Neural architecture search, imitation learning and the optimized pipeline with Dr. Debadeepta Dey
2020-02-21  Information Agents: Directions and Futures (2001)
2020-02-19  Democratizing data, thinking backwards and setting North Star goals with Dr. Donald Kossmann
2020-02-19  Behind the scenes on Team Explorer’s practice run at Microsoft for the DARPA SubT Urban Challenge
2020-02-12  Microsoft Scheduler and dawn of Intelligent PDAs with Dr. Pamela Bhattacharya | Podcast
2020-02-05  Responsible AI with Dr. Saleema Amershi | Podcast
2020-02-03  Perspectives on Cross-Validation
2020-01-30  Data Science Summer School 2019 - Replicating "An Empirical Analysis of Racial Differences in Po..."
2020-01-29  Going deep on deep learning with Dr. Jianfeng Gao | Podcast
2020-01-22  Innovating in India with Dr. Sriram Rajamani [Podcast]
2020-01-17  Underestimating the challenge of cognitive disabilities (and digital literacy)
2020-01-17  Understanding Knowledge Distillation in Neural Sequence Generation
2020-01-17  'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
2020-01-07  Private AI Bootcamp Keynote – Sreekanth Kannepalli
2020-01-07  Introduction to CKKS (Approximate Homomorphic Encryption)
2020-01-07  Private AI Bootcamp Competition: Team 3
2020-01-07  Conversations Based on Search Engine Result Pages



Tags:
Cross-validation
machine learning and statistics
data splitting
LOOCV
AI
microsoft research