Machine Learning Day 2013 - Deep Learning; A Bayesian Information Criterion for Singular Models

Subscribers: 351,000
Video Link: https://www.youtube.com/watch?v=Y23TEdpoJno



Category: Guide
Duration: 1:08:17
Views: 764
Likes: 5


Typically, one approaches a supervised machine learning problem by writing down an objective function and finding a hypothesis that minimizes it. This is equivalent to finding the Maximum A Posteriori (MAP) hypothesis under a Boltzmann distribution, i.e., a posterior proportional to the exponential of the negative loss. However, MAP is not a robust statistic. As an alternative, we define the depth of a hypothesis and show that generalization and robustness can be bounded as functions of this depth. We therefore suggest using the median hypothesis, which is a deep hypothesis, and present algorithms for approximating it. One contribution of this work is an efficient method for approximating the Tukey median. The Tukey median, which is often used for data visualization and outlier detection, is a special case of the family of medians we define; however, computing it exactly takes time exponential in the dimension. Our algorithm approximates such medians in polynomial time while making weaker assumptions than those required by previous work (a simple sampling-based sketch of Tukey depth appears after these abstracts). This presentation is based on joint work with Chris Burges.

The Bayesian Information Criterion (BIC) is a widely used model selection technique inspired by the large-sample asymptotic behavior of Bayesian approaches to model selection. In this talk we consider such approximate Bayesian model choice for problems involving models whose Fisher information matrices may fail to be invertible along competing submodels. When models are singular in this way, the penalty structure in BIC generally does not reflect the large-sample behavior of their Bayesian marginal likelihood. While large-sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning (the relevant expansions are sketched below). Guided by examples such as determining the number of components in mixture models, the number of factors in latent factor models, or the rank in reduced-rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems. Joint work with Martyn Plummer.
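To make the first abstract's central object concrete: the Tukey depth of a point is the smallest fraction of the data contained in any closed halfspace whose boundary passes through that point, and a Tukey median is a point of maximal depth. The sketch below is a naive Monte-Carlo upper bound on the depth, not the polynomial-time algorithm presented in the talk; the function names and the n_dirs parameter are illustrative.

```python
import numpy as np

def approx_tukey_depth(x, points, n_dirs=2000, seed=0):
    """Upper-bound the Tukey (halfspace) depth of x among `points`.

    The exact depth is the minimum, over all directions u, of the
    fraction of points p with <p, u> >= <x, u>.  Sampling random
    directions can only miss the minimizing direction, so this
    returns an upper bound that tightens as n_dirs grows.
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(points, dtype=float)
    n, d = P.shape
    U = rng.normal(size=(n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit directions
    proj = U @ P.T                                  # (n_dirs, n) projections of the data
    thresh = U @ np.asarray(x, dtype=float)         # projection of x in each direction
    counts = (proj >= thresh[:, None]).sum(axis=1)  # points in each halfspace through x
    return counts.min() / n

def approx_tukey_median(points, n_dirs=2000, seed=0):
    """Return the data point of maximal approximate Tukey depth."""
    P = np.asarray(points, dtype=float)
    depths = [approx_tukey_depth(p, P, n_dirs, seed) for p in P]
    return P[int(np.argmax(depths))]
```

This costs O(n_dirs · n · d) per query, which illustrates why approximation is attractive: exact computation of the Tukey median scales exponentially in d, as the abstract notes.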

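For the second abstract, the tension can be stated with two standard large-sample expansions of the log marginal likelihood. The first is the classical Laplace/BIC approximation for regular models; the second, for singular models, is the form established by Watanabe, with learning coefficient lambda and multiplicity m. This is a worked restatement of the abstract's claim under those standard results, not material from the talk itself:

```latex
% Regular model with k free parameters and n observations:
\log p(y \mid M) \;=\; \log L(\hat\theta) \;-\; \frac{k}{2}\log n \;+\; O_p(1)

% Singular model: the learning coefficient \lambda and its multiplicity m
% replace k/2 and 1, and both depend on the unknown true parameter,
% which is the circularity the talk resolves:
\log p(y \mid M) \;=\; \log L(\hat\theta) \;-\; \lambda\log n \;+\; (m-1)\log\log n \;+\; O_p(1)
```

In the regular case, lambda = k/2 and m = 1, recovering BIC exactly; in singular cases lambda <= k/2, so BIC's fixed penalty of (k/2) log n generally overcharges the model relative to the true marginal likelihood.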



Other Videos By Microsoft Research


2016-08-08  Image Classification Using a Background Prior
2016-08-08  Tutorial 3: Data-Constrained Environmental Modeling: FetchClimate, Filzbach, & Distribution Modeller
2016-08-08  Real time control with lots of humans in the loop
2016-08-08  Geek Knowing: From FAQ to Feminism 101
2016-08-08  Culture differences between US and China
2016-08-08  The 31st UW/MS Symposium in Computational Linguistics
2016-08-08  Parallel Thinking
2016-08-08  The 3rd Age of Computing
2016-08-08  Computational Fair Division: From Cake Cutting to Cluster Computing
2016-08-08  Student Session: Learning Cloud Computing, Environmental Science, and You
2016-08-08  Machine Learning Day 2013 - Deep Learning; A Bayesian Information Criterion for Singular Models
2016-08-08  IEEE eScience Keynote: From Genes to Stars
2016-08-08  Large-Scale Data Analysis for Biomedical and Social Sciences - Tom Cai
2016-08-08  Large-Scale Data Analysis for Biomedical and Social Sciences - Takayuki Okatani
2016-08-08  Tutorial 2 - Kinect for Windows in Science Applications - SDK Introduction
2016-08-08  Machine Learning Day 2013 - Clustering; Geometry Preserving Non-Linear Dimension Reduction
2016-08-08  From Smart Sensors to City OS (II) - Panel Discussion
2016-08-08  Locally Testable Codes and L_1 Embeddings of Cayley Graphs
2016-08-08  Interactive Visual Analytics for Scientific Discovery - Solving Problems with Visual Analytics
2016-08-08  Big Planet Big Questions, Big Data Big Science - Fetch Climate
2016-08-08  From Smart Sensors to City OS (II) - Lei Chen



Tags: microsoft research