Robust Semi-Supervised Learning

Channel: Microsoft Research (344,000 subscribers)
Video link: https://www.youtube.com/watch?v=pRo2_GAe8O4
Duration: 1:12:18
Views: 2,934

Semi-supervised learning algorithms are designed to learn an unknown concept from a partially-labeled data set of training examples. They are widely popular in practice, since labels are often very costly to obtain. This talk is about a new approach to semi-supervised learning that addresses a mismatch between the way semi-supervised learning algorithms have been developed and the way they are commonly used.

Most existing semi-supervised learning algorithms are analyzed under the assumption that the algorithm can randomly select a subset of unlabeled training examples and submit them to a labeler. But in many applications of semi-supervised learning, the partially-labeled data is 'naturally occurring', and the learning algorithm has no control over which examples are labeled. This is particularly true of data generated by web users. Websites like Facebook, YouTube and Flickr give users the option to label, or 'tag', images, videos and other content, a process that generates very large partially-labeled data sets. We have little understanding of how users select which items to label, but we know that they are almost certainly not selecting them randomly.

Instead of assuming that labels are missing at random, we analyze a less favorable scenario where the label information can be missing partially and arbitrarily. We present nearly matching upper and lower generalization bounds for learning in this setting under reasonable assumptions about the available label information. Motivated by this analysis, we formulate a convex optimization problem for parameter estimation, derive an efficient algorithm suitable for large data sets, and analyze its convergence. Our algorithm can be viewed as a convex formulation of existing nonconvex approaches to semi-supervised learning, such as posterior regularization, and is therefore less sensitive to local minima.
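To make the distinction concrete, the following small simulation (not from the talk; a hypothetical sketch with made-up parameters) contrasts labels that go missing at random with a biased, "naturally occurring" labeling mechanism in which users label one class far more often than the other. The labeled subset in the second case badly misrepresents the true class balance:

```python
import random

random.seed(0)

# Hypothetical 1-D two-class data: class 0 centered at -1, class 1 at +1.
data = [(random.gauss(-1, 0.5), 0) for _ in range(500)] + \
       [(random.gauss(+1, 0.5), 1) for _ in range(500)]

# Missing at random: every example is labeled with the same probability.
mar = [(x, y if random.random() < 0.1 else None) for x, y in data]

# Missing NOT at random: an assumed biased mechanism, standing in for
# user tagging, reveals class-1 labels far more often than class-0 labels.
def reveal(y):
    p = 0.18 if y == 1 else 0.02
    return y if random.random() < p else None

mnar = [(x, reveal(y)) for x, y in data]

def labeled_class_balance(dataset):
    # Fraction of the *labeled* examples that belong to class 1.
    labels = [y for _, y in dataset if y is not None]
    return sum(labels) / len(labels)

# The true balance is 0.5; the MAR labeled subset stays near it,
# while the biased labeled subset is heavily skewed toward class 1.
print(labeled_class_balance(mar))
print(labeled_class_balance(mnar))
```

An algorithm whose analysis assumes the first mechanism can be badly misled by data generated under the second, which is the gap the talk's robustness guarantees are meant to close.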
In experiments on several image data sets, we show that our algorithm performs substantially better than existing semi-supervised learning algorithms under a number of challenging but realistic labeling scenarios. This is joint work with Ben Taskar.
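The abstract does not spell out the convex formulation itself, so as a generic point of reference only, here is a minimal self-training loop, one of the standard (nonconvex, local-minimum-prone) semi-supervised baselines the talk's approach is positioned against. All data and thresholds are made up for illustration: a nearest-centroid classifier is refit as confidently pseudo-labeled points are folded into the training set.

```python
import random

random.seed(1)

# Toy 1-D data: two Gaussian classes, with only 5 labeled examples each.
points = [(random.gauss(-1, 0.4), 0) for _ in range(200)] + \
         [(random.gauss(+1, 0.4), 1) for _ in range(200)]
labeled = points[:5] + points[200:205]
unlabeled = [x for x, _ in points[5:200] + points[205:]]

def centroids(pairs):
    # Mean position of each class among the currently labeled points.
    return {c: sum(x for x, y in pairs if y == c) /
               len([x for x, y in pairs if y == c]) for c in (0, 1)}

# Self-training: repeatedly pseudo-label the confident unlabeled points
# (those far from the current decision boundary) and refit the centroids.
train, pool = list(labeled), list(unlabeled)
for _ in range(5):
    m = centroids(train)
    boundary = (m[0] + m[1]) / 2
    confident = [x for x in pool if abs(x - boundary) > 0.5]
    train += [(x, 0 if x < boundary else 1) for x in confident]
    pool = [x for x in pool if abs(x - boundary) <= 0.5]

m = centroids(train)
boundary = (m[0] + m[1]) / 2
accuracy = sum((x > boundary) == (y == 1) for x, y in points) / len(points)
```

On this easy toy problem self-training works well, but its pseudo-labeling step is exactly where a biased labeling mechanism (as in the scenario above) can lock it into a bad local solution, which is the failure mode a convex formulation avoids.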




Other Videos By Microsoft Research


2016-08-16  Minimal Multithreading - Exploiting Redundancy in Parallel Systems
2016-08-16  LATAM 2011: Overview of Recent Projects from Microsoft Research
2016-08-16  LATAM 2011: LACCIR & FAPESP Projects
2016-08-16  Scalable Management of Enterprise and Data Center Networks
2016-08-16  LATAM 2011: Scientific Computing using Windows Azure
2016-08-16  Collaborative Information Seeking: The Art & Science of Making the Whole Greater than the Sum of All
2016-08-16  Human-Computer Persuasive Interaction: Designing the emotional bond with customers
2016-08-16  Photographing events over time
2016-08-16  LATAM 2011: Plenary Session - Computing and the Future
2016-08-16  LATAM 2011: Plenary Session - The Path to Open Science with Illustrations from Computational Biology
2016-08-16  Robust Semi-Supervised Learning
2016-08-16  LATAM 2011: Semantic Computing for eScience
2016-08-16  LATAM 2011: High-Fidelity Augmented Reality Interactions
2016-08-16  LATAM 2011: The Role of Basic Research in Technology
2016-08-16  Learning Efficient Nash Equilibria in Distributed Systems
2016-08-16  Data Triggered Threads -- Eliminating Redundant Computation
2016-08-16  LATAM 2011: Scaling Science in the Cloud: From Satellite to Science Variables with MODISAzure
2016-08-16  LATAM 2011: Science, Technology and Innovation Strategies for Promoting Competitiveness
2016-08-16  Functional connectomics of neural networks
2016-08-16  Decision-Theoretic Control for Crowdsourcing
2016-08-16  Protecting Circuits from Leakage: The Computationally-Bounded and Noisy Cases



Tags: microsoft research