Learning Mixtures of Arbitrary Distributions over Large Discrete Domains

Channel: Microsoft Research

Subscribers: 351,000

Published on July 27, 2016 1:36:04 AM ● Video Link: https://www.youtube.com/watch?v=ts4Xr3wh6F8

Duration: 1:08:58

126 views

We give an algorithm for learning a mixture of unstructured distributions. This problem arises in various unsupervised learning scenarios, for example in learning topic models from a corpus of documents spanning several topics. We show how to learn the constituents of a mixture of k arbitrary distributions over a large discrete domain [n] = {1, 2, ..., n}, together with the mixture weights, using O(n polylog n) samples. (In the topic-model learning setting, the mixture constituents correspond to the topic distributions.) This task is information-theoretically impossible for k > 1 under the usual sampling process from a mixture distribution. However, there are situations (such as the above-mentioned topic model case) in which each sample point consists of several observations from the same mixture constituent. This number of observations, which we call the "sampling aperture", is a crucial parameter of the problem. We obtain the first bounds for this mixture-learning problem without imposing any assumptions on the mixture constituents. We show that efficient learning is possible exactly at the information-theoretically least-possible aperture of 2k-1. Thus, we achieve near-optimal dependence on n and optimal aperture. While the sample size required by our algorithm depends exponentially on k, we prove that such a dependence is unavoidable when one considers general mixtures. A sequence of tools contributes to the algorithm, including concentration results for random matrices, dimension reduction, moment estimation, and sensitivity analysis. Joint work with Leonard Schulman and Chaitanya Swamy.
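The sampling model described in the abstract can be illustrated with a small sketch. This is not the algorithm from the talk; it only simulates the sampling process and shows why a single observation per sample point cannot separate two different mixtures. The function names (sample_mixture, marginal) and the two toy mixtures are assumptions made up for this illustration.

```python
import random

def sample_mixture(weights, constituents, aperture, num_samples, rng):
    """Draw sample points from a mixture over the domain {1, ..., n}.

    Each sample point picks one hidden constituent according to `weights`,
    then draws `aperture` independent observations from that constituent.
    """
    n = len(constituents[0])
    domain = list(range(1, n + 1))
    samples = []
    for _ in range(num_samples):
        # Hidden choice of the mixture constituent for this sample point.
        j = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        # All observations in this sample point come from the same constituent.
        observations = rng.choices(domain, weights=constituents[j], k=aperture)
        samples.append(tuple(observations))
    return samples

def marginal(weights, constituents):
    """Distribution of a single observation: the weighted average of the constituents."""
    n = len(constituents[0])
    return [sum(w * p[i] for w, p in zip(weights, constituents)) for i in range(n)]

# Two hypothetical mixtures with k = 2 constituents over the domain {1, 2}.
weights = [0.5, 0.5]
mixture_a = [[1.0, 0.0], [0.0, 1.0]]   # constituents concentrated on different items
mixture_b = [[0.5, 0.5], [0.5, 0.5]]   # two identical uniform constituents

# With aperture 1, both mixtures induce the same distribution on observations.
same_marginal = marginal(weights, mixture_a) == marginal(weights, mixture_b)

# With aperture 2k - 1 = 3, sample points from mixture_a always repeat one item,
# while sample points from mixture_b frequently mix both items.
rng = random.Random(0)
points_a = sample_mixture(weights, mixture_a, aperture=3, num_samples=50, rng=rng)
points_b = sample_mixture(weights, mixture_b, aperture=3, num_samples=50, rng=rng)
print(same_marginal, points_a[:3], points_b[:3])
```

Here the aperture of 3 matches the 2k-1 threshold quoted in the abstract for k = 2; the toy example only shows that multiple observations per sample point carry information that single observations do not.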

Other Videos By Microsoft Research

2016-07-26	Going Big on Big Data
2016-07-26	Parallel Coordinates: Visual Multidimensional Geometry and its Applications
2016-07-26	Microblogging During Small Scale Incidents
2016-07-26	Patent Law's Perfect Storm
2016-07-26	Multi-Party Computation: From Theory to Practice
2016-07-26	Integrating Algorithmic and Behavioral Approaches to Crowdsourcing
2016-07-26	Narrating with Networks: Making Sense of Event Log Data with Socio-Technical Trajectories
2016-07-26	Algorithms and Perception for Interactive Free-Viewpoint Image-Based Navigation
2016-07-26	Learning to Construct and Reason with a Large Knowledge Base of Extracted Information
2016-07-26	Realtime Facial Animation
2016-07-26	Learning Mixtures of Arbitrary Distributions over Large Discrete Domains
2016-07-26	The Structural Theory of Pure Type Systems
2016-07-26	Building Social Life Networks
2016-07-26	How to write your next POPL paper in Dafny
2016-07-26	Small Image Sensors and Big Visual Data
2016-07-26	Why the Doorway is a Data Portal into Multi-Person Homes
2016-07-26	Coordinating Software Development through Predictive Conflict Detection
2016-07-26	Blur-Kernel Estimation from Spectral Irregularities
2016-07-26	Quantum algorithms for Hamiltonian simulation
2016-07-26	Understanding and Reducing the User Burdens in Applications for Health and Wellbeing
2016-07-26	Evaluating Open Source Software

Tags:

microsoft research
