Small Variance Asymptotics, Bayesian Nonparametrics, and k-means

Video Link: https://www.youtube.com/watch?v=Ru1487n0rlY



Duration: 45:38


Bayesian approaches to clustering permit great flexibility: existing models can handle cases where the number of clusters is not known upfront, or where one wants to share clusters across multiple data sets. Despite this flexibility, simpler methods such as k-means are the preferred choice in many applications due to their simplicity and scalability. One way to view k-means from a probabilistic perspective is as the limit of a mixture-of-Gaussians model in which the covariance of each cluster tends to zero. This talk will explore the use of similar asymptotics over a rich class of Bayesian nonparametric models, leading to several new algorithms that combine the simplicity of k-means with the flexibility of Bayesian nonparametrics. The methods discussed include: i) a k-means-like algorithm based on asymptotics of the Dirichlet process mixture model that does not fix the number of clusters upfront; ii) an algorithm for clustering multiple data sets based on the hierarchical Dirichlet process; iii) an overlapping clustering algorithm based on asymptotics of the beta process; and iv) a k-means-like topic modeling algorithm arising from asymptotics over a Bayesian nonparametric hierarchical multinomial mixture.
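To make item i) concrete, here is a minimal Python sketch of a k-means-like procedure that does not fix the number of clusters upfront. It is an illustration under stated assumptions, not the speaker's implementation: the function name `dp_means`, the penalty parameter `lam`, the single-cluster initialisation, and the rule "open a new cluster when a point's squared distance to every centroid exceeds `lam`" are choices made here, following the form in which such algorithms are commonly described.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def dp_means(
    data: List[Point], lam: float, max_iter: int = 100
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical sketch of a k-means-like loop with a cluster penalty.

    lam is an assumed penalty: a point whose squared distance to every
    current centroid exceeds lam starts a new cluster of its own.
    """
    centroids = [mean(data)]  # start from a single global cluster
    labels = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest centroid, or a new cluster if too far.
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] > lam:
                centroids.append(list(x))
                k = len(centroids) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Update step: drop empty clusters, then recompute the means.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[l] for l in labels]
        centroids = [
            mean([x for x, l in zip(data, labels) if l == k])
            for k in range(len(used))
        ]
        if not changed:
            break
    return centroids, labels
```

For example, calling `dp_means(points, lam=9.0)` on two well-separated groups of points yields two centroids, while a very large `lam` keeps everything in one cluster. The penalty plays the role that the fixed cluster count plays in ordinary k-means: a smaller `lam` makes new clusters cheaper and so produces more of them.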







Tags:
microsoft research