A Nearly Tight Analysis of Greedy k-means++

Subscribers:
348,000
Published on ● Video Link: https://www.youtube.com/watch?v=NDAVDRFMh_0



Duration: 52:13
602 views
14


A Google TechTalk, presented by Václav Rozhoň, 2023-04-13
Abstract: The famous k-means++ algorithm of Arthur and Vassilvitskii is the most popular practical algorithm for solving the k-means problem. The algorithm is very simple and computes the k output centers as follows: it samples the first center as a uniformly random point in the dataset and each of the following k−1 centers is then always sampled with probability proportional to the squared distance to the currently closest center. Amazingly, the k-means++ algorithm is known to return a Θ(log k) approximate solution in expectation.
In their seminal work, Arthur and Vassilvitskii asked about the guarantees of its following greedy variant: in every step, we sample ℓ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how k-means++ is implemented in e.g. the popular Scikit-learn library. We analyze greedy k-means++: We prove that it is an O(ℓ^3 * log^3 k)-approximation algorithm and provide a near-matching lower bound.

Joint work with Christoph Grunau, Ahmet Alper Özüdoğru, Jakub Tětek
arxiv: https://arxiv.org/abs/2207.07949

Bio: Vaclav Rozhon is a PhD student at ETH Zurich advised by Mohsen Ghaffari. He works mostly on distributed and parallel algorithms; he also creates YouTube videos about algorithms (channel name: polylog). He has a young child and thus no hobbies.

A Google Talk Series on Algorithms, Theory, and Optimization




Other Videos By Google TechTalks


2023-06-05Foundation Models and Fair Use
2023-05-30Differentially Private Online to Batch
2023-05-30Differentially Private Diffusion Models Generate Useful Synthetic Images
2023-05-30Improving the Privacy Utility Tradeoff in Differentially Private Machine Learning with Public Data
2023-05-30Randomized Approach for Tight Privacy Accounting
2023-05-30Almost Tight Error Bounds on Differentially Private Continual Counting
2023-05-30EIFFeL: Ensuring Integrity for Federated Learning
2023-05-30Differentially Private Diffusion Models
2023-05-15Damian Grimling | Sentistocks | Sentimenti | web3 talks | March 9th 2023 | MC: Blake DeBenon
2023-04-21Branimir Rakic | CTO & Co-Founder of OriginTrail | web3 talks | Feb 27th 2023 | MC: Alex Ticamera
2023-04-15A Nearly Tight Analysis of Greedy k-means++
2023-04-15Introduction to Length-Constrained Expanders and Expander Decompositions
2023-04-07Improved Feature Importance Computation for Tree Models Based on the Banzhaf Value
2023-04-07A Unifying Theory of Distance to Calibration
2023-04-07Dynamic Graph Sketching: To Infinity And Beyond
2023-03-20Sergey Nazarov | Co-Founder Chainlink | web3 talks | Mar 16 2023 | MC: Marlon Ruiz
2023-03-09Evan Shapiro | CEO Mina Foundation | web3 talks | Feb 16th 2023 | MC: Marlon Ruiz
2023-03-07Zürich Go Meetup: Zero-effort Type-safe Parsing of JSON and XML
2023-03-07Zürich Go Meetup: Let’s Build a Game with Go
2023-03-07Zürich Go Meetup: Run Go programs on your Raspberry Pi with gokrazy!
2023-03-03Online Covering: Secretaries, Prophets and Universal Maps