A Nearly Tight Analysis of Greedy k-means++

Channel: Google TechTalks
Subscribers: 349,000
Published: April 15, 2023, 4:53:39 PM
Video link: https://www.youtube.com/watch?v=NDAVDRFMh_0
Duration: 52:13
Views: 602

A Google TechTalk, presented by Václav Rozhoň, 2023-04-13
Abstract: The famous k-means++ algorithm of Arthur and Vassilvitskii is the most popular practical algorithm for solving the k-means problem. The algorithm is very simple and computes the k output centers as follows: it samples the first center uniformly at random from the dataset, and it then samples each of the remaining k−1 centers from the dataset with probability proportional to the point's squared distance to the closest center chosen so far. Amazingly, the k-means++ algorithm is known to return a Θ(log k)-approximate solution in expectation.
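The sampling rule and guarantee described above can be written out as follows. This is a sketch in notation that is not part of the abstract: X is the dataset, C the current set of centers, C* an optimal set of k centers, and φ the k-means cost. The constant in the last line is the one from Arthur and Vassilvitskii's original analysis; the matching lower bound is what makes the guarantee Θ(log k).

```latex
% k-means cost of a center set C on the dataset X (notation assumed here)
\[
  \varphi(X, C) \;=\; \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2
\]

% D^2 sampling: given the centers C chosen so far, a point x in X
% becomes the next center with probability
\[
  \Pr[\text{next center} = x]
  \;=\; \frac{\min_{c \in C} \lVert x - c \rVert^2}{\varphi(X, C)}
\]

% Expected approximation guarantee for the k centers C returned by k-means++
\[
  \mathbb{E}\bigl[\varphi(X, C)\bigr] \;\le\; 8(\ln k + 2)\,\varphi(X, C^{*})
\]
```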
In their seminal work, Arthur and Vassilvitskii asked about the guarantees of the following greedy variant: in every step, sample ℓ candidate centers instead of one and keep the candidate that minimizes the new cost. This is also how k-means++ is implemented in, for example, the popular scikit-learn library. We analyze greedy k-means++: we prove that it is an O(ℓ^3 log^3 k)-approximation algorithm and provide a nearly matching lower bound.
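The greedy variant described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code from the talk, the paper, or scikit-learn: the function name `greedy_kmeans_pp` and the parameter `n_candidates` (standing in for ℓ) are hypothetical, and the uniform fallback when every point already coincides with a center is a choice made here for robustness.

```python
import numpy as np


def greedy_kmeans_pp(X, k, n_candidates, rng=None):
    """Pick k centers from the rows of X with greedy k-means++ seeding.

    n_candidates plays the role of ell: the number of candidate centers
    sampled per step. Returns (centers, cost), where cost is the sum of
    squared distances from each point to its closest chosen center.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]

    # First center: a uniformly random data point.
    centers = [X[rng.integers(n)]]
    # d2[i] = squared distance from X[i] to its closest chosen center.
    d2 = np.sum((X - centers[0]) ** 2, axis=1)

    for _ in range(k - 1):
        total = d2.sum()
        # D^2 sampling; fall back to uniform if every point sits on a center.
        probs = d2 / total if total > 0 else None
        cand_idx = rng.choice(n, size=n_candidates, p=probs)

        # Squared distances from every candidate to every point:
        # shape (n_candidates, n).
        cand_d2 = ((X[None, :, :] - X[cand_idx][:, None, :]) ** 2).sum(axis=2)
        # Per-point distances if each candidate were added to the centers.
        new_d2 = np.minimum(d2[None, :], cand_d2)

        # Greedy step: keep the candidate that gives the smallest new cost.
        best = int(np.argmin(new_d2.sum(axis=1)))
        centers.append(X[cand_idx[best]])
        d2 = new_d2[best]

    return np.array(centers), float(d2.sum())
```

Setting `n_candidates=1` recovers plain k-means++ seeding, since the single sampled candidate is always kept. The sketch materializes an array of shape (n_candidates, n, d), which is fine for small inputs but would need a more careful distance computation on large datasets.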

Joint work with Christoph Grunau, Ahmet Alper Özüdoğru, Jakub Tětek
arxiv: https://arxiv.org/abs/2207.07949

Bio: Václav Rozhoň is a PhD student at ETH Zurich advised by Mohsen Ghaffari. He works mostly on distributed and parallel algorithms; he also creates YouTube videos about algorithms (channel name: polylog). He has a young child and thus no hobbies.

A Google Talk Series on Algorithms, Theory, and Optimization

Other Videos By Google TechTalks

2023-06-05	Foundation Models and Fair Use
2023-05-30	Differentially Private Online to Batch
2023-05-30	Differentially Private Diffusion Models Generate Useful Synthetic Images
2023-05-30	Improving the Privacy Utility Tradeoff in Differentially Private Machine Learning with Public Data
2023-05-30	Randomized Approach for Tight Privacy Accounting
2023-05-30	Almost Tight Error Bounds on Differentially Private Continual Counting
2023-05-30	EIFFeL: Ensuring Integrity for Federated Learning
2023-05-30	Differentially Private Diffusion Models
2023-05-15	Damian Grimling | Sentistocks | Sentimenti | web3 talks | March 9th 2023 | MC: Blake DeBenon
2023-04-21	Branimir Rakic | CTO & Co-Founder of OriginTrail | web3 talks | Feb 27th 2023 | MC: Alex Ticamera
2023-04-15	A Nearly Tight Analysis of Greedy k-means++
2023-04-15	Introduction to Length-Constrained Expanders and Expander Decompositions
2023-04-07	Improved Feature Importance Computation for Tree Models Based on the Banzhaf Value
2023-04-07	A Unifying Theory of Distance to Calibration
2023-04-07	Dynamic Graph Sketching: To Infinity And Beyond
2023-03-20	Sergey Nazarov | Co-Founder Chainlink | web3 talks | Mar 16 2023 | MC: Marlon Ruiz
2023-03-09	Evan Shapiro | CEO Mina Foundation | web3 talks | Feb 16th 2023 | MC: Marlon Ruiz
2023-03-07	Zürich Go Meetup: Zero-effort Type-safe Parsing of JSON and XML
2023-03-07	Zürich Go Meetup: Let’s Build a Game with Go
2023-03-07	Zürich Go Meetup: Run Go programs on your Raspberry Pi with gokrazy!
2023-03-03	Online Covering: Secretaries, Prophets and Universal Maps
