Fast Dual-Tree k-means with Bounded Single-Iteration Runtime
Channel:
Subscribers:
344,000
Published on ● Video Link: https://www.youtube.com/watch?v=QciM7aHCwWI
k-means is a widely used clustering algorithm, but for k clusters and a dataset size of N, each iteration of Lloyd's algorithm costs O(kN) time. Although there are a handful of techniques to accelerate single Lloyd iterations, none of these techniques are tailored to the case of large k, which is increasingly common as dataset sizes grow (one example: vector quantization), and none of these techniques have a worst-case runtime bound of less than O(kN) per iteration. I present a dual-tree algorithm with a worst-case runtime bound of O(N + k log k) under certain assumptions on the dataset, and show that this algorithm outperforms all other alternatives as k and N grow. An implementation is readily available in mlpack.
Other Videos By Microsoft Research
Tags:
microsoft research