Complexity Theory for MapReduce Algorithms

Subscribers: 344,000
Video Link: https://www.youtube.com/watch?v=lB0AFMij1ko
Duration: 1:21:47
Views: 1,285


For many problems, algorithms implemented in MapReduce or similar systems with two ranks of parallel tasks exhibit a tradeoff between memory size and communication. More precisely, the tradeoff is between the "reducer size" (the number of inputs received by a single task in the second rank) and the "replication rate" (the average number of key-value pairs generated by the first rank in response to a single input). We begin with the simple but common "all-pairs" problem, where there is some output associated with every pair of inputs. For this problem, the reducer size and the replication rate vary inversely. We then look at the different relationships between these two parameters for a number of other problems, including (dense) matrix multiplication, similarity joins, and computing marginals.
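
To make the all-pairs tradeoff concrete: a reducer that receives q inputs can cover at most q(q-1)/2 pairs, and since all n(n-1)/2 pairs of inputs must meet at some reducer while total communication is r*n key-value pairs, r must be at least on the order of n/q. The Python sketch below (illustrative only; the function names and parameters are not from the talk) implements the standard group-based mapping that nearly matches this bound: split the n inputs into g groups and key one reducer by each unordered pair of groups, giving reducer size q = 2n/g and replication rate r = g - 1, roughly 2n/q.

from itertools import combinations

def map_all_pairs(inputs, g):
    # Hypothetical group-based mapper (names are illustrative, not from
    # the talk).  Input i is sent to every reducer whose key is an
    # unordered pair of groups containing i's group, so each input is
    # replicated g - 1 times and each reducer receives about 2n/g inputs.
    n = len(inputs)
    key_values = []
    for i, x in enumerate(inputs):
        gi = i * g // n                      # group index of input i
        for gj in range(g):
            if gj != gi:
                key_values.append((tuple(sorted((gi, gj))), x))
    return key_values

def covers_all_pairs(inputs, key_values):
    # Check that every pair of inputs meets at some reducer.
    reducers = {}
    for key, x in key_values:
        reducers.setdefault(key, set()).add(x)
    covered = set()
    for bucket in reducers.values():
        covered.update(combinations(sorted(bucket), 2))
    return covered == set(combinations(sorted(inputs), 2))

inputs = list(range(12))
kvs = map_all_pairs(inputs, g=4)             # q = 6, r = 3
assert covers_all_pairs(inputs, kvs)
print(len(kvs) / len(inputs))                # replication rate -> 3.0

The extremes show the inverse variation the abstract describes: g = 2 yields a single reducer holding all n inputs (q = n, r = 1), while g = n sends every input to n - 1 reducers (q = 2, r = n - 1); in between, the product q*r stays near 2n.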
Tags:
microsoft research