Imperial College London > Talks@ee.imperial > Featured talks > Upper and lower bounds on the cost of a mapreduce computation.

Upper and lower bounds on the cost of a mapreduce computation.

Add to your list(s) Download to your calendar using vCal

If you have a question about this talk, please contact Wiesia R Hsissen.

As MapReduce/Hadoop grows in importance, we find more exotic applications being written this way. Not every program written for this platform performs as well as we might wish. There are several reasons why a MapReduce program can underperform expectations. One is the need to balance the communication cost of transporting data from the mappers to the reducers against the computation done at the mappers and reducers themselves. A second important issue is selecting the number of rounds of MapReduce. A third issue is that of skew. If wall-clock time is important, then using many different reduce-keys and many compute nodes may minimize the time to finish the job. Yet if the data is uncooperative, and no provision is made to distribute the data evenly, much of the work is done by a single node.

In this talk we will focus on the tradeoff between communication cost and the computation cost of the reducers: the finer we partition the work of the reducers so that more parallelism can be extracted, the greater will be the total communication between mappers and reducers. We introduce a model of problems that can be solved in a single round of MapReduce computation, and use it to demonstrate the tradeoff. This model enables a generic recipe for discovering lower bounds on communication cost as a function of the computation cost of the reducers, which is captured as the maximum number of inputs that can be assigned to one reducer. We then use the model to compute lower bounds and present algorithms that meet these bounds for a number of problems described below. Algorithms and lower bounds will be presented for problems such as: Similarity joins, Matrix multiplication, subgraph finding, multiway joins.

This talk is part of the Featured talks series.

Tell a friend about this talk:

This talk is included in these lists:

Note that ex-directory lists are not shown.

 

Changes to Talks@imperial | Privacy and Publicity