Imperial College London > Talks@ee.imperial > Featured talks > Upper and lower bounds on the cost of a mapreduce computation.
Log inImperial users Other users No account?Information onFinding a talk Adding a talk Syndicating talks Who we are Everything else |
Upper and lower bounds on the cost of a mapreduce computation.Add to your list(s) Download to your calendar using vCal
If you have a question about this talk, please contact Wiesia R Hsissen. As MapReduce/Hadoop grows in importance, we find more exotic applications being written this way. Not every program written for this platform performs as well as we might wish. There are several reasons why a MapReduce program can underperform expectations. One is the need to balance the communication cost of transporting data from the mappers to the reducers against the computation done at the mappers and reducers themselves. A second important issue is selecting the number of rounds of MapReduce. A third issue is that of skew. If wall-clock time is important, then using many different reduce-keys and many compute nodes may minimize the time to finish the job. Yet if the data is uncooperative, and no provision is made to distribute the data evenly, much of the work is done by a single node. In this talk we will focus on the tradeoff between communication cost and the computation cost of the reducers: the finer we partition the work of the reducers so that more parallelism can be extracted, the greater will be the total communication between mappers and reducers. We introduce a model of problems that can be solved in a single round of MapReduce computation, and use it to demonstrate the tradeoff. This model enables a generic recipe for discovering lower bounds on communication cost as a function of the computation cost of the reducers, which is captured as the maximum number of inputs that can be assigned to one reducer. We then use the model to compute lower bounds and present algorithms that meet these bounds for a number of problems described below. Algorithms and lower bounds will be presented for problems such as: Similarity joins, Matrix multiplication, subgraph finding, multiway joins. This talk is part of the Featured talks series. This talk is included in these lists:
Note that ex-directory lists are not shown. |
Other listsTalks Type the title of a new list here Type the title of a new list hereOther talksEvaluating Mobile Video and Mobile Applications Advances in Laser Direct-Write for Digital Microfabrication Symbolic Code Generation of Optimal Control Problems. Large and Real time engineering applications Model Predictive Control of linear Systems with disturbances - control parameterizations and their tradeoffs BATS: Achieving the Capacity of Networks with Packet Loss Trust-based Algorithms and Decision Procedures for Power Management Systems |