Skew-Tolerant Key Distribution for Load Balancing in MapReduce

Jihoon SON, Hyunsik CHOI, Yon Dohn CHUNG

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E95-D   No.2   pp.677-680
Publication Date: 2012/02/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E95.D.677
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Data Engineering, Web Information Systems
Keyword: skew-tolerance, MapReduce, load balance, key distribution

Summary: 
MapReduce is a parallel processing framework for large-scale data. In the reduce phase, MapReduce uses a hash scheme to distribute data across cluster nodes so that all records sharing the same key are sent to the same node. However, this approach is not robust to skewed data distributions. In this paper, we propose a skew-tolerant key distribution method for MapReduce. The proposed method assigns keys to cluster nodes so that their workloads are balanced. We implemented the proposed method on Hadoop. Through experiments, we evaluate its performance in comparison with the conventional method.
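
To illustrate why hash-based distribution can struggle with skew, the following is a minimal Python sketch that compares per-node load under a hash scheme with load under a simple greedy assignment. It is not the method proposed in the letter, whose details are not given in this summary. The greedy heuristic (heaviest key first, to the least-loaded node), the function names, the use of CRC32 as the hash, and the key counts are all assumptions made for illustration.

```python
import heapq
import zlib
from collections import Counter


def hash_partition(key_counts, num_reducers):
    """Conventional scheme: reducer index = hash(key) mod num_reducers.

    CRC32 stands in for the framework's hash function (an assumption).
    Returns the number of records each reducer would receive.
    """
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += count
    return loads


def greedy_partition(key_counts, num_reducers):
    """Illustrative heuristic, not the paper's algorithm.

    Keys are visited from heaviest to lightest, and each one is assigned
    to the reducer that currently has the smallest load.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_counts[key]
    return assignment, loads


# Hypothetical skewed workload: key "a" carries half of all records.
key_counts = Counter({"a": 50, "b": 30, "c": 10, "d": 5, "e": 5})
hash_loads = hash_partition(key_counts, 2)
assignment, greedy_loads = greedy_partition(key_counts, 2)
print("hash loads:  ", hash_loads)
print("greedy loads:", greedy_loads)
```

The greedy assignment needs to know the key frequencies in advance, which a real system would have to estimate, for example by sampling; the hash scheme needs no such information, and that is the trade-off this sketch is meant to show.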