2PTS: A Two-Phase Task Scheduling Algorithm for MapReduce

Byungnam LIM, Yeeun SHIM, Yon Dohn CHUNG

Publication: IEICE TRANSACTIONS on Information and Systems, Vol.E99-D, No.9, pp.2377-2380
Publication Date: 2016/09/01
Publicized: 2016/06/06
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2016EDL8075
Type of Manuscript: LETTER
Category: Fundamentals of Information Systems
Keywords: MapReduce, task scheduling algorithm, data locality

Summary: 
For efficient processing of large data in a distributed system, Hadoop MapReduce schedules tasks so that they are distributed with consideration of data locality. However, data locality is only partially exploited, because it is pursued one node at a time without considering global optimality. In this paper, we propose a novel task scheduling algorithm that considers data locality globally. Through experiments, we show that our algorithm improves the performance of MapReduce in various situations.
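The following toy sketch shows why pursuing locality one node at a time can fall short of a global view. It is not the 2PTS algorithm, whose two phases are not described in this summary. The cluster model, the function names, and the use of Kuhn's augmenting-path bipartite matching as the "global" step are all hypothetical, chosen only to make the contrast concrete.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model (not from the paper): each task lists the nodes that
# hold a replica of its input split, and each node offers some free map slots.
Replicas = Dict[str, Set[str]]
Slots = Dict[str, int]


def greedy_node_at_a_time(order: List[str], slots: Slots, replicas: Replicas) -> Dict[str, str]:
    """Assign tasks as each node asks for work, preferring a task local to that node."""
    pending = list(replicas)  # tasks in submission order
    assignment: Dict[str, str] = {}
    for node in order:
        for _ in range(slots[node]):
            if not pending:
                return assignment
            local = [t for t in pending if node in replicas[t]]
            task = local[0] if local else pending[0]  # fall back to a non-local task
            pending.remove(task)
            assignment[task] = node
    return assignment


def global_matching(slots: Slots, replicas: Replicas) -> Dict[str, str]:
    """Maximize local assignments over all nodes at once (Kuhn's bipartite matching)."""
    slot_ids = [(node, i) for node, k in slots.items() for i in range(k)]
    owner: Dict[Tuple[str, int], str] = {}  # slot -> task currently matched to it

    def augment(task: str, seen: Set[Tuple[str, int]]) -> bool:
        # Try a free local slot, or evict the current holder if it can move elsewhere.
        for slot in slot_ids:
            if slot[0] in replicas[task] and slot not in seen:
                seen.add(slot)
                if slot not in owner or augment(owner[slot], seen):
                    owner[slot] = task
                    return True
        return False

    for task in replicas:
        augment(task, set())
    assignment = {task: slot[0] for slot, task in owner.items()}
    # Fill step: tasks with no local slot left take any remaining free slot.
    free = [slot for slot in slot_ids if slot not in owner]
    for task in replicas:
        if task not in assignment and free:
            assignment[task] = free.pop(0)[0]
    return assignment


def locality(assignment: Dict[str, str], replicas: Replicas) -> int:
    """Count tasks placed on a node that holds their input data."""
    return sum(node in replicas[task] for task, node in assignment.items())


if __name__ == "__main__":
    slots = {"A": 1, "B": 1}
    replicas = {"t1": {"A", "B"}, "t2": {"A"}}
    g = greedy_node_at_a_time(["A", "B"], slots, replicas)
    m = global_matching(slots, replicas)
    print(locality(g, replicas))  # 1: node A takes t1, so t2 runs non-locally on B
    print(locality(m, replicas))  # 2: t2 goes to A and t1 to B
```

In this example, node A greedily takes the first local task, t1, which leaves t2 (stored only on A) to run remotely on B. Considering all free slots together swaps the two tasks so that both run locally.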