Job-Aware File-Storage Optimization for Improved Hadoop I/O Performance


IEICE TRANSACTIONS on Information and Systems, Vol.E103-D, No.10, pp.2083-2093
Publication Date: 2020/10/01
Publicized: 2020/06/30
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019EDP7337
Type of Manuscript: PAPER
Category: Software System
Keywords: Hadoop, MapReduce, SWIM, file system


Hadoop is a popular data-analytics platform based on Google's MapReduce programming model. Because big-data analysis generally relies on hard-disk drives (HDDs), improving I/O performance is an effective way to improve Hadoop performance. HDD performance varies depending on whether data are stored in the inner or outer disk zones. This paper proposes a method that uses knowledge of job characteristics to store data efficiently on HDDs, which in turn improves Hadoop performance. In the proposed method, job files that are accessed frequently are stored in the outer disk tracks, which provide higher sequential-access speeds than the inner tracks. Specifically, the method stores temporary files in the outer zones and permanent files in the inner zones, thereby enabling fast access to frequently required data. Performance evaluation results demonstrate that the proposed method improves Hadoop performance by 15.4% compared with the normal case in which no file placement is applied. Additionally, the proposed method outperforms a previously proposed placement approach by 11.1%.
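
The placement rule described in the abstract (temporary files to the outer zone, permanent files to the inner zone) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The mount points, the file-kind labels, and the function names below are all hypothetical, and the sketch assumes the disk has already been split into an outer-zone and an inner-zone partition.

```python
from enum import Enum


class Zone(Enum):
    OUTER = "outer"  # outer tracks: higher sequential-access speed
    INNER = "inner"  # inner tracks: lower sequential-access speed


# Hypothetical mount points, assuming one partition per disk zone.
ZONE_MOUNTS = {Zone.OUTER: "/mnt/outer", Zone.INNER: "/mnt/inner"}

# Hypothetical file-kind labels; the abstract only distinguishes
# "temporary" from "permanent" files.
TEMPORARY_KINDS = {"map_output", "spill", "shuffle"}
PERMANENT_KINDS = {"hdfs_block"}


def choose_zone(file_kind: str) -> Zone:
    """Map a file kind to a disk zone using the temporary/permanent rule."""
    if file_kind in TEMPORARY_KINDS:
        return Zone.OUTER
    if file_kind in PERMANENT_KINDS:
        return Zone.INNER
    raise ValueError(f"unknown file kind: {file_kind!r}")


def placement_path(file_kind: str, name: str) -> str:
    """Return the path under the mount point of the chosen zone."""
    return f"{ZONE_MOUNTS[choose_zone(file_kind)]}/{name}"


if __name__ == "__main__":
    print(placement_path("spill", "spill0.out"))
    print(placement_path("hdfs_block", "blk_1"))
```

In a real cluster, a comparable effect might be approximated by pointing Hadoop's local scratch directories (for example `yarn.nodemanager.local-dirs`) at the outer-zone partition and the DataNode storage directories (`dfs.datanode.data.dir`) at the inner-zone partition. Whether this matches the mechanism used in the paper is an assumption.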