NDCouplingHDFS: A Coupling Architecture for a Power-Proportional Hadoop Distributed File System

Hieu Hanh LE, Satoshi HIKIDA, Haruo YOKOTA

Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E97-D, No.2, pp.213-222
Publication Date: 2014/02/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E97.D.213
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Data Engineering, Web Information Systems
Keyword: power-proportionality, HDFS, metadata management

Summary: 
Energy-aware distributed file systems are increasingly moving toward power-proportional designs. However, existing work has not considered the cost of updating data sets that were modified in low-power mode, in which a subset of the nodes is powered off. Specifically, when the system moves to high-power mode, it must internally replicate the updated data to the reactivated nodes. Reflecting the updated data efficiently is vital to making a distributed file system such as the Hadoop Distributed File System (HDFS) power-proportional. In the current HDFS design, when the system changes power mode, the block replication process is bottlenecked by the single NameNode because accesses to the block metadata become congested there. This paper presents a novel architecture, the NameNode and DataNode Coupling Hadoop Distributed File System (NDCouplingHDFS), which efficiently reflects the updated blocks when the system goes into high-power mode. This is achieved by coupling metadata management and data management at each node so that the range of blocks maintained by each node's metadata is localized. Experiments on actual machines show that NDCouplingHDFS reduces the execution time required to move updated blocks by 46% relative to normal HDFS. Moreover, NDCouplingHDFS can increase the throughput of a system supporting MapReduce by applying an index in metadata management.
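
The contrast between a single NameNode and per-node coupled metadata can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes, purely for illustration, that coupled metadata is range-partitioned by block ID, and the names `range_owner` and `busiest_metadata_load` are hypothetical. It counts how many metadata lookups the busiest metadata holder must serve while updated blocks are re-replicated.

```python
from collections import Counter


def range_owner(block_id, num_blocks, num_nodes):
    """Return the node holding metadata for block_id.

    Assumption (not from the paper): each node owns one contiguous
    range of block IDs of equal size.
    """
    range_size = -(-num_blocks // num_nodes)  # ceiling division
    return block_id // range_size


def busiest_metadata_load(updated_blocks, num_blocks, num_nodes, coupled):
    """Count metadata lookups handled by the most heavily loaded server.

    coupled=False models a single NameNode (server 0 answers everything);
    coupled=True models metadata kept alongside the data on each node.
    """
    load = Counter()
    for block_id in updated_blocks:
        server = range_owner(block_id, num_blocks, num_nodes) if coupled else 0
        load[server] += 1
    return max(load.values()) if load else 0


if __name__ == "__main__":
    num_blocks, num_nodes = 1000, 4
    updated = list(range(0, num_blocks, 5))  # blocks modified in low-power mode
    print("single NameNode:", busiest_metadata_load(updated, num_blocks, num_nodes, coupled=False))
    print("coupled metadata:", busiest_metadata_load(updated, num_blocks, num_nodes, coupled=True))
```

In this toy setting the single NameNode answers every lookup, while the coupled layout spreads the same lookups across the nodes. The model captures only the load-distribution argument and says nothing about the measured 46% reduction, which comes from the authors' experiments.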