Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage
Thanda SHWE, Masayoshi ARITSUGI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2018/12/01
Online ISSN: 1745-1361
Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)
Category: Cloud Computing
re-replication, fault tolerance, data reliability, HDFS
Data replication in cloud storage systems brings many benefits, such as fault tolerance, data availability, data locality, and load balancing, from both reliability and performance perspectives. However, each time a datanode fails, the data blocks stored on the failed datanode must be restored to maintain the replication level. This can be a heavy burden for a system whose resources are already highly utilized by users' application workloads. Although there have been many proposals for replication, re-replication has not yet been properly addressed. In this paper, we present a deferred re-replication algorithm that dynamically shifts the re-replication workload based on the current resource utilization of the system. Because workload patterns vary with the time of day, simulation results on a synthetic workload demonstrate a large opportunity to minimize the impact on users' application workloads with a simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce the performance impact on users' application workloads while ensuring the same reliability level that default HDFS provides.
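
The idea of deferring re-replication while the cluster is busy can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. The names `PendingBlock` and `select_blocks_to_replicate`, the utilization threshold, the maximum deferral time, and the batch size are all hypothetical choices made for illustration. The sketch assumes that blocks with a single surviving replica, or blocks that have waited too long, are never deferred, which is one plausible way to keep reliability from degrading.

```python
from dataclasses import dataclass


@dataclass
class PendingBlock:
    """An under-replicated block waiting to be re-replicated (hypothetical type)."""
    block_id: str
    live_replicas: int     # replicas still available after the datanode failure
    waited_s: float = 0.0  # seconds this block has already been deferred


def select_blocks_to_replicate(pending, utilization, *, busy_threshold=0.8,
                               max_defer_s=3600.0, batch_size=2):
    """Pick the blocks to re-replicate in this scheduling round.

    `utilization` is the current cluster resource utilization in [0, 1].
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Blocks that must not wait: only one replica left, or deferred too long.
    urgent = [b for b in pending
              if b.live_replicas <= 1 or b.waited_s >= max_defer_s]

    # Busy cluster: defer everything that is not urgent.
    if utilization >= busy_threshold:
        return urgent

    # Lightly loaded cluster: also take deferred blocks, fewest replicas first,
    # until the per-round batch size is reached.
    urgent_ids = {b.block_id for b in urgent}
    rest = sorted((b for b in pending if b.block_id not in urgent_ids),
                  key=lambda b: b.live_replicas)
    budget = max(batch_size - len(urgent), 0)
    return urgent + rest[:budget]
```

A caller would invoke this once per scheduling interval with a fresh utilization sample, so deferred blocks are picked up as soon as the load drops or their deferral limit expires.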