Tuning GridFTP Pipelining, Concurrency and Parallelism Based on Historical Data

Jangyoung KIM  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E97-D   No.11   pp.2963-2966
Publication Date: 2014/11/01
Publicized: 2014/07/28
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2014EDL8104
Type of Manuscript: LETTER
Category: Information Network
Keyword: big data, throughput optimization, throughput estimation, pipelining, concurrency, parallelism

Summary: 
This paper presents a prediction model, based on historical data, for determining optimal values of pipelining, concurrency and parallelism (PCP) in GridFTP data transfers in Cloud systems. Setting these three parameters correctly is crucial to achieving high throughput in end-to-end data movement. However, predicting and setting their optimal values is challenging, especially under shared and unpredictable network conditions. Several factors affect the optimal values, including background network traffic, available bandwidth, Round-Trip Time (RTT), TCP buffer size, and file size. Existing models either fail to provide accurate predictions or incur very high prediction overheads. The author shows that the new model based on historical data achieves high accuracy with low overhead.
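
As a rough illustration of what choosing PCP values from historical data can look like, the following Python sketch picks the settings of the best-performing past transfer among those made under similar conditions. It is not the model described in the letter: the `Transfer` record, its fields, the log-scale distance, the `suggest_pcp` function and the sample `HISTORY` values are all assumptions made up for this example.

```python
from dataclasses import dataclass
from math import log10, sqrt


@dataclass(frozen=True)
class Transfer:
    """One past transfer. All fields are hypothetical, for illustration only."""
    rtt_ms: float
    bandwidth_mbps: float
    file_size_mb: float
    pipelining: int
    concurrency: int
    parallelism: int
    throughput_mbps: float


def _distance(a, b):
    # Compare conditions on a log scale so that values spanning several
    # orders of magnitude contribute comparably. Inputs must be positive.
    return sqrt(sum((log10(x) - log10(y)) ** 2 for x, y in zip(a, b)))


def suggest_pcp(history, rtt_ms, bandwidth_mbps, file_size_mb, k=3):
    """Return (pipelining, concurrency, parallelism) taken from the
    highest-throughput record among the k most similar past transfers."""
    if not history:
        raise ValueError("history must contain at least one transfer")
    target = (rtt_ms, bandwidth_mbps, file_size_mb)
    nearest = sorted(
        history,
        key=lambda t: _distance(
            (t.rtt_ms, t.bandwidth_mbps, t.file_size_mb), target
        ),
    )[:k]
    best = max(nearest, key=lambda t: t.throughput_mbps)
    return best.pipelining, best.concurrency, best.parallelism


# Made-up sample history: two small-file transfers on a short path and
# two large-file transfers on a long, fast path.
HISTORY = [
    Transfer(10, 1000, 1, 8, 4, 1, 300),
    Transfer(10, 1000, 1, 1, 1, 1, 60),
    Transfer(100, 10000, 1000, 1, 2, 8, 4000),
    Transfer(100, 10000, 1000, 1, 1, 1, 900),
]

if __name__ == "__main__":
    print(suggest_pcp(HISTORY, rtt_ms=12, bandwidth_mbps=1000, file_size_mb=2, k=2))
```

A lookup of this kind costs only a scan of the stored records, which hints at why history-based approaches can keep prediction overhead low; the accuracy of such a naive scheme would depend entirely on how well the history covers current conditions.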