High-Performance End-to-End Integrity Verification on Big Data Transfer
Eun-Sung JUNG, Si LIU, Rajkumar KETTIMUTHU, Sungwook CHUNG
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E102-D
No.8
pp.1478-1488
Publication Date: 2019/08/01
Publicized: 2019/04/24
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018EDP7297
Type of Manuscript: PAPER
Category: Fundamentals of Information Systems
Keyword: high-performance data transfer, IoT-based big data, data integrity, pipelining
Summary:
The scale of scientific data generated by experimental facilities and by simulations at high-performance computing facilities has been growing rapidly with the emergence of IoT-based big data. In many cases, this data must be transmitted rapidly and reliably to remote facilities for storage, analysis, or sharing in Internet of Things (IoT) applications. To ensure its integrity, IoT data can be verified using a checksum after it has been written to disk at the destination. However, this end-to-end integrity verification inevitably adds overhead (extra disk I/O and more computation), which increases the overall data transfer time. In this article, we evaluate strategies to maximize the overlap between data transfer and checksum computation for astronomical observation data. Specifically, we examine file-level and block-level (with various block sizes) pipelining. We analyze these pipelining approaches in the context of GridFTP, a widely used protocol for scientific data transfers, and evaluate them through theoretical analysis and experiments. The results show that block-level pipelining is effective in maximizing this overlap, and can improve the overall data transfer time with end-to-end integrity verification by up to 70% compared to the sequential execution of transfer and checksum, and by up to 60% compared to file-level pipelining.
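The idea of block-level pipelining can be illustrated with a minimal Python sketch. It is not the GridFTP implementation evaluated in the paper: the function name `pipelined_transfer`, the bounded queue, the default block size, and the use of SHA-256 are all assumptions made for illustration. The sketch also hashes each block in memory, whereas the summary describes verification after the data has been written to disk at the destination.

```python
import hashlib
import io
import queue
import threading


def pipelined_transfer(src, dst, block_size=4 * 1024 * 1024, algo="sha256"):
    """Copy src to dst block by block while a second thread hashes each block.

    Hypothetical helper: src and dst are binary file-like objects; the write
    to dst stands in for the network send and the disk write at the destination.
    """
    blocks = queue.Queue(maxsize=8)  # bounded, so the copy cannot run far ahead
    digest = hashlib.new(algo)

    def checksum_worker():
        # Checksum stage: consume blocks in order until the sentinel arrives.
        while True:
            block = blocks.get()
            if block is None:
                break
            digest.update(block)

    worker = threading.Thread(target=checksum_worker)
    worker.start()
    try:
        # Transfer stage: while block i is hashed, block i + 1 is being copied.
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            blocks.put(block)
    finally:
        blocks.put(None)  # sentinel: no more blocks
        worker.join()
    return digest.hexdigest()


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of sample data
    source, destination = io.BytesIO(payload), io.BytesIO()
    checksum = pipelined_transfer(source, destination, block_size=64 * 1024)
    print(checksum == hashlib.sha256(payload).hexdigest())
```

With a sequential scheme, the checksum would start only after the whole file had been copied; here the two stages overlap, so the checksum of one block is computed while the next block is copied. The block size controls how fine-grained that overlap is.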