Controller/Precompiler for Portable Checkpointing

Gabriel RODRIGUEZ  María J. MARTIN  Patricia GONZALEZ  Juan TOURIÑO  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E89-D   No.2   pp.408-417
Publication Date: 2006/02/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.2.408
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Parallel/Distributed Computing and Networking)
Category: Parallel/Distributed Programming Models, Paradigms and Tools
Keyword: parallel programming, fault tolerance, checkpointing, MPI

Summary: 
This paper presents CPPC (Controller/Precompiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and Grid infrastructures. It achieves portability through portable protocols, portable checkpoint files and portable code. CPPC is user-directed and works at the variable level, which keeps checkpoint files small. Parallel processes checkpoint independently, with no runtime coordination or message logging; consistency is achieved at restart time by negotiating the restart point. A directive-based checkpointing precompiler has also been implemented to reduce the user's effort. CPPC was designed for parallel MPI programs, but it can also be used with sequential ones, and its highly modular design makes it easy to extend to parallel programs written with other message-passing libraries. Experimental results obtained using CPPC with several test applications are presented.
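
The restart-time negotiation mentioned in the summary can be illustrated with a minimal sketch. The summary does not describe the protocol itself, so the following rests on assumptions: checkpoints are numbered consistently across processes, and the agreed restart point is the most recent checkpoint that every process still has available. The function name and data layout are hypothetical and are not part of CPPC's API; a real MPI program would exchange this information through message passing rather than a shared dictionary.

```python
from typing import Dict, Optional, Set


def negotiate_restart_point(available: Dict[int, Set[int]]) -> Optional[int]:
    """Pick a restart point from independently written checkpoints.

    `available` maps a process rank to the set of checkpoint ids that
    the process finds on restart (hypothetical layout, for illustration).
    Returns the most recent id present on every process, or None if the
    processes share no checkpoint and must restart from the beginning.
    """
    if not available:
        return None
    # Only checkpoints held by all processes are candidate restart points.
    common = set.intersection(*available.values())
    return max(common) if common else None


# Three processes that checkpointed independently and failed at
# different moments, so each holds a different set of checkpoints.
checkpoints = {
    0: {1, 2, 3},
    1: {1, 2},
    2: {1, 2, 3, 4},
}
restart_point = negotiate_restart_point(checkpoints)
```

Under these assumptions no coordination is needed while the application runs: each process writes checkpoints on its own, and agreement is reached only once, at restart.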