For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
Controller/Precompiler for Portable Checkpointing
Gabriel RODRIGUEZ María J. MARTIN Patricia GONZALEZ Juan TOURIÑO
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2006/02/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Parallel/Distributed Computing and Networking)
Category: Parallel/Distributed Programming Models, Paradigms and Tools
parallel programming, fault tolerance, checkpointing, MPI,
Full Text: PDF>>
This paper presents CPPC (Controller/Precompiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and Grid infrastructures through the use of portable protocols, portable checkpoint files and portable code. It works at variable level being user-directed, thus generating small checkpoint files. It allows parallel processes to checkpoint independently, without runtime coordination or message-logging. Consistency is achieved at restart time by negotiating the restart point. A directive-based checkpointing precompiler has also been implemented to ease up user's effort. CPPC was designed to work with parallel MPI programs, though it can be used with sequential ones, and easily extended to parallel programs written using different message-passing libraries, due to its highly modular design. Experimental results are shown using CPPC with different test applications.