A Novel Replication Technique for Detecting and Masking Failures for Parallel Software: Active Parallel Replication
Adel CHERIF Masato SUZUKI Takuya KATAYAMA
IEICE TRANSACTIONS on Information and Systems Vol.E80-D No.9 pp.886-892
Publication Date: 1997/09/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Architectures, Algorithms and Networks for Massively Parallel Computing)
Category: Fault Tolerance
fault tolerant systems, distributed systems, parallel computing, functional paradigm, checkpointing and recovery
We present a novel replication technique for parallel applications in which instances of the replicated application are active on different groups of processors, called replicas. The technique is based on the FTAG (Fault Tolerant Attribute Grammar) computation model, a functional, attribute-based model. It implements "active parallel replication": all replicas are active and concurrently compute different pieces of the application's parallel code. In our model, replicas cooperate not only to detect and mask failures but also to perform the parallel computation. The replication mechanisms are supported by the FTAG run-time system and are fully application-transparent. Novel mechanisms for checkpointing and recovery are also developed. During rollback recovery, only the part of the computation that was detected as faulty is discarded. The replication technique takes full advantage of parallel computing to reduce overall computation time.
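
The idea of replicas that each compute a different piece and redo only the faulty part can be illustrated with a toy sketch. This is not the FTAG run-time system or the authors' mechanism. Every name here (`make_replica`, `active_parallel_run`, `ReplicaFailure`) is hypothetical, replicas are simulated by plain functions, a failure is assumed to surface as an exception, and recovery is assumed to mean recomputing the failed piece on the next replica.

```python
from concurrent.futures import ThreadPoolExecutor


class ReplicaFailure(Exception):
    """Raised by a simulated replica that fails while computing a piece."""


def make_replica(name, failing_pieces=()):
    # Hypothetical replica: squares its input, and fails on the piece ids
    # listed in failing_pieces to simulate a fault.
    def run(piece_id, value):
        if piece_id in failing_pieces:
            raise ReplicaFailure(f"{name} failed on piece {piece_id}")
        return value * value
    return run


def active_parallel_run(pieces, replicas):
    """Spread pieces over all replicas, then redo only the failed pieces."""
    results = {}
    failed = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = {}
        for piece_id, value in enumerate(pieces):
            # Every replica is active: each one gets a different piece.
            owner = piece_id % len(replicas)
            futures[piece_id] = (owner, pool.submit(replicas[owner], piece_id, value))
        for piece_id, (owner, future) in futures.items():
            try:
                results[piece_id] = future.result()
            except ReplicaFailure:
                failed.append((piece_id, owner))
    # Recovery: discard and recompute only the pieces detected as faulty,
    # on a different replica (assumes at least two replicas).
    for piece_id, owner in failed:
        backup = (owner + 1) % len(replicas)
        results[piece_id] = replicas[backup](piece_id, pieces[piece_id])
    ordered = [results[i] for i in range(len(pieces))]
    return ordered, [piece_id for piece_id, _ in failed]


if __name__ == "__main__":
    replicas = [make_replica("A", failing_pieces={2}), make_replica("B")]
    values, recovered = active_parallel_run([1, 2, 3, 4], replicas)
    print(values, recovered)
```

In this sketch the results of the pieces that succeeded are kept, and only piece 2 is recomputed, which mirrors the abstract's claim that rollback discards only the faulty part of the computation.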