Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems


IEICE TRANSACTIONS on Information and Systems   Vol.E85-D   No.3   pp.525-534
Publication Date: 2002/03/01
Online ISSN: 
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Fault Tolerance
multiprocessor,  fault-tolerant scheduling,  task graph,  heights,  task groups,  

Full Text: PDF(733.8KB)>>
Buy this Article

In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|4) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.