Error Models and Fault-Secure Scheduling in Multiprocessor Systems
Koji HASHIMOTO Tatsuhiro TSUCHIYA Tohru KIKUNO
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2001/05/01
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Fault Tolerance
Keyword: multiprocessors, fault-secure scheduling, task graphs, error models, tests
A schedule for a parallel program is said to be 1-fault-secure if a system that uses the schedule either produces correct output for the program or detects the presence of any faults in a single processor. Although several fault-secure scheduling algorithms have been proposed, all of them apply only to a class of tree-structured task graphs with uniform computation costs. Moreover, they assume a stringent error model, called the redeemable error model, that takes extremely unlikely cases into account. In this paper, we first propose two new, more plausible error models that restrict the manner of error propagation. We then present three fault-secure scheduling algorithms, one for each of the three models (the redeemable error model and the two new ones). Unlike previous algorithms, the proposed algorithms can handle any task graph with arbitrary computation and communication costs. Through experiments, we evaluate these algorithms and study the impact of the error models on the lengths of fault-secure schedules.
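The 1-fault-secure property defined above can be made concrete with a small brute-force check. The sketch below is not one of the paper's algorithms or error models; it is a toy illustration under stated assumptions: the program is a simple chain of tasks, the schedule runs two copies of the chain and compares their final outputs, the comparison itself is fault-free, a task with an erroneous input always produces an erroneous output, and all erroneous values are indistinguishable (a pessimistic worst case). The names `run_chain` and `is_one_fault_secure` are hypothetical.

```python
from itertools import product

BAD = "bad"  # assumption: in the worst case every erroneous value looks the same


def run_chain(procs, faulty, misbehave):
    """Return the final value produced by one copy of a chain of tasks.

    procs[i] is the processor that runs task i of this copy.
    misbehave[i] says whether task i fails if it runs on the faulty processor.
    """
    value = 0
    for i, p in enumerate(procs):
        if value == BAD or (p == faulty and misbehave[i]):
            value = BAD  # assumption: an error, once present, always propagates
        else:
            value = value + 1  # stand-in for the task's correct computation
    return value


def is_one_fault_secure(copy_a, copy_b):
    """Check the 1-fault-secure definition by exhaustive fault injection.

    copy_a and copy_b give the processor assignment of the two copies.
    The comparison of the two outputs is assumed to be fault-free.
    """
    n = len(copy_a)
    correct = n  # a fault-free run of n tasks yields n
    for faulty in set(copy_a) | set(copy_b):
        # Try every combination of failing tasks on the faulty processor.
        for pattern in product([False, True], repeat=2 * n):
            out_a = run_chain(copy_a, faulty, pattern[:n])
            out_b = run_chain(copy_b, faulty, pattern[n:])
            detected = out_a != out_b
            if not detected and out_a != correct:
                return False  # wrong output delivered without detection
    return True
```

In this toy model, `is_one_fault_secure([0, 0], [1, 1])` holds because no processor is shared between the two copies, whereas `is_one_fault_secure([0, 1], [1, 2])` does not: a fault in processor 1 can corrupt both copies identically, so the comparison passes on a wrong result.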