A Comparison of Correlated Failures for Software Using Community Error Recovery and Software Breeding

Kazuyuki SHIMA, Ken-ichi MATSUMOTO, Koji TORII

IEICE TRANSACTIONS on Information and Systems   Vol.E80-D   No.7   pp.717-725
Publication Date: 1997/07/25
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Fault Tolerant Computing
Keywords: correlated failure, community error recovery, fault tolerance, N-version programming, software reliability


We present a comparison of correlated failures for multiversion software using community error recovery (CER) and software breeding (SB). In CER, errors are detected and recovered from at checkpoints inserted in every version of the software. SB is analogous to the breeding of plants and animals: versions consist of loadable modules, and a driver exchanges modules between versions to detect and eliminate faulty ones. We formulate reliability models to estimate the probability of failure for software using either CER or SB. Our models account for failures of the checkpoints in CER and of the driver in SB. We use a beta-binomial distribution to model correlated failures of versions, because much of the evidence suggests that the assumption that versions fail independently does not always hold. Our comparison indicates that multiversion software using SB is more reliable than that using CER when the probability of failure of the checkpoints in CER or of the driver in SB is 10^-7.
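To illustrate why the choice of a beta-binomial distribution matters, the sketch below compares the probability that a majority of N versions fail under independent (binomial) failures and under correlated (beta-binomial) failures. It is not the authors' CER or SB model, which this abstract does not give. The majority-voting rule, the parameterization by mean failure probability `p` and pairwise correlation `rho`, and the `system_failure` helper that combines a supervisor (checkpoint or driver) failure probability `q` with the versions' failure probability are all hypothetical assumptions made for illustration.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, p: float, rho: float) -> float:
    """P(exactly k of n versions fail) under a beta-binomial model.

    Hypothetical parameterization: p is the mean per-version failure
    probability and rho (0 < rho < 1) the pairwise failure correlation,
    so alpha = p(1 - rho)/rho and beta = (1 - p)(1 - rho)/rho.
    """
    alpha = p * (1 - rho) / rho
    beta = (1 - p) * (1 - rho) / rho
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k of n versions fail) if versions fail independently."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def majority_failure(n: int, pmf) -> float:
    """Probability that a majority of the n versions fail (assumed voting rule)."""
    return sum(pmf(k) for k in range(n // 2 + 1, n + 1))


def system_failure(q: float, p_versions: float) -> float:
    """Hypothetical: the system fails if the supervisor (checkpoint or driver)
    fails with probability q, or otherwise if the versions fail."""
    return q + (1 - q) * p_versions


if __name__ == "__main__":
    n, p, rho, q = 3, 0.01, 0.1, 1e-7
    indep = majority_failure(n, lambda k: binomial_pmf(k, n, p))
    corr = majority_failure(n, lambda k: beta_binomial_pmf(k, n, p, rho))
    print(f"independent: {system_failure(q, indep):.3e}")
    print(f"correlated:  {system_failure(q, corr):.3e}")
```

With three versions, a per-version failure probability of 0.01 and a modest correlation of 0.1, the majority-failure probability rises from about 3.0e-4 under independence to about 2.9e-3, roughly a tenfold increase. This illustrates why assuming independent version failures can overstate the reliability of multiversion software. The log-Beta form is used only to keep the probability mass function numerically stable for large parameters.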