Identification of Smallest Unacceptable Combinations of Simultaneous Component Failures in Information Systems

Kumiko TADANO  Jianwen XIANG  Fumio MACHIDA  Yoshiharu MAENO  

IEICE TRANSACTIONS on Information and Systems   Vol.E96-D   No.9   pp.1941-1951
Publication Date: 2013/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E96.D.1941
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Dependable Computing)
Keywords: automatic model synthesis, recovery operation procedure, stochastic reward net (SRN), time to recover (TTR)

Large-scale disasters may cause simultaneous failures of many components in information systems. In designing for disaster recovery, operational procedures to recover from simultaneous component failures need to be determined so as to satisfy the time-to-recovery objective within a limited budget. For this purpose, it is beneficial to identify the smallest unacceptable combination of component failures (SUCCF), that is, the smallest combination whose recovery cost exceeds the acceptable cost for recovering the system. Knowing the SUCCF reveals the limits of the recovery capability of the designed recovery operation procedure. In this paper, we propose a technique to identify the SUCCF by predicting the cost required for recovery from each combination of component failures, with and without a two-person cross-check of the execution of recovery operations. We synthesize analytic models from the description of the recovery operation procedure in the form of a SysML Activity Diagram, and solve the models to predict the time-to-recovery and the cost. An example recovery operation procedure for a commercial database management system is used to demonstrate the proposed technique.
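
The search for the SUCCF can be illustrated with a minimal sketch. It is not the paper's method: the paper predicts cost by solving analytic models synthesized from a SysML Activity Diagram, whereas the sketch below substitutes a simple additive cost function. The component names, the per-component costs, the 1.5 cross-check multiplier, and the function names `find_succf` and `additive_cost` are all hypothetical, chosen only to show the enumeration by increasing combination size.

```python
from functools import partial
from itertools import combinations


def find_succf(components, predict_cost, acceptable_cost):
    """Return all failure combinations of the smallest size whose predicted
    recovery cost exceeds acceptable_cost, or [] if no combination does."""
    for size in range(1, len(components) + 1):
        unacceptable = [
            combo
            for combo in combinations(components, size)
            if predict_cost(combo) > acceptable_cost
        ]
        if unacceptable:
            # Stop at the first (smallest) size that has any unacceptable combination.
            return unacceptable
    return []


# Hypothetical per-component recovery costs (arbitrary units), for illustration only.
RECOVERY_COST = {"app": 3.0, "db": 5.0, "storage": 6.0, "web": 2.0}


def additive_cost(failed, cross_check=False):
    """Stand-in cost model: sum of per-component costs, scaled by an assumed
    factor of 1.5 when a two-person cross-check is applied."""
    base = sum(RECOVERY_COST[c] for c in failed)
    return base * (1.5 if cross_check else 1.0)


components = sorted(RECOVERY_COST)
succf_plain = find_succf(components, additive_cost, acceptable_cost=10.0)
succf_checked = find_succf(
    components, partial(additive_cost, cross_check=True), acceptable_cost=10.0
)
print(succf_plain)
print(succf_checked)
```

In this toy setting, adding the cross-check leaves the size of the smallest unacceptable combination unchanged but increases how many combinations of that size are unacceptable. Replacing `additive_cost` with a model-based predictor would leave the enumeration itself unchanged.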