Group-to-Group Communications for Fault-Tolerance in Distributed Systems

Hiroaki HIGAKI, Terunao SONEOKA

IEICE TRANSACTIONS on Information and Systems   Vol.E76-D   No.11   pp.1348-1357
Publication Date: 1993/11/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Responsive Computer Systems)
Keywords: fault-tolerance, distributed systems, process group, group communications

This paper proposes a group-to-group communications algorithm that extends the range of distributed systems in which active replication fault-tolerance can be achieved to partner model distributed systems, in which all processes communicate with each other on an equal footing. The active replication approach, in which all replicated processes are active, achieves fault-tolerance with low overhead because checkpoint setting and rollback are not required for recovery from process failure. The algorithm guarantees that every replicated process in a process group has the same execution history and that communications between process groups remain consistent even in the presence of process failure and message loss. The number of control messages that must be transmitted between processes for one communication between process groups is only linear in the number of replicated processes in each process group. Furthermore, the algorithm reduces the overhead of reconfiguring a process group by keeping process failure and recovery information local to each process group.
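
The property that makes active replication recoverable without checkpoints or rollback can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: it only assumes that replicas are deterministic and that every live replica in a group receives the same messages in the same order. The names `Replica`, `deliver_to_group`, and `run_demo` are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One deterministic replica of a process (hypothetical model)."""
    name: str
    state: int = 0
    history: list = field(default_factory=list)
    alive: bool = True

    def deliver(self, msg: int) -> None:
        # A failed replica ignores messages; a live one records and applies them.
        if self.alive:
            self.history.append(msg)
            self.state += msg


def deliver_to_group(group, msg):
    """Deliver one message to every replica of the group, in list order."""
    for replica in group:
        replica.deliver(msg)


def run_demo():
    """Deliver three messages, failing one replica after the second."""
    group = [Replica("p1"), Replica("p2"), Replica("p3")]
    for msg in (5, 7):
        deliver_to_group(group, msg)
    group[1].alive = False  # p2 fails; the survivors do not roll back
    deliver_to_group(group, 11)
    return [r for r in group if r.alive]
```

Under these assumptions the surviving replicas hold identical execution histories after the failure, so the group can continue without restoring any earlier state. Ordering and reliable delivery are simply assumed here, whereas the paper's algorithm is what provides them between process groups.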