Group-to-Group Communications for Fault-Tolerance in Distributed Systems
Hiroaki HIGAKI Terunao SONEOKA
IEICE TRANSACTIONS on Information and Systems
Publication Date: 1993/11/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Responsive Computer Systems)
Keywords: fault-tolerance, distributed systems, process group, group communications
This paper proposes a group-to-group communications algorithm that extends the range of distributed systems in which active replication fault-tolerance can be achieved to partner model distributed systems, in which all processes communicate with each other on an equal footing. The active replication approach, in which all replicated processes are active, can achieve fault-tolerance with low overhead because checkpoint setting and rollback are not required for recovery from process failure. The algorithm guarantees that each replicated process in a process group has the same execution history and that communications between process groups remain consistent even in the presence of process failure and message loss. The number of control messages that must be transmitted between processes for a communication between process groups is only linear in the number of replicated processes in each process group. Furthermore, the algorithm reduces the overhead of reconfiguring a process group by keeping process failure and recovery information local to each process group.
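
The active replication property described in the abstract can be illustrated with a small toy model. The sketch below is not the paper's algorithm, which the abstract does not specify; the names `Replica`, `ProcessGroup`, `receive`, and `demo` are hypothetical, and a shared sequence counter stands in for whatever ordering mechanism the real protocol uses. It only shows why identical delivery order gives every live replica the same execution history, so that a failure needs no checkpoint or rollback.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replicated process (hypothetical model, not from the paper)."""
    name: str
    alive: bool = True
    history: list = field(default_factory=list)

    def deliver(self, seq, payload):
        # Every replica records messages in the order it receives them.
        self.history.append((seq, payload))


@dataclass
class ProcessGroup:
    """A group of active replicas sharing one delivery order (assumption)."""
    replicas: list
    next_seq: int = 0

    def live(self):
        return [r for r in self.replicas if r.alive]

    def receive(self, payload):
        # Assign one sequence number per incoming message and deliver it
        # to every live replica, so all of them see the same order.
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.live():
            replica.deliver(seq, payload)


def demo():
    group = ProcessGroup([Replica("b1"), Replica("b2"), Replica("b3")])
    group.receive("request-1")
    group.replicas[2].alive = False  # b3 fails; no rollback is performed
    group.receive("request-2")
    return {r.name: r.history for r in group.replicas}
```

In this toy run the surviving replicas `b1` and `b2` end with identical histories and simply continue after `b3` fails. A real protocol must additionally handle message loss and agreement on group membership, which this sketch leaves out.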