Lightweight Consistent Recovery Algorithm for Sender-Based Message Logging in Distributed Systems

Jinho AHN  

IEICE TRANSACTIONS on Information and Systems   Vol.E94-D   No.8   pp.1712-1715
Publication Date: 2011/08/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E94.D.1712
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Dependable Computing
Keywords: distributed systems, fault-tolerance, message logging, checkpointing, scalability, consistent recovery


Sender-based message logging (SBML) with checkpointing has a well-known benefit: it greatly reduces the failure-free overhead of synchronous logging by logging messages in the sender's volatile memory. This benefit has encouraged its use in many distributed systems as a low-cost, transparent rollback recovery technique. However, the original SBML recovery algorithm may fail to make progress under some transient communication errors. This paper proposes a consistent recovery algorithm that solves this problem by piggybacking a small amount of log information about received unstable messages on each acknowledgement message that returns the receive sequence number assigned to a message by its receiver. Our algorithm also allows all messages that are scheduled to be sent, but delayed because of preceding unstable messages, to be transmitted much earlier than under the existing algorithms.
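The acknowledgement exchange described in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the class names, the fields (`ssn`, `rsn`, `unstable`, `foreign_log`), and the choice to piggyback `(sender, ssn, rsn)` triples are all assumptions made for illustration, and network delivery is simulated with direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Ack:
    ssn: int   # send sequence number being acknowledged
    rsn: int   # receive sequence number assigned by the receiver
    # Hypothetical piggyback: (sender, ssn, rsn) for earlier unstable messages.
    unstable: list = field(default_factory=list)


class Process:
    def __init__(self, name):
        self.name = name
        self.next_ssn = 0
        self.next_rsn = 0
        self.send_log = {}     # (dest, ssn) -> [payload, rsn or None], volatile
        self.unstable = []     # received messages not yet confirmed as logged
        self.foreign_log = []  # piggybacked triples learned from acks

    def send(self, dest, payload):
        # Log the message in the sender's volatile memory before delivery.
        ssn = self.next_ssn
        self.next_ssn += 1
        self.send_log[(dest.name, ssn)] = [payload, None]
        return dest.receive(self, ssn, payload)

    def receive(self, sender, ssn, payload):
        # Assign an RSN and return it, piggybacking earlier unstable entries.
        rsn = self.next_rsn
        self.next_rsn += 1
        ack = Ack(ssn, rsn, list(self.unstable))
        self.unstable.append((sender.name, ssn, rsn))
        return ack

    def on_ack(self, dest, ack):
        # Complete the log entry with the RSN and keep the piggybacked info.
        self.send_log[(dest.name, ack.ssn)][1] = ack.rsn
        self.foreign_log.extend(ack.unstable)

    def confirm(self, sender_name, ssn):
        # The sender has logged the RSN, so the message is no longer unstable.
        self.unstable = [u for u in self.unstable
                         if (u[0], u[1]) != (sender_name, ssn)]


def demo():
    p, q, r = Process("p"), Process("q"), Process("r")
    p.on_ack(q, p.send(q, "m1"))   # m1 is still unstable at q (no confirm yet)
    r.on_ack(q, r.send(q, "m2"))   # the ack for m2 carries m1's log information
    q.confirm("p", 0)              # p's confirmation finally reaches q
    return p, q, r
```

In this sketch, after `demo()` runs, `r` holds a copy of the log information for `m1` even though `r` never sent it, which is the kind of redundancy the piggybacking in the abstract appears intended to provide.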