A Proposal of Event Correlation for Distributed Network Fault Management and Its Evaluation

Nei KATO, Kohei OHTA, Tomohiro IKA, Glenn MANSFIELD, Yoshiaki NEMOTO

Publication
IEICE TRANSACTIONS on Communications, Vol.E82-B, No.6, pp.859-867
Publication Date: 1999/06/25
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Issue on Distributed Processing for Controlling Telecommunications Systems)
Keyword: event correlation, distributed network management, Network Management Clock (NMC)



Summary: 
In a distributed network management environment, an NMS (Network Management Station) interacts with several agents in different sub-networks. In the network fault management context, the NMS detects symptoms that indicate some abnormality, e.g., a surge in ICMP traffic, which may be caused by a network malfunction or misuse. The occurrence of a symptom is an event. An NMS may detect a large number of events, and their sheer number makes it difficult, if not impossible, for the NMS to diagnose them. Generally, a fault may have a cascading effect that, in turn, gives rise to a very large number of events. The sequence of events and their correlation play an important role in fault management and diagnosis. In the distributed environment of today's networks, the absence of any uniform reference time makes this a challenging task. In the present SNMP network management framework, a manager maintains a notion of the clock of each agent it interacts with. This mechanism, however, is inadequate for determining the sequence of events and their correlation, all the more so in a distributed environment that may involve several managers. In this paper we propose a mechanism for ordering and correlating events detected in a large-scale network that is managed in a distributed manner within the SNMP framework. Our algorithm uses the concept of a Network Management Clock (NMC), a virtual clock maintained by a manager based on sysUpTime readings from each SNMP agent. This paper discusses the algorithm, its implementation, and its evaluation.
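The summary describes the NMC only as a virtual clock that a manager builds from each agent's sysUpTime; the paper's actual algorithm is not given here. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes sysUpTime is reported in SNMP TimeTicks (hundredths of a second) and that the manager records one reference pair (its own local time, the agent's sysUpTime) per agent; it ignores polling delay, agent restarts, and 32-bit TimeTicks wraparound. The class and method names (`NetworkManagementClock`, `record_reading`, `to_nmc`, `correlate`) are hypothetical, and the fixed time-window grouping in `correlate` is a simple stand-in for event correlation, not the paper's technique.

```python
from dataclasses import dataclass

TICKS_PER_SECOND = 100  # SNMP TimeTicks are hundredths of a second


@dataclass
class Reference:
    manager_time: float  # manager's local clock (seconds) when sysUpTime was read
    agent_ticks: int     # sysUpTime value the agent returned at that moment


@dataclass
class Event:
    agent: str
    ticks: int  # agent's sysUpTime when the symptom was detected
    name: str


class NetworkManagementClock:
    """Hypothetical NMC sketch: maps each agent's sysUpTime onto one manager timeline."""

    def __init__(self):
        self.refs = {}  # agent name -> Reference

    def record_reading(self, agent, manager_time, agent_ticks):
        # Assumption: the reading is instantaneous (no polling delay is modeled).
        self.refs[agent] = Reference(manager_time, agent_ticks)

    def to_nmc(self, event):
        # Convert the agent-local tick count into seconds on the manager's timeline.
        ref = self.refs[event.agent]
        elapsed = (event.ticks - ref.agent_ticks) / TICKS_PER_SECOND
        return ref.manager_time + elapsed

    def order(self, events):
        # Events from different agents become comparable once mapped to NMC time.
        return sorted(events, key=self.to_nmc)

    def correlate(self, events, window):
        # Chain consecutive (NMC-ordered) events into one group while the gap
        # between neighbours is at most `window` seconds.
        groups = []
        last = None
        for ev in self.order(events):
            t = self.to_nmc(ev)
            if last is None or t - last > window:
                groups.append([])
            groups[-1].append(ev)
            last = t
        return groups


nmc = NetworkManagementClock()
nmc.record_reading("agentA", 1000.0, 50_000)  # agentA had been up 500 s
nmc.record_reading("agentB", 1000.0, 2_000)   # agentB had been up 20 s

events = [
    Event("agentA", 50_300, "linkDown"),
    Event("agentB", 2_100, "icmpSurge"),
    Event("agentB", 9_000, "cpuHigh"),
]

ordered = [e.name for e in nmc.order(events)]
groups = [[e.name for e in g] for g in nmc.correlate(events, window=5.0)]
print(ordered)
print(groups)
```

Here agentB's `icmpSurge` maps to NMC time 1001.0 s and agentA's `linkDown` to 1003.0 s, so they are ordered and grouped together even though their raw sysUpTime values (2100 and 50300 ticks) cannot be compared directly.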