Evidence-Based Context-Aware Log Data Management for Integrated Monitoring System

Tatsuya SATO, Yosuke HIMURA, Yoshiko YASUDA

IEICE TRANSACTIONS on Communications, Vol.E101-B, No.9, pp.1997-2006
Publication Date: 2018/09/01
Online ISSN: 1745-1345
DOI: 10.1587/transcom.2017EBP3396
Type of Manuscript: PAPER
Category: Network Management/Operation
Keywords: SaaS, monitoring, log data management

Managing SaaS systems requires administrators to monitor and analyze diverse types of log data collected from a variety of components, such as applications and IT resources. Integrated monitoring systems built on a datastore that can store and query semi-structured data (e.g., a NoSQL document database) are a promising solution, because they can store and query any type of log data through a single, unified set of management panes. However, as SaaS systems grow in scale and remain in service for long periods, integrated monitoring systems face two problems: long response times for log analysis and high storage consumption by logs. In this work, we address these problems by developing an efficient log management method for SaaS systems. Our empirical observation is that the problems stem primarily from the datastore processing all logs uniformly, even though log data are likely heterogeneous in ways that could be exploited for efficient log management. We first confirm this insight by quantitatively investigating the usage patterns of log data, using an actual dataset of log access histories collected over more than 1.5 years from a SaaS system serving tens of thousands of enterprise users. We show that logs are heterogeneous in their required retention period, the required response time of their analysis, and their data volume, and that these heterogeneities depend on the log data category and its analysis scenario. Building on this evidence of heterogeneity and on the usage patterns found in the investigation, we design a methodology of context-aware log data management whose key features are speculatively pre-caching the results of log analysis and proactively archiving log data, depending on the log data category and analysis scenario.
Evaluation with a prototype implementation shows that the proposed method reduces the response time by 47% compared to a conventional method and the storage consumption by approximately 40% compared to the original log data.
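The two mechanisms named in the abstract, proactive archiving and speculative pre-caching decided per log category, can be illustrated with a small sketch. This is a minimal illustration under our own assumptions, not the authors' implementation: the categories (`access`, `error`, `audit`), the day thresholds, and the names `LogPolicy`, `placement`, `PreCache` and `count_by_user` are all hypothetical and chosen only to show how a per-category policy could drive both decisions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple


# Hypothetical per-category policy. The categories and numbers are
# illustrative assumptions, not values reported in the paper.
@dataclass(frozen=True)
class LogPolicy:
    hot_days: int        # keep logs in the queryable datastore this long
    retention_days: int  # drop logs entirely after this long
    precache: bool       # pre-compute this category's analysis result?


POLICIES: Dict[str, LogPolicy] = {
    "access": LogPolicy(hot_days=7, retention_days=365, precache=True),
    "error": LogPolicy(hot_days=30, retention_days=90, precache=False),
    "audit": LogPolicy(hot_days=1, retention_days=730, precache=False),
}


def placement(category: str, log_day: date, today: date) -> str:
    """Decide where one day's logs of a given category should live."""
    policy = POLICIES[category]
    age = (today - log_day).days
    if age >= policy.retention_days:
        return "delete"
    if age >= policy.hot_days:
        return "archive"  # e.g. compressed files outside the datastore
    return "hot"


Analysis = Callable[[List[dict]], object]


class PreCache:
    """Holds analysis results computed before a user asks for them."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, date], object] = {}

    def refresh(self, category: str, day: date,
                logs: List[dict], analyze: Analysis) -> None:
        # Speculatively run the analysis only where the policy asks for it.
        if POLICIES[category].precache:
            self._results[(category, day)] = analyze(logs)

    def query(self, category: str, day: date,
              logs: List[dict], analyze: Analysis) -> object:
        # Serve the cached result if present; otherwise analyze on demand.
        key = (category, day)
        if key in self._results:
            return self._results[key]
        return analyze(logs)


def count_by_user(logs: List[dict]) -> Dict[str, int]:
    """Example analysis scenario (hypothetical): requests per user."""
    counts: Dict[str, int] = {}
    for rec in logs:
        counts[rec["user"]] = counts.get(rec["user"], 0) + 1
    return counts


if __name__ == "__main__":
    today = date(2018, 9, 1)
    print(placement("access", today - timedelta(days=10), today))  # archive
```

Keeping the policy as a plain table keyed by category mirrors the abstract's point that retention and response-time needs depend on the log category and its analysis scenario; in a real system the thresholds would come from measured usage patterns rather than the placeholder values above.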