Dependability Improvement for PPM Compressed Data by Using Compression Pattern Matching

Masato KITAKAMI  Toshihiro OKURA  

IEICE TRANSACTIONS on Information and Systems   Vol.E91-D   No.10   pp.2435-2439
Publication Date: 2008/10/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e91-d.10.2435
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Dependable Computing
dependability,  string match,  CPM (compression pattern matching),  computer virus scan,  PPM,  context,  

Full Text: PDF>>
Buy this Article

Data compression is popularly applied to computer systems and communication systems in order to reduce storage size and communication time, respectively. Since large data are used frequently, string matching for such data takes a long time. If the data are compressed, the time gets much longer because decompression is necessary. Long string matching time makes computer virus scan time longer and gives serious influence to the security of data. From this, CPM (Compression Pattern Matching) methods for several compression methods have been proposed. This paper proposes CPM method for PPM which achieves fast virus scan and improves dependability of the compressed data, where PPM is based on a Markov model, uses a context information, and achieves a better compression ratio than BW transform and Ziv-Lempel coding. The proposed method encodes the context information, which is generated in the compression process, and appends the encoded data at the beginning of the compressed data as a header. The proposed method uses only the header information. Computer simulation says that augmentation of the compression ratio is less than 5 percent if the order of the PPM is less than 5 and the source file size is more than 1 M bytes, where order is the maximum length of the context used in PPM compression. String matching time is independent of the source file size and is very short, less than 0.3 micro seconds in the PC used for the simulation.