An Accurate Packer Identification Method Using Support Vector Machine

Ryoichi ISAWA  Tao BAN  Shanqing GUO  Daisuke INOUE  Koji NAKAO  

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E97-A   No.1   pp.253-263
Publication Date: 2014/01/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E97.A.253
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Cryptography and Information Security)
Category: Foundations
Keyword: 
malware analysis,  pack,  unpack,  machine learning,  SVM,  

Full Text: PDF(1.4MB)>>
Buy this Article




Summary: 
PEiD is a packer identification tool widely used for malware analysis but its accuracy is becoming lower and lower recently. There exist two major reasons for that. The first is that PEiD does not provide a way to create signatures, though it adopts a signature-based approach. We need to create signatures manually, and it is difficult to catch up with packers created or upgraded rapidly. The second is that PEiD utilizes exact matching. If a signature contains any error, PEiD cannot identify the packer that corresponds to the signature. In this paper, we propose a new automated packer identification method to overcome the limitations of PEiD and report the results of our numerical study. Our method applies string-kernel-based support vector machine (SVM): it can measure the similarity between packed programs without our operations such as manually creating signature and it provides some error tolerant mechanism that can significantly reduce detection failure caused by minor signature violations. In addition, we use the byte sequence starting from the entry point of a packed program as a packer's feature given to SVM. That is, our method combines the advantages from signature-based approach and machine learning (ML) based approach. The numerical results on 3902 samples with 26 packer classes and 3 unpacked (not-packed) classes shows that our method achieves a high accuracy of 99.46% outperforming PEiD and an existing ML-based method that Sun et al. have proposed.