BotProfiler: Detecting Malware-Infected Hosts by Profiling Variability of Malicious Infrastructure

Daiki CHIBA, Takeshi YAGI, Mitsuaki AKIYAMA, Kazufumi AOKI, Takeo HARIU, Shigeki GOTO

Publication
IEICE TRANSACTIONS on Communications, Vol.E99-B, No.5, pp.1012-1023
Publication Date: 2016/05/01
Online ISSN: 1745-1345
DOI: 10.1587/transcom.2015AMP0001
Type of Manuscript: Special Section PAPER (Special Section on Internet Architectures and Management Methods that Enable Flexible and Secure Deployment of Network Services)
Keywords: malware, botnet, dynamic analysis, template

Summary: 
Ever-evolving malware makes it difficult to prevent hosts from becoming infected. Botnets in particular are one of the most serious threats to cyber security, since they consist of many malware-infected hosts. Many countermeasures against malware infection, such as generating network-based signatures or templates, have been investigated. Such templates use regular expressions to detect the polymorphic attacks conducted by attackers. A potential problem with such templates, however, is that, because regular expressions inherently generalize, they sometimes match benign communications and thus cause false positives. Since the cost of responding to a malware infection is quite high, the number of false positives should be kept to a minimum. We therefore propose a system that generates templates causing fewer false positives than a conventional system, in order to detect malware-infected hosts more accurately. Our key idea is that malicious infrastructure, such as malware samples or command and control (C&C) servers, tends to be reused rather than created from scratch. We verify this idea and propose a system that profiles the variability of substrings in HTTP requests, which makes it possible to identify invariable keywords arising from the same malicious infrastructure and thus to generate more accurate templates. Results from implementing our system and validating it on real traffic data indicate that it reduced false positives by up to two-thirds compared with the conventional system and even increased the detection rate of infected hosts.
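The core intuition of the summary (keep substrings that stay constant across requests tied to the same infrastructure, and generalize only those that vary) can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not BotProfiler's actual algorithm: the helper names (`tokenize`, `layout`, `build_template`), the delimiter set `/?&=`, the wildcard pattern, and the example requests are all hypothetical, and a substring is treated as invariable simply if it is identical in every request of an already-grouped set.

```python
import re
from typing import List, Optional

# Hypothetical delimiter set: request paths/queries are split on these characters.
DELIMS = frozenset("/?&=")
SPLIT_RE = re.compile(r"([/?&=])")
# Hypothetical wildcard for a variable token: any run of non-delimiter characters.
WILDCARD = r"[^/?&=]+"


def tokenize(request: str) -> List[str]:
    """Split a request into tokens, keeping delimiters as separate tokens."""
    # A capturing group makes re.split return the delimiters too;
    # empty strings (e.g. before a leading "/") are dropped.
    return [t for t in SPLIT_RE.split(request) if t]


def layout(tokens: List[str]) -> List[Optional[str]]:
    """Delimiter skeleton of a token list: delimiters kept, other tokens None."""
    return [t if t in DELIMS else None for t in tokens]


def build_template(requests: List[str]) -> str:
    """Build an anchored regex from requests sharing one delimiter layout.

    Token positions that never vary across the requests are kept as literal
    keywords; positions that vary are generalized to WILDCARD.
    """
    if not requests:
        raise ValueError("need at least one request")
    token_lists = [tokenize(r) for r in requests]
    skeleton = layout(token_lists[0])
    if any(layout(t) != skeleton for t in token_lists[1:]):
        raise ValueError("requests do not share the same delimiter layout")
    parts = []
    for i in range(len(skeleton)):
        values = {tokens[i] for tokens in token_lists}
        if len(values) == 1:
            # Invariable substring: keep it as an escaped literal keyword.
            parts.append(re.escape(next(iter(values))))
        else:
            # Variable substring: replace it with a generic wildcard.
            parts.append(WILDCARD)
    return "^" + "".join(parts) + "$"


# Usage with hypothetical requests observed from one piece of infrastructure.
template = build_template([
    "/gate.php?id=1a2b&ver=3",
    "/gate.php?id=9f8e&ver=3",
])
print(template)
print(bool(re.match(template, "/gate.php?id=c0ffee&ver=3")))   # True
print(bool(re.match(template, "/index.php?id=c0ffee&ver=3")))  # False
```

Because the invariable tokens `gate.php`, `id`, `ver` and `3` stay literal, the resulting template is narrower than one that generalizes every token, which is the intuition behind fewer false positives. How requests are grouped by shared infrastructure in the first place is not modeled here.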