A Malicious Web Site Identification Technique Using Web Structure Clustering

Tatsuya NAGAI  Masaki KAMIZONO  Yoshiaki SHIRAISHI  Kelin XIA  Masami MOHRI  Yasuhiro TAKANO  Masakatu MORII  

IEICE TRANSACTIONS on Information and Systems   Vol.E102-D   No.9   pp.1665-1672
Publication Date: 2019/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018OFP0010
Type of Manuscript: Special Section PAPER (Special Section on Log Data Usage Technology and Office Information Systems)
Category: Cybersecurity
Keyword: website structure, malicious website, exploit kit, clustering

Epidemic cyber incidents are caused by malicious websites that use exploit kits. An exploit kit makes it easier for attackers to carry out drive-by download (DBD) attacks. However, malicious websites that use an exploit kit are reported to have similar website structure (WS)-trees. Hence, malicious website identification techniques that leverage WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive components of an exploit kit prevent the WS-tree from being captured completely. This paper therefore presents a new WS-tree construction procedure that exploits the fact that a DBD attack takes place within a certain duration. Moreover, this paper proposes a new malicious website identification technique that clusters the WS-trees of exploit kits. Experimental results using the D3M dataset verify that the proposed technique identifies exploit kits with reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
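
The abstract does not give the construction or clustering procedure in detail, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a WS-tree can be approximated by linking each HTTP request to its Referer, that a request whose Referer was not captured can be attached to the most recent node if it arrives within a fixed time window, and that trees can be compared by the number of nodes at each depth. The names `Request`, `build_ws_tree`, `depth_profile`, `cluster`, the 10-second window, and the L1 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    t: float                 # capture time in seconds
    url: str
    referer: Optional[str]   # None if the header is absent


def build_ws_tree(requests: List[Request], window: float = 10.0) -> Dict[str, Optional[str]]:
    """Return a {url: parent_url} map; the earliest request is the root.

    Assumption: when the Referer is unknown (e.g. the redirecting response
    was lost), the request is attached to the most recently added node,
    provided it arrives within `window` seconds of that node.
    """
    parent: Dict[str, Optional[str]] = {}
    last = None  # (time, url) of the most recently added node
    for r in sorted(requests, key=lambda x: x.t):
        if r.url in parent:
            continue                        # keep the first occurrence only
        if not parent:
            parent[r.url] = None            # root (landing page)
        elif r.referer in parent:
            parent[r.url] = r.referer       # explicit link from the traffic
        elif last is not None and r.t - last[0] <= window:
            parent[r.url] = last[1]         # time-based fallback (assumption)
        else:
            continue                        # too late: treated as unrelated
        last = (r.t, r.url)
    return parent


def depth_profile(parent: Dict[str, Optional[str]], max_depth: int = 5) -> List[int]:
    """Count nodes at each depth; deeper nodes are folded into the last bin."""
    profile = [0] * (max_depth + 1)
    for url in parent:
        depth, node = 0, url
        while parent[node] is not None:
            node = parent[node]
            depth += 1
        profile[min(depth, max_depth)] += 1
    return profile


def cluster(profiles: List[List[int]], threshold: int = 1) -> List[List[int]]:
    """Greedy clustering: join the first cluster whose representative
    (its first member) is within `threshold` in L1 distance."""
    clusters: List[List[int]] = []
    for i, p in enumerate(profiles):
        for c in clusters:
            rep = profiles[c[0]]
            if sum(abs(a - b) for a, b in zip(p, rep)) <= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    traffic = [
        Request(0.0, "http://a.example/", None),
        Request(0.5, "http://b.example/redir", "http://a.example/"),
        # Referer points at a page that was never captured.
        Request(1.2, "http://c.example/exploit", "http://x.example/lost"),
        # Arrives long after the attack window; left out of the tree.
        Request(100.0, "http://z.example/other", None),
    ]
    tree = build_ws_tree(traffic)
    print(tree)
    print(depth_profile(tree))
```

In this sketch the time window is what lets a partially captured redirect chain stay connected; a real system would need to choose the window, the tree features, and the clustering method based on measured traffic.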