A Malicious Web Site Identification Technique Using Web Structure Clustering

Tatsuya NAGAI  Masaki KAMIZONO  Yoshiaki SHIRAISHI  Kelin XIA  Masami MOHRI  Yasuhiro TAKANO  Masakatu MORII  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E102-D   No.9   pp.1665-1672
Publication Date: 2019/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018OFP0010
Type of Manuscript: Special Section PAPER (Special Section on Log Data Usage Technology and Office Information Systems)
Category: Cybersecurity
Keyword: website structure, malicious website, exploit kit, clustering

Summary: 
Epidemic cyber incidents are caused by malicious websites that use exploit kits. An exploit kit makes it easy for attackers to perform drive-by download (DBD) attacks. It has been reported, however, that malicious websites using an exploit kit show similarities in their website structure (WS)-trees. Malicious website identification techniques that leverage WS-trees have therefore been studied, where the WS-trees are estimated from HTTP traffic data. Nevertheless, the defensive components of an exploit kit prevent the WS-tree from being captured completely. This paper therefore presents a new WS-tree construction procedure that exploits the fact that a DBD attack takes place within a certain duration. It further proposes a new malicious website identification technique based on clustering the WS-trees of exploit kits. Experimental results on the D3M dataset verify that the proposed technique identifies exploit kits with reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
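
The idea of building a WS-tree from HTTP traffic and using timing to bridge lost links can be illustrated with a minimal sketch. This is not the paper's procedure: the `Request` record, the `build_ws_tree` function, the use of the Referer header as the link source, and the 10-second `window` are all assumptions made for illustration only. Requests whose referring page is known are attached to it; requests whose link is missing are attached to the current root if they arrive within the time window.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    """Hypothetical HTTP log record (not the D3M dataset format)."""
    ts: float                      # request time in seconds
    url: str                       # requested URL
    referer: Optional[str] = None  # referring URL, if it was captured


def build_ws_tree(requests: List[Request],
                  window: float = 10.0) -> Dict[str, Optional[str]]:
    """Return a child -> parent map; root URLs map to None.

    The window length is an assumed parameter, not a value from the paper.
    """
    parent: Dict[str, Optional[str]] = {}
    root: Optional[str] = None
    root_ts = 0.0
    for r in sorted(requests, key=lambda q: q.ts):
        if r.url in parent:
            continue  # keep the first placement of each URL
        if r.referer is not None and r.referer in parent:
            # Link observed in the traffic: attach to the referring node.
            parent[r.url] = r.referer
        elif root is not None and r.ts - root_ts <= window:
            # Link lost: fall back to timing and attach to the current root.
            parent[r.url] = root
        else:
            # Outside the window: start a new tree.
            parent[r.url] = None
            root, root_ts = r.url, r.ts
    return parent
```

With this sketch, a payload request that arrives shortly after a landing page but carries no Referer still ends up in the landing page's tree, while a request arriving long afterwards starts a new tree.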