To Get Lost is to Learn the Way: An Analysis of Multi-Step Social Engineering Attacks on the Web

Takashi KOIDE  Daiki CHIBA  Mitsuaki AKIYAMA  Katsunari YOSHIOKA  Tsutomu MATSUMOTO  

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E104-A   No.1   pp.162-181
Publication Date: 2021/01/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.2020CIP0005
Type of Manuscript: Special Section PAPER (Special Section on Cryptography and Information Security)
Category: 
Keyword: 
social engineering attacks,  browser automation,  web crawler,  

Full Text: FreePDF(2.8MB)


Summary: 
Web-based social engineering (SE) attacks manipulate users to perform specific actions, such as downloading malware and exposing personal information. Aiming to effectively lure users, some SE attacks, which we call multi-step SE attacks, constitute a sequence of web pages starting from a landing page and require browser interactions at each web page. Also, different browser interactions executed on a web page often branch to multiple sequences to redirect users to different SE attacks. Although common systems analyze only landing pages or conduct browser interactions limited to a specific attack, little effort has been made to follow such sequences of web pages to collect multi-step SE attacks. We propose STRAYSHEEP, a system to automatically crawl a sequence of web pages and detect diverse multi-step SE attacks. We evaluate the effectiveness of STRAYSHEEP's three modules (landing-page-collection, web-crawling, and SE-detection) in terms of the rate of collected landing pages leading to SE attacks, efficiency of web crawling to reach more SE attacks, and accuracy in detecting the attacks. Our experimental results indicate that STRAYSHEEP can lead to 20% more SE attacks than Alexa top sites and search results of trend words, crawl five times more efficiently than a simple crawling module, and detect SE attacks with 95.5% accuracy. We demonstrate that STRAYSHEEP can collect various SE attacks, not limited to a specific attack. We also clarify attackers' techniques for tricking users and browser interactions, redirecting users to attacks.