Application of Feature Engineering for Phishing Detection

Wei ZHANG, Huan REN, Qingshan JIANG

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E99-D   No.4   pp.1062-1070
Publication Date: 2016/04/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2015CYP0005
Type of Manuscript: Special Section PAPER (Special Section on Cyberworlds)
Keywords: phishing detection, feature engineering, feature selection, feature extraction, two-stage projection pursuit


Errata: uploaded on May 1, 2016


Summary: 
Phishing attacks seek financial gain by luring Internet users into exposing their sensitive information. Phishing originated in e-mail fraud, and it is now also spread through social networks and short message service (SMS), which makes it increasingly widespread. Phishing attacks have drawn considerable attention because of their high volume and the heavy losses they cause, and many methods have been developed to combat them. However, most existing methods suffer from low detection accuracy or a high false positive (FP) rate, and Internet users continue to face phishing attacks. In this paper, we focus on feature engineering to improve classification performance in detecting phishing web pages. We propose a novel anti-phishing framework that employs feature engineering, including feature selection and feature extraction. First, we perform feature selection based on a genetic algorithm (GA) to divide the features into critical and non-critical features. Then, the non-critical features are projected onto a single new feature by feature extraction based on a two-stage projection pursuit (PP) algorithm. Finally, we use the critical features and the new feature as input data to construct the detection model. Unlike previous work in the literature, our framework does not simply eliminate the non-critical features but uses their projection in the classification process. Experimental results show that the proposed framework is effective in detecting phishing web pages.
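The three-step pipeline summarized above (GA-based split into critical and non-critical features, projection of the non-critical features onto one new feature, classification on the combined set) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration, not the authors' method: the toy data generator `make_data`, the nearest-centroid classifier used both as GA fitness and as the final detector, the GA operators and parameters (`pop_size`, `p_mut`, `penalty`), and the Fisher-style `separation_index` used as the projection index. In particular, the coarse-search-then-refine loop in `pursue_direction` only loosely echoes the idea of a two-stage search and is not the paper's two-stage PP algorithm.

```python
import math
import random

def make_data(n=200, seed=1):
    """Hypothetical toy data: 8 numeric features, label 1 = phishing."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        t = i % 2
        strong = [rng.gauss(2.0 * t, 1.0) for _ in range(2)]  # clearly informative
        weak = [rng.gauss(0.4 * t, 1.0) for _ in range(3)]    # weakly informative
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]      # uninformative
        X.append(strong + weak + noise)
        y.append(t)
    return X, y

def centroid_accuracy(X, y, cols):
    """Training accuracy of a nearest-centroid classifier restricted to `cols` (assumed model)."""
    if not cols:
        return 0.0
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(cents, key=lambda k: sum((x[c] - m) ** 2 for c, m in zip(cols, cents[k])))
        correct += pred == t
    return correct / len(y)

def ga_select(X, y, n_gen=30, pop_size=20, p_mut=0.1, penalty=0.01, seed=0):
    """GA over feature bitmasks; returns (critical, non_critical) index lists."""
    rng = random.Random(seed)
    d = len(X[0])
    def fitness(mask):
        cols = [i for i, b in enumerate(mask) if b]
        return centroid_accuracy(X, y, cols) - penalty * len(cols)  # hypothetical size penalty
    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(n_gen):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, d)                                     # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [i for i in range(d) if best[i]], [i for i in range(d) if not best[i]]

def separation_index(z, y):
    """Fisher-style index: between-class over within-class scatter of the projection z."""
    groups = {}
    for v, t in zip(z, y):
        groups.setdefault(t, []).append(v)
    mean = sum(z) / len(z)
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    return between / (within + 1e-12)

def project(X, cols, w):
    return [sum(x[c] * wi for c, wi in zip(cols, w)) for x in X]

def normalize(w):
    n = math.sqrt(sum(v * v for v in w)) or 1.0
    return [v / n for v in w]

def pursue_direction(X, y, cols, n_random=200, n_refine=300, step=0.1, seed=0):
    """Find a unit direction over `cols` that maximizes separation_index (illustrative PP)."""
    rng = random.Random(seed)
    score = lambda w: separation_index(project(X, cols, w), y)
    # Pass 1: coarse search over random unit directions.
    best = max((normalize([rng.gauss(0, 1) for _ in cols]) for _ in range(n_random)), key=score)
    best_s = score(best)
    # Pass 2: local refinement by random perturbation (hill climbing).
    for _ in range(n_refine):
        cand = normalize([v + rng.gauss(0, step) for v in best])
        s = score(cand)
        if s > best_s:
            best, best_s = cand, s
    return best

X, y = make_data()
critical, non_critical = ga_select(X, y)
w = pursue_direction(X, y, non_critical) if non_critical else []
# New input: critical features plus the single projected feature.
X_new = [[x[c] for c in critical] + [z] for x, z in zip(X, project(X, non_critical, w))]
acc = centroid_accuracy(X_new, y, list(range(len(critical) + 1)))
print("critical:", critical, "non-critical:", non_critical, "accuracy: %.3f" % acc)
```

The design keeps the non-critical features in play as one projected column rather than dropping them, which is the distinguishing idea stated in the summary; any real classifier or projection index could be swapped in for the assumed ones.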