Detecting Hijacked Sites by Web Spammer Using Link-Based Algorithms

Young-joo CHUNG  Masashi TOYODA  Masaru KITSUREGAWA  

Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E93-D, No.6, pp.1414-1421
Publication Date: 2010/06/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E93.D.1414
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Info-Plosion)
Category: Information Retrieval
Keyword: link analysis, web spam, information retrieval, link hijacking

Summary: 
In this paper, we propose a method for finding web sites whose links are hijacked by web spammers. A hijacked site is a trustworthy site that points to untrustworthy sites. To detect hijacked sites, we evaluate the trustworthiness of web sites and examine how trustworthy sites are hijacked by untrustworthy sites among their out-neighbors. Trustworthiness is evaluated based on the difference between the white and spam scores, which are calculated by two modified versions of PageRank. We define two hijacked scores that measure how likely a trustworthy site is to be hijacked, based on the distribution of trustworthiness among its out-neighbors. The performance of these hijacked scores is compared using our large-scale Japanese Web archive. The results show that the score that considers both trustworthy and untrustworthy out-neighbors performs better than the one that considers only untrustworthy out-neighbors.
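
The pipeline described in the summary can be illustrated with a minimal Python sketch. The summary does not give the paper's formulas, so everything specific here is an assumption made for illustration: the white and spam scores are both approximated by a PageRank whose random jump is restricted to a seed set, and the two functions hijack_score_untrusted_only and hijack_score_both are hypothetical stand-ins for the paper's two hijacked scores, not their actual definitions.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # site -> list of out-neighbor sites


def biased_pagerank(graph: Graph, seeds: Set[str], damping: float = 0.85,
                    iterations: int = 50) -> Dict[str, float]:
    """PageRank whose random jump lands only on the seed set (assumed variant)."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs} | seeds
    jump = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(jump)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * jump[n] for n in nodes}
        for u in nodes:
            outs = graph.get(u, [])
            if not outs:
                continue  # score of dangling sites is simply dropped in this sketch
            share = damping * score[u] / len(outs)
            for v in outs:
                nxt[v] += share
        score = nxt
    return score


def trust_scores(graph: Graph, white_seeds: Set[str],
                 spam_seeds: Set[str]) -> Dict[str, float]:
    """Trustworthiness = white score minus spam score, as stated in the summary."""
    white = biased_pagerank(graph, white_seeds)
    spam = biased_pagerank(graph, spam_seeds)
    return {n: white.get(n, 0.0) - spam.get(n, 0.0) for n in set(white) | set(spam)}


def hijack_score_untrusted_only(site: str, graph: Graph,
                                trust: Dict[str, float]) -> float:
    """Hypothetical score that looks only at untrustworthy out-neighbors."""
    outs = graph.get(site, [])
    if trust[site] <= 0 or not outs:
        return 0.0  # only trustworthy sites with out-links are candidates
    return sum(-trust[v] for v in outs if trust[v] < 0) / len(outs)


def hijack_score_both(site: str, graph: Graph, trust: Dict[str, float]) -> float:
    """Hypothetical score that needs both trustworthy and untrustworthy out-neighbors."""
    outs = graph.get(site, [])
    if trust[site] <= 0 or not outs:
        return 0.0
    good = sum(trust[v] for v in outs if trust[v] > 0)
    bad = sum(-trust[v] for v in outs if trust[v] < 0)
    return good * bad  # zero unless the site links to both kinds of neighbors


# Toy graph: "h" is a trustworthy site that also links to the spam site "s2".
graph = {"w1": ["w2", "h"], "w2": ["w1"], "h": ["w2", "s2"],
         "s1": ["s2"], "s2": ["s1"]}
trust = trust_scores(graph, white_seeds={"w1"}, spam_seeds={"s1"})
candidates = sorted(graph, key=lambda s: hijack_score_both(s, graph, trust),
                    reverse=True)
```

In this toy graph only "h" links to both a trustworthy and an untrustworthy site, so it is the only site with a nonzero hijack_score_both and ranks first among the candidates. The seed sets, damping factor, and iteration count are illustrative choices and are not taken from the paper.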