Density-Based Spam Detector

Kenichi YOSHIDA  Fuminori ADACHI  Takashi WASHIO  Hiroshi MOTODA  Teruaki HOMMA  Akihiro NAKASHIMA  Hiromitsu FUJIKAWA  Katsuyuki YAMAZAKI  

IEICE TRANSACTIONS on Information and Systems   Vol.E87-D   No.12   pp.2678-2688
Publication Date: 2004/12/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on New Technologies and their Applications of the Internet)
Category: Internet Systems
Keywords: spam, unsupervised learning, document space density, direct-mapped cache

The volume of mass unsolicited electronic mail, commonly known as spam, has recently increased enormously and has become a serious threat not only to the Internet but also to society. This paper proposes a new spam detection method that uses document space density information. Although the proposed method requires extensive e-mail traffic to acquire the necessary information, it can achieve perfect detection (i.e., both recall and precision are 100%) under practical conditions. A direct-mapped cache method helps the detector handle over 13,000 e-mail messages per second. Results of experiments conducted using over 50 million actual e-mail messages are also reported.
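
The abstract does not give implementation details, so the following is only a minimal sketch of how a direct-mapped cache could be used to count repeated messages in constant time. It assumes that an exact-match hash of whitespace- and case-normalised text stands in for the paper's document space representation, and the names `DirectMappedCounter`, `observe`, `is_dense`, and the `threshold` parameter are hypothetical, not taken from the paper.

```python
import hashlib


class DirectMappedCounter:
    """Fixed-size table in which each slot holds one [fingerprint, count] pair.

    Illustrative only: an exact-match fingerprint is an assumed stand-in
    for the document space density measure described in the abstract.
    """

    def __init__(self, num_slots=1024):
        self.num_slots = num_slots
        self.slots = [None] * num_slots

    def observe(self, text):
        """Record one message; return its count since it last claimed its slot."""
        # Normalise case and whitespace so trivially reformatted copies match.
        normalised = " ".join(text.lower().split())
        fingerprint = hashlib.sha1(normalised.encode("utf-8")).digest()
        index = int.from_bytes(fingerprint[:8], "big") % self.num_slots

        entry = self.slots[index]
        if entry is not None and entry[0] == fingerprint:
            entry[1] += 1
        else:
            # Direct-mapped: each key has exactly one candidate slot,
            # so a mismatch simply evicts the previous occupant.
            entry = [fingerprint, 1]
            self.slots[index] = entry
        return entry[1]


def is_dense(counter, text, threshold=3):
    """Flag a message once it has been observed `threshold` times (assumed rule)."""
    return counter.observe(text) >= threshold
```

Because every lookup touches a single slot, the cost per message is one hash and one comparison regardless of how much traffic has been seen, which is the property that makes this kind of cache attractive at high message rates. The trade-off is that a collision resets the count of the evicted message, so the table size has to be chosen with the expected traffic in mind.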