Topic Document Model Approach for Naive Bayes Text Classification

Sang-Bum KIM  Hae-Chang RIM  Jin-Dong KIM  

IEICE TRANSACTIONS on Information and Systems   Vol.E88-D   No.5   pp.1091-1094
Publication Date: 2005/05/01
Online ISSN: 
DOI: 10.1093/ietisy/e88-d.5.1091
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Natural Language Processing
Keyword: text classification, naive Bayes

The multinomial naive Bayes model has been widely used for probabilistic text classification. However, parameter estimation for this model sometimes yields inappropriate probabilities. In this paper, we propose a topic document model for multinomial naive Bayes text classification, in which the parameters are estimated from the normalized term frequencies of each training document. Experiments on the Reuters-21578 and 20 Newsgroups collections show that the proposed approach significantly improves performance over the traditional multinomial naive Bayes model.
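
The abstract does not give the estimation formula, so the following Python code is only a minimal sketch of one plausible reading of "parameters estimated from normalized term frequencies of each training document". It assumes that each training document's term counts are divided by the document length, that these per-document frequencies are summed over the documents of a class, and that additive smoothing is applied. The function names `train` and `classify`, the parameter `alpha`, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train(docs, labels, alpha=1.0):
    """Estimate log-priors and log-conditionals from tokenized documents.

    Assumption (not from the paper): each document contributes its
    length-normalized term frequencies, so long documents do not
    dominate the class estimates.
    """
    vocab = sorted({t for d in docs for t in d})
    class_docs = defaultdict(list)
    for d, y in zip(docs, labels):
        class_docs[y].append(d)

    priors, cond = {}, {}
    for y, ds in class_docs.items():
        priors[y] = math.log(len(ds) / len(docs))
        mass = Counter()
        for d in ds:
            if not d:
                continue  # skip empty documents
            for t, c in Counter(d).items():
                mass[t] += c / len(d)  # normalized term frequency
        # Additive smoothing (hypothetical choice) over the vocabulary.
        total = sum(mass.values()) + alpha * len(vocab)
        cond[y] = {t: math.log((mass[t] + alpha) / total) for t in vocab}
    return priors, cond

def classify(tokens, priors, cond):
    """Return the class with the highest log-posterior score."""
    def score(y):
        # Tokens outside the training vocabulary are ignored.
        return priors[y] + sum(cond[y][t] for t in tokens if t in cond[y])
    return max(priors, key=score)

# Toy data, invented for illustration only.
docs = [["ball", "goal", "goal"], ["ball", "team"],
        ["code", "bug"], ["code", "code", "chip"]]
labels = ["sports", "sports", "tech", "tech"]
priors, cond = train(docs, labels)
```

With this toy data, `classify(["goal", "ball"], priors, cond)` selects `sports` and `classify(["code"], priors, cond)` selects `tech`. A traditional multinomial estimate would instead pool raw counts over all documents of a class, which is the behavior the normalization step above replaces.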