Topic Document Model Approach for Naive Bayes Text Classification
Sang-Bum KIM, Hae-Chang RIM, Jin-Dong KIM
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2005/05/01
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Natural Language Processing
Keywords: text classification, naive Bayes
The multinomial naive Bayes model has been widely used for probabilistic text classification. However, its standard parameter estimation sometimes produces inappropriate probability estimates. In this paper, we propose a topic document model for multinomial naive Bayes text classification, in which the parameters are estimated from the normalized term frequencies of each training document. Experiments on the Reuters-21578 and 20 Newsgroups collections show that the proposed approach significantly improves performance over the traditional multinomial naive Bayes.
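The contrast drawn in the abstract can be illustrated with a small, hypothetical sketch. The exact estimator, smoothing, and weighting used in the letter are not given here, so the following assumes: (1) traditional multinomial naive Bayes pools raw term counts over all documents of a class with Laplace smoothing `alpha`; (2) the "normalized term frequency" variant divides each term count by its document's length before summing per class, so every training document contributes the same total mass. The function names (`traditional_mnb`, `normalized_tf_mnb`, `classify`) and the toy data are illustrative only, not the authors' implementation.

```python
from collections import Counter
import math


def traditional_mnb(docs_by_class, vocab, alpha=1.0):
    """Standard multinomial NB: pool raw term counts over every document in a class."""
    params = {}
    for c, docs in docs_by_class.items():
        counts = Counter()
        for doc in docs:
            counts.update(doc)
        total = sum(counts.values())
        params[c] = {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
    return params


def normalized_tf_mnb(docs_by_class, vocab, alpha=1.0):
    """Assumed variant: sum length-normalized term frequencies, so each
    non-empty training document contributes a total mass of exactly 1."""
    params = {}
    for c, docs in docs_by_class.items():
        mass = Counter()
        for doc in docs:
            if not doc:
                continue  # skip empty documents to avoid division by zero
            n = len(doc)
            for w, k in Counter(doc).items():
                mass[w] += k / n  # normalized tf of w in this document
        total = sum(mass.values())
        params[c] = {w: (mass[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
    return params


def classify(doc, params, priors):
    """Return the class maximizing log P(c) + sum of log P(w|c); unseen words are ignored."""
    scores = {
        c: math.log(priors[c]) + sum(math.log(p[w]) for w in doc if w in p)
        for c, p in params.items()
    }
    return max(scores, key=scores.get)


# Toy data (hypothetical): class "a" has one long document dominated by "x".
docs_by_class = {
    "a": [["x"] * 9 + ["y"], ["y"]],
    "b": [["x"], ["x"]],
}
vocab = {"x", "y"}
priors = {"a": 0.5, "b": 0.5}

trad = traditional_mnb(docs_by_class, vocab)
norm = normalized_tf_mnb(docs_by_class, vocab)
print(classify(["y"], trad, priors))  # -> b
print(classify(["y"], norm, priors))  # -> a
```

In this toy case, pooling raw counts lets the single long document in class "a" push P(y|a) down to 3/13, whereas per-document normalization gives it 0.525; the two estimators therefore disagree on the one-word document `["y"]`. This only demonstrates the mechanism of length normalization and says nothing about the reported experimental gains.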