Short Text Classification Based on Distributional Representations of Words
Chenglong MA Qingwei ZHAO Jielin PAN Yonghong YAN
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2016/10/01
Online ISSN: 1745-1361
Type of Manuscript: Special Section LETTER (Special Section on Recent Advances in Machine Learning for Spoken Language Processing)
Category: Text classification
Keywords: short text classification, word embedding, Gaussian model
Short texts typically suffer from data sparseness because they provide too little term co-occurrence information. In this paper, we show how word embeddings can mitigate this problem in short text classification. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is used to expand and enrich the context of a short text in the embedding space. The approach is compared with classical bag-of-words approaches and with neural-network-based methods. Experimental results validate the effectiveness of the proposed method.
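
The abstract does not give the model's details, so the following is only a minimal sketch of one way to read "a short text is a sample of one distribution in a Gaussian-Bayesian framework". It assumes that each class is modeled by a single full-covariance Gaussian over word vectors and that a document is scored by the sum of its words' log-likelihoods plus a log class prior. The function names (`fit_class_gaussians`, `log_gaussian`, `classify`), the regularization constant `reg`, and the synthetic 2-D "embeddings" are all hypothetical and chosen for illustration; the clustering-based context expansion is not modeled here.

```python
import numpy as np

def fit_class_gaussians(docs, labels, reg=1e-3):
    """Fit one Gaussian per class over all word vectors of that class.

    docs:   list of (n_words, dim) arrays, one array of word vectors per document
    labels: list of class ids, one per document
    reg:    ridge term added to the covariance diagonal (assumed, for stability)
    """
    params = {}
    n_docs = len(docs)
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        vecs = np.vstack(class_docs)
        mean = vecs.mean(axis=0)
        centered = vecs - mean
        cov = centered.T @ centered / len(vecs) + reg * np.eye(vecs.shape[1])
        log_prior = np.log(len(class_docs) / n_docs)
        params[c] = (mean, cov, log_prior)
    return params

def log_gaussian(x, mean, cov):
    """Row-wise log density of x (n, dim) under N(mean, cov)."""
    dim = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    solved = np.linalg.solve(cov, diff.T).T      # rows hold cov^{-1} (x - mean)
    maha = np.sum(diff * solved, axis=1)         # squared Mahalanobis distance
    return -0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)

def classify(doc, params):
    """Return the class maximizing log prior + summed word log-likelihoods."""
    scores = {
        c: log_prior + log_gaussian(doc, mean, cov).sum()
        for c, (mean, cov, log_prior) in params.items()
    }
    return max(scores, key=scores.get)

# Synthetic stand-in for word embeddings: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
train_docs = (
    [rng.normal(-3.0, 1.0, size=(5, 2)) for _ in range(20)]
    + [rng.normal(3.0, 1.0, size=(5, 2)) for _ in range(20)]
)
train_labels = [0] * 20 + [1] * 20
params = fit_class_gaussians(train_docs, train_labels)

short_text = np.array([[3.0, 3.0], [2.5, 3.5]])  # a two-word "document"
predicted = classify(short_text, params)
```

In practice the word vectors would come from a pretrained embedding model rather than random draws, and a diagonal covariance may be preferable when the embedding dimension is large relative to the number of training words.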