Short Text Classification Based on Distributional Representations of Words

Chenglong MA  Qingwei ZHAO  Jielin PAN  Yonghong YAN  

IEICE TRANSACTIONS on Information and Systems   Vol.E99-D   No.10   pp.2562-2565
Publication Date: 2016/10/01
Publicized: 2016/07/19
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2016SLL0006
Type of Manuscript: Special Section LETTER (Special Section on Recent Advances in Machine Learning for Spoken Language Processing)
Category: Text classification
Keyword: short text classification, word embedding, Gaussian model

Short texts usually suffer from data sparseness because they provide insufficient term co-occurrence information. In this paper, we show how word embeddings can mitigate this problem in short text classification. We assume that a short text document is a specific sample from one distribution in a Gaussian-Bayesian framework. Furthermore, we use a fast clustering algorithm to expand and enrich the context of a short text in the embedding space. We compare this approach with classical bag-of-words approaches and neural-network-based methods. Experimental results validate the effectiveness of the proposed method.
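
The abstract does not spell out the model, so the following is only a minimal sketch of one way a Gaussian-Bayesian classifier over word embeddings could look, not the authors' implementation. It assumes that each class is described by a single diagonal-covariance Gaussian over word vectors, that a document is scored by the summed log-likelihood of its word vectors plus a log class prior, and that the embeddings are toy 2-D vectors drawn at random. The function names (fit_class_gaussians, log_gaussian, classify) are hypothetical, and the clustering-based context expansion is not covered.

```python
import numpy as np


def fit_class_gaussians(docs, labels):
    """Fit one diagonal Gaussian per class over all word vectors of that class.

    docs   : list of arrays, each of shape (n_words, dim)
    labels : list of class ids, one per document
    Returns {class_id: (mean, variance, log_prior)}.
    """
    models = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        vecs = np.vstack(class_docs)
        mean = vecs.mean(axis=0)
        var = vecs.var(axis=0) + 1e-6  # small floor avoids division by zero
        log_prior = np.log(len(class_docs) / len(docs))
        models[c] = (mean, var, log_prior)
    return models


def log_gaussian(x, mean, var):
    """Log-density of a diagonal Gaussian, evaluated per row of x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)


def classify(doc, models):
    """Return the class with the highest log prior plus summed word log-likelihood."""
    scores = {
        c: log_prior + log_gaussian(doc, mean, var).sum()
        for c, (mean, var, log_prior) in models.items()
    }
    return max(scores, key=scores.get)


# Toy data (assumption): two classes whose word vectors cluster around
# (-2, -2) and (2, 2); each short document has three words.
rng = np.random.default_rng(0)
docs = [rng.normal(loc=-2.0, scale=0.5, size=(3, 2)) for _ in range(20)]
docs += [rng.normal(loc=2.0, scale=0.5, size=(3, 2)) for _ in range(20)]
labels = [0] * 20 + [1] * 20

models = fit_class_gaussians(docs, labels)
new_doc = np.array([[2.1, 1.9], [1.8, 2.2]])  # a two-word document
predicted = classify(new_doc, models)
```

Because the score is a sum over word vectors, a document with very few words yields a weak decision, which is one way to see why enriching a short text with neighbouring words in the embedding space could help.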