Unsupervised Learning of Continuous Density HMM for Variable-Length Spoken Unit Discovery
Meng SUN Hugo VAN HAMME Yimin WANG Xiongwei ZHANG
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2016/01/01
Online ISSN: 1745-1361
Type of Manuscript: LETTER
Category: Speech and Hearing
Keywords: spoken unit discovery, unsupervised HMM learning, nonnegative matrix factorization, language acquisition
Unsupervised spoken unit discovery, also known as zero-resource speech recognition, is an emerging research topic that is important for analyzing spoken documents in languages or dialects with little human annotation. In this paper, we extend our earlier joint training framework for unsupervised learning of discrete density HMMs to continuous density HMMs (CDHMMs) and apply it to spoken unit discovery. In the proposed recipe, we first obtain a set of Gaussians by clustering; these initialize the joint training framework of nonnegative matrix factorization and a semi-continuous density HMM (SCDHMM). In an SCDHMM, all hidden states share the same set of Gaussians but have different mixture weights. A CDHMM is then constructed by tying the top-N activated Gaussians to each hidden state. Finally, Baum-Welch training updates the Gaussian parameters, the mixture weights and the HMM transition probabilities. Experiments were conducted on word discovery from TIDIGITS and phone discovery from TIMIT. For TIDIGITS, units were modeled with 10 states and turn out to be strongly related to words; for TIMIT, units were modeled with 3 states and are likely to correspond to phonemes.
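The SCDHMM-to-CDHMM step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "top-N activated" means the N Gaussians with the largest mixture weights for a state, that the retained weights are renormalized to sum to one, and that the Gaussians have diagonal covariances. The function names `scdhmm_emission` and `tie_top_n` are hypothetical.

```python
import numpy as np


def scdhmm_emission(x, means, variances, weights):
    """Per-state emission likelihoods of a semi-continuous HMM.

    All states share the same K diagonal-covariance Gaussians (an assumption);
    only the mixture weights differ per state.
    x: (D,), means/variances: (K, D), weights: (S, K) with rows summing to 1.
    Returns an array of shape (S,).
    """
    diff = x - means
    log_pdf = -0.5 * (
        np.sum(diff ** 2 / variances, axis=1)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    return weights @ np.exp(log_pdf)


def tie_top_n(weights, n):
    """Keep the n largest-weight Gaussians for each state.

    Returns (indices, tied_weights), each of shape (S, n). Renormalizing the
    kept weights is an assumption of this sketch, not a detail from the paper.
    """
    weights = np.asarray(weights, dtype=float)
    # Sort each row by descending weight and keep the first n column indices.
    idx = np.argsort(-weights, axis=1, kind="stable")[:, :n]
    top = np.take_along_axis(weights, idx, axis=1)
    tied = top / top.sum(axis=1, keepdims=True)
    return idx, tied
```

Under these assumptions, the indices returned by `tie_top_n` identify which shared Gaussians would be copied to each state as its own mixture components before Baum-Welch re-estimation.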