An HMM State Duration Control Algorithm Applied to Large-Vocabulary Spontaneous Speech Recognition

Satoshi TAKAHASHI, Yasuhiro MINAMI, Kiyohiro SHIKANO

IEICE TRANSACTIONS on Information and Systems, Vol.E78-D, No.6, pp.648-653
Publication Date: 1995/06/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Spoken Language Processing)
Keywords: HMM, duration control, trellis likelihood calculation, large-vocabulary speech recognition


Although Hidden Markov Modeling (HMM) is widely and successfully used in many speech recognition applications, duration control for HMMs is still an important issue in improving recognition accuracy, since an HMM places no constraints on duration. To compensate for this defect, several duration control algorithms that employ precise duration models have been proposed; however, they greatly increase computational complexity. This paper proposes a new state duration control algorithm that limits both the maximum and the minimum state durations. The algorithm is designed for the HMM trellis likelihood calculation, not for the Viterbi calculation. Its computational cost is O(1) with respect to the maximum state duration n; that is, the amount of computation is independent of the maximum state duration, whereas many conventional duration control algorithms require O(n) or O(n^2) computation. The algorithm can therefore drastically reduce the computation needed for duration control. It exploits the property that the trellis likelihood is a summation of many path likelihoods. At each frame, the likelihood of paths that exceed the maximum state duration is subtracted from the forward probability, and the likelihood of paths that satisfy the minimum state duration is added to it. By iterating this procedure, the algorithm calculates the trellis likelihood efficiently. The algorithm was evaluated using a large-vocabulary, speaker-independent, spontaneous speech recognition system for telephone directory assistance. The average reduction in the sentence understanding error rate was about 7% with context-independent HMMs and 3% with context-dependent HMMs. The improvement from the proposed state duration control algorithm was confirmed even though the maximum and minimum state durations were not optimized for the task (speaker-independent duration settings obtained from a different task were used).
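The idea summarized above can be illustrated with a small sketch. The sketch keeps, for each state, a running trellis sum of the paths whose current duration is admissible. At every frame it corrects that sum by adding the paths that have just reached the minimum duration and subtracting the paths that have just passed the maximum. This is our own minimal formulation, not the authors' exact algorithm. It assumes a left-to-right HMM whose paths start in the first state at frame 0 and end in the last state at the final frame. It also assumes per-state limits with 1 <= dmin[j] <= dmax[j] and strictly positive emission likelihoods b[t][j]. All function and variable names are hypothetical. Cumulative log-emission sums let each add or subtract term be computed in constant time, independent of dmax. A brute-force enumeration over duration assignments is included as a reference check.

```python
import math
import random
from itertools import product

# Hypothetical sketch of a duration-constrained trellis (forward) calculation;
# not the paper's exact formulation.


def segment_mass(entry, log_cum, a_self, j, s, t):
    """Mass of paths that entered state j at frame s and are still in j at frame t."""
    if s < 0:
        return 0.0
    return entry[s][j] * a_self[j] ** (t - s) * math.exp(log_cum[t][j] - log_cum[s][j])


def duration_constrained_forward(a_self, a_next, b, dmin, dmax):
    """Trellis likelihood of a left-to-right HMM with durations limited to [dmin[j], dmax[j]].

    b[t][j] is the (assumed strictly positive) emission likelihood of frame t in state j.
    """
    T, S = len(b), len(a_self)
    # log_cum[t][j] = sum of log b[k][j] for k <= t, so any window product costs O(1).
    log_cum = [[0.0] * S for _ in range(T)]
    for t in range(T):
        for j in range(S):
            prev = log_cum[t - 1][j] if t > 0 else 0.0
            log_cum[t][j] = prev + math.log(b[t][j])
    entry = [[0.0] * S for _ in range(T)]  # mass entering state j exactly at frame t
    exitable = [0.0] * S  # mass in state j whose current duration lies in [dmin, dmax]
    for t in range(T):
        # Entries at frame t use only the exitable mass of frame t-1.
        if t == 0:
            entry[0][0] = b[0][0]
        else:
            for j in range(1, S):
                entry[t][j] = exitable[j - 1] * a_next[j - 1] * b[t][j]
        for j in range(S):
            carried = exitable[j] * a_self[j] * b[t][j]  # every admissible path ages one frame
            added = segment_mass(entry, log_cum, a_self, j, t - dmin[j] + 1, t)  # duration == dmin
            removed = segment_mass(entry, log_cum, a_self, j, t - dmax[j], t)  # duration == dmax + 1
            exitable[j] = carried + added - removed
    return exitable[S - 1]


def brute_force(a_self, a_next, b, dmin, dmax):
    """Reference: sum over every duration assignment within the limits (exponential cost)."""
    T, S = len(b), len(a_self)
    total = 0.0
    for durs in product(*(range(dmin[j], dmax[j] + 1) for j in range(S))):
        if sum(durs) != T:
            continue
        p, t = 1.0, 0
        for j, d in enumerate(durs):
            for k in range(t, t + d):
                p *= b[k][j]
            p *= a_self[j] ** (d - 1)
            if j < S - 1:
                p *= a_next[j]
            t += d
        total += p
    return total


# Toy example with made-up parameters (illustrative only).
rng = random.Random(0)
T_FRAMES, N_STATES = 12, 3
A_SELF = [0.6, 0.7, 0.5]
A_NEXT = [0.4, 0.3, 0.5]  # last entry unused: paths end in the final state
B = [[rng.uniform(0.1, 1.0) for _ in range(N_STATES)] for _ in range(T_FRAMES)]
DMIN, DMAX = [2, 2, 1], [5, 6, 4]

fast = duration_constrained_forward(A_SELF, A_NEXT, B, DMIN, DMAX)
slow = brute_force(A_SELF, A_NEXT, B, DMIN, DMAX)
print(f"recursive: {fast:.6e}  brute force: {slow:.6e}")
```

The subtraction step is valid only because the trellis value is a sum over paths. A Viterbi score is a maximum, and the contribution of one path cannot in general be removed from a maximum by subtraction, which is consistent with restricting the method to the trellis calculation. A practical recognizer would also keep these running sums in a scaled or log domain to avoid underflow on long utterances.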