Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition

Yasuhisa FUJII  Kazumasa YAMAMOTO  Seiichi NAKAGAWA  

IEICE TRANSACTIONS on Information and Systems   Vol.E95-D   No.8   pp.2094-2104
Publication Date: 2012/08/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E95.D.2094
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: hidden conditional neural fields, hidden conditional random fields, hidden Markov model, speech recognition, deep learning


In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition. HCNF combine Hidden Conditional Random Fields (HCRF) with a Multi-Layer Perceptron (MLP) and inherit their respective merits: the sequence-level discriminative property of HCRF and the non-linear feature extraction of an MLP. HCNF can incorporate many types of features from which non-linear features can be extracted, and are trained with sequential criteria. We first present the formulation of HCNF and then examine three methods to further improve automatic speech recognition with HCNF: an objective function that explicitly considers training errors, a hierarchical tandem-style feature, and a deep non-linear feature extractor for the observation function. Experimental results for continuous English phoneme recognition on the TIMIT core test set and Japanese phoneme recognition on the IPA 100 test set show that HCNF can be trained realistically without any initial model and that they outperform both HCRF and a triphone hidden Markov model trained with the minimum phone error (MPE) criterion.
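To make the combination concrete, the following is a minimal, simplified sketch of the idea the abstract describes: an MLP hidden layer serves as a non-linear feature extractor, its outputs are weighted linearly to form per-frame state potentials (the observation function), and a forward pass in log space yields the normalizer used by a CRF-style sequential criterion. All shapes, names, and the single-hidden-layer/sigmoid choices here are illustrative assumptions, not the paper's exact formulation (which includes hidden state alignments and further refinements).

```python
import numpy as np

def mlp_features(X, W1, b1):
    # Sigmoid hidden layer: the non-linear feature extractor that
    # HCNF add on top of an HCRF (illustrative single layer).
    return 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))

def observation_potentials(X, W1, b1, Lam):
    # Linear weights Lam over MLP features give per-frame, per-state
    # potentials, playing the role of the observation function.
    H = mlp_features(X, W1, b1)          # (T, n_hidden)
    return H @ Lam                       # (T, n_states)

def _logsumexp0(a):
    # Numerically stable log-sum-exp over axis 0.
    m = a.max(axis=0)
    return m + np.log(np.exp(a - m).sum(axis=0))

def log_partition(obs, trans):
    # Forward algorithm in log space over all state sequences:
    # the normalizer of the CRF-style conditional likelihood.
    alpha = obs[0].copy()                # (n_states,)
    for t in range(1, obs.shape[0]):
        alpha = obs[t] + _logsumexp0(alpha[:, None] + trans)
    return _logsumexp0(alpha)

def path_score(obs, trans, y):
    # Unnormalized log score of one state sequence y.
    s = obs[0, y[0]]
    for t in range(1, len(y)):
        s += trans[y[t - 1], y[t]] + obs[t, y[t]]
    return s

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))          # 5 frames, 3-dim acoustic features
W1 = rng.standard_normal((3, 4)); b1 = np.zeros(4)
Lam = rng.standard_normal((4, 2))        # 2 hypothetical states
trans = rng.standard_normal((2, 2))      # trans[i, j]: weight of i -> j
obs = observation_potentials(X, W1, b1, Lam)
logZ = log_partition(obs, trans)
# Conditional log-probability of a labeling under this toy model:
log_prob = path_score(obs, trans, [0, 0, 1, 1, 1]) - logZ
```

Training would backpropagate the gradient of this conditional log-probability through both `Lam` and the MLP weights, which is what lets the sequential criterion shape the non-linear features.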