Distinctive Phonetic Feature (DPF) Extraction Based on MLNs and Inhibition/Enhancement Network

Mohammad Nurul HUDA  Hiroaki KAWASHIMA  Tsuneo NITTA  

IEICE TRANSACTIONS on Information and Systems   Vol.E92-D   No.4   pp.671-680
Publication Date: 2009/04/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E92.D.671
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: distinctive phonetic feature, hidden Markov model, multilayer neural network, inhibition/enhancement network, local features


This paper describes a distinctive phonetic feature (DPF) extraction method with a low computation cost for use in a phoneme recognition system. The method comprises three stages. The first stage uses two multilayer neural networks (MLNs): MLN_LF-DPF, which maps continuous acoustic features, or local features (LFs), onto discrete DPF features, and MLN_Dyn, which constrains the DPF context at phoneme boundaries. The second stage incorporates inhibition/enhancement (In/En) functionalities that determine whether the dynamic patterns of the DPF trajectories are convex or concave; convex patterns are enhanced and concave patterns are inhibited. The third stage decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure before feeding them into a hidden Markov model (HMM)-based classifier. In an experiment on Japanese Newspaper Article Sentences (JNAS) utterances, the proposed feature extractor, which incorporates two MLNs and an In/En network, provided a higher phoneme correct rate with fewer mixture components in the HMMs.
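The third stage relies on the Gram-Schmidt orthogonalization procedure. The abstract does not say how the procedure is applied to the DPF vectors, so the following is only a minimal sketch of the textbook (modified) Gram-Schmidt procedure on toy vectors, not the authors' implementation. The function name `gram_schmidt`, the tolerance `eps`, and the example vectors are all assumptions made for illustration.

```python
import math


def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize a list of equal-length vectors (modified Gram-Schmidt).

    Hypothetical helper for illustration; vectors that are (numerically)
    linearly dependent on earlier ones are dropped.
    """
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        # Remove the component of w along each basis vector found so far.
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


# Toy input: three correlated 3-dimensional vectors (not real DPF data).
toy_vectors = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
orthonormal = gram_schmidt(toy_vectors)
```

After the call, the vectors in `orthonormal` have unit length and are mutually orthogonal, which is the property the decorrelation stage is after.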