Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization

Ching-Tang HSIEH  Eugene LAI  Wan-Chen CHEN  

IEICE TRANSACTIONS on Information and Systems   Vol.E87-D   No.5   pp.1185-1193
Publication Date: 2004/05/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Speech Dynamics by Ear, Eye, Mouth and Machine)
Keywords: speaker identification, vector quantization, eigenvector, principal component analysis, multilayer eigen-codebook vector quantization (MLECVQ)

This paper presents several methods for improving the performance of a speaker identification system. Exploiting the multiresolution property of the wavelet transform, the input speech signal is decomposed into multiple frequency subbands so that noise distortions do not spread over the entire feature space. To capture the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower frequency subband are calculated at each decomposition step. In addition, a hard threshold is applied to the lower frequency subband at each decomposition step to eliminate the effect of noise interference. Furthermore, cepstral domain feature vector normalization is applied to all computed features to provide similar parameter statistics in all acoustic environments. To make effective use of these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses principal component analysis (PCA) to evaluate the codebooks, capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated on the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than that of vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficient (MFCC) features, in both clean and noisy environments. Satisfactory performance is also achieved in low-SNR environments.
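
The front-end steps named in the abstract (repeated wavelet decomposition, a hard threshold on each lower frequency subband, and normalization of the feature vectors) can be illustrated with a minimal sketch. It is not the authors' implementation, and every specific in it is an assumption: the abstract does not state the wavelet, so a Haar filter is used; the threshold value is a free parameter; the unthresholded subband is what gets decomposed further; and the normalization is taken to be per-dimension mean and variance normalization. The function names are hypothetical, and the LPCC computation and the MLECVQ classifier are omitted.

```python
import math


def haar_lowpass(signal):
    """One Haar analysis step: return the lower-frequency (approximation) subband."""
    # Pair up adjacent samples; a trailing odd sample is dropped for simplicity.
    n = len(signal) - len(signal) % 2
    return [(signal[i] + signal[i + 1]) / math.sqrt(2.0) for i in range(0, n, 2)]


def hard_threshold(coeffs, threshold):
    """Zero coefficients whose magnitude does not exceed the threshold; keep the rest unchanged."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def lower_subbands(signal, levels, threshold):
    """Decompose repeatedly, collecting the thresholded lower subband of each level."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        # Assumption: the next level is computed from the unthresholded subband.
        current = haar_lowpass(current)
        bands.append(hard_threshold(current, threshold))
    return bands


def normalize_features(frames):
    """Assumed normalization: zero mean and unit variance per feature dimension."""
    count = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / count for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / count
        stds.append(math.sqrt(var) or 1.0)  # guard against zero variance
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]
```

In a full system, cepstral features such as LPCC would be computed from each band returned by `lower_subbands`, and the resulting frame vectors passed through `normalize_features` before codebook training.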