Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

Goshu NAGINO  Makoto SHOZAKAI  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E91-D   No.3   pp.607-614
Publication Date: 2008/03/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e91-d.3.607
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Robust Speech Processing in Realistic Environments)
Category: Corpus
Keyword:
speech corpus, cost effective, speaker selection, acoustic model, statistical MDS method

Summary: 
This paper proposes a technique for building an effective speech corpus at lower cost by using a statistical multidimensional scaling (MDS) method, which visualizes multiple HMM acoustic models in a two-dimensional space. First, a small number of voice samples is collected from each speaker, and speaker-adapted acoustic models trained on the collected utterances are mapped into two-dimensional space with the statistical MDS method. Next, speakers located on the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting a sufficient number of voice samples from the selected speakers. In an experiment on building an isolated-word speech corpus, an acoustic model trained with 200 selected speakers performed equivalently to one trained with 533 non-selected speakers, a cost reduction of more than 62%. In an experiment on building a continuous-word speech corpus, an acoustic model trained with 500 selected speakers performed equivalently to one trained with 1179 non-selected speakers, a cost reduction of more than 57%.
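
The selection procedure summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not state how distances between HMM acoustic models are computed or how the periphery is defined, so the sketch substitutes classical (Torgerson) MDS for the statistical MDS method, uses Euclidean distances between hypothetical per-speaker feature vectors in place of model-to-model distances, and assumes that "periphery" means the points farthest from the centroid of the map. The function names and the toy data are illustrative only. The helper cost_reduction reproduces the arithmetic behind the reported reductions, assuming cost is proportional to the number of speakers recorded.

```python
import numpy as np

def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a symmetric distance matrix.

    Classical (Torgerson) MDS, used here as a stand-in for the paper's
    statistical MDS method, whose details are not given in the summary.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dims]           # keep the largest `dims`
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def select_peripheral(coords, k):
    """Return indices of the k points farthest from the centroid.

    Assumption: "periphery" is read as largest distance from the centroid.
    """
    radii = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
    return np.argsort(radii)[::-1][:k]

def cost_reduction(selected, baseline):
    """Fraction of speakers saved, assuming cost scales with speaker count."""
    return 1.0 - selected / baseline

# Toy stand-in for speaker-adapted models: one random vector per speaker.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))
dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)

coords = classical_mds(dist)              # 2-D map of the 50 hypothetical speakers
chosen = select_peripheral(coords, k=10)  # speakers to record in full

print(chosen)
print(f"isolated-word:   {cost_reduction(200, 533):.1%}")
print(f"continuous-word: {cost_reduction(500, 1179):.1%}")
```

In a real setting the distance matrix would come from a distance measure between the speaker-adapted acoustic models rather than from random vectors, and the number of selected speakers would be chosen by checking recognition performance against a non-selected baseline, as in the experiments described above.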