Perceptually-Related F0 Parameters for Automatic Classification of Phrase Final Tones

Carlos Toshinori ISHI  

IEICE TRANSACTIONS on Information and Systems, Vol.E88-D, No.3, pp.481-488
Publication Date: 2005/03/01
DOI: 10.1093/ietisy/e88-d.3.481
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Corpus-Based Speech Technologies)
Category: Speech Synthesis and Prosody
Keywords: phrase finals, intonation, pitch perception, automatic labeling, prosody

Automatic labeling of prosodic features is an important topic when constructing large speech databases for speech synthesis or analysis. This paper proposes perceptually related F0 parameters for the automatic classification of phrase final tones. Analyses are conducted to verify how consistently subjects can categorize phrase final tones and how perceptual features relate to the categories. Three types of acoustic parameters are proposed and analyzed as representations of the perceptual features related to the tone categories: one related to pitch movement within the phrase final, one related to pitch reset prior to the phrase final, and one related to the length of the phrase final. A classification tree is constructed to evaluate automatic classification of phrase final tones; using the best combination of the proposed acoustic parameters, it achieves 79.2% accuracy on the consistently categorized samples.
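
As a rough illustration of the three parameter types named in the abstract, the sketch below computes one hypothetical feature of each kind from frame-level F0 values. The abstract does not give the paper's actual parameter definitions, so everything here is an assumption made for illustration: the function names, the 100 Hz semitone reference, the 10 ms frame shift, and the choice of measuring pitch movement as the difference between the mean F0 of the second and first halves of the phrase final.

```python
import math
from statistics import mean


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def phrase_final_features(prev_f0_hz, final_f0_hz, frame_shift_s=0.01):
    """Compute three illustrative features for one phrase final.

    prev_f0_hz:  voiced F0 values (Hz) of the segment preceding the phrase final
    final_f0_hz: voiced F0 values (Hz) within the phrase final
    These definitions are hypothetical, not the paper's.
    """
    if len(final_f0_hz) < 2 or not prev_f0_hz:
        raise ValueError("need at least 2 final frames and 1 preceding frame")

    prev_st = [hz_to_semitones(f) for f in prev_f0_hz]
    final_st = [hz_to_semitones(f) for f in final_f0_hz]

    # Pitch movement within the phrase final: second-half mean minus first-half mean.
    half = len(final_st) // 2
    movement_st = mean(final_st[half:]) - mean(final_st[:half])

    # Pitch reset prior to the phrase final: jump from the last preceding frame
    # to the first frame of the phrase final.
    reset_st = final_st[0] - prev_st[-1]

    # Length of the phrase final, in seconds.
    duration_s = len(final_st) * frame_shift_s

    return {
        "movement_st": movement_st,
        "reset_st": reset_st,
        "duration_s": duration_s,
    }
```

Feature dictionaries of this kind could then be passed to any decision-tree learner; the abstract does not say which tree algorithm or feature combination the paper used.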