A Portable Text-to-Speech System Using a Pocket-Sized Formant Speech Synthesizer

Norio HIGUCHI, Tohru SHIMIZU, Hisashi KAWAI, Seiichi YAMAMOTO

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E76-A, No.11, pp.1981-1989
Publication Date: 1993/11/25
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Speech Synthesis: Current Technologies and Their Application)
Keyword: 
text-to-speech conversion, determination of reading, prosodic control, speech synthesis by rule, formant-type speech synthesizer

Summary: 
The authors developed a portable Japanese text-to-speech system using a pocket-sized formant speech synthesizer. The system consists of a linguistic processor and an acoustic processor. The linguistic processor runs on an MS-DOS personal computer and determines the readings and prosodic information of input sentences written in mixed kana-kanji text. To increase the accuracy of readings and accent positions, it uses several new techniques: minimization of a cost function for phrases, a rare-compound flag, semantic information, reading-selection information, and restriction by associated particles. For sentences taken from newspaper articles, readings and accent positions are determined with an accuracy of 98.6%. The linguistic processor can also be used through an interface library, likewise developed by the authors. As a result, the system can convert not only whole texts stored in text files but also parts of sentences passed sequentially through the interface library, while still optimizing the readings and prosodic information over the whole sentence at once. The acoustic processor is custom-made hardware that adopts new techniques to improve the rules for vowel devoicing, the control of phoneme durations, the control of the phrase components of the voice fundamental frequency, and the construction of the acoustic parameter database. These modifications improved the naturalness of the synthetic speech generated by the Klatt-type formant speech synthesizer: in a naturalness test, it was rated 3.61 on a five-point scale from 0 to 4.
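The summary says readings are chosen by minimizing a cost function for phrases but does not give the cost model. As a rough illustration only, the following Python sketch picks the cheapest segmentation of a string, and thereby its readings, by dynamic programming over a word lattice. The lexicon, its costs, and the function name min_cost_reading are hypothetical; practical Japanese analyzers also add connection costs between adjacent words, which this sketch omits.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: surface string -> list of (reading, cost).
Lexicon = Dict[str, List[Tuple[str, int]]]


def min_cost_reading(text: str, lexicon: Lexicon,
                     max_len: int = 4) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (total cost, [(surface, reading), ...]) for the cheapest
    segmentation of text. Only per-word costs are used (an assumption)."""
    n = len(text)
    inf = float("inf")
    best: List[float] = [inf] * (n + 1)   # best[i] = cheapest cost of text[:i]
    back: List[Optional[Tuple[int, str, str]]] = [None] * (n + 1)
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] == inf:
                continue
            surface = text[start:end]
            for reading, cost in lexicon.get(surface, []):
                total = best[start] + cost
                if total < best[end]:
                    best[end] = total
                    back[end] = (start, surface, reading)
    if best[n] == inf:
        raise ValueError("no segmentation covers the input")
    # Walk the back-pointers from the end to recover the chosen words.
    path: List[Tuple[str, str]] = []
    pos = n
    while pos > 0:
        start, surface, reading = back[pos]
        path.append((surface, reading))
        pos = start
    path.reverse()
    return int(best[n]), path


if __name__ == "__main__":
    toy = {"日本": [("にほん", 2), ("にっぽん", 3)], "語": [("ご", 1)],
           "日本語": [("にほんご", 2)], "日": [("ひ", 2), ("にち", 2)],
           "本": [("ほん", 2)]}
    print(min_cost_reading("日本語", toy))  # -> (2, [('日本語', 'にほんご')])
```

Here the single compound entry wins because its cost (2) is lower than the split 日本 + 語 (2 + 1). A rare-compound flag or semantic information, as mentioned in the summary, could plausibly be modeled as adjustments to these costs, but that is speculation beyond what the summary states.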
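The summary refers to the phrase components of the voice fundamental frequency without naming a model. Assuming a Fujisaki-style F0 model (an assumption; the summary does not say so), each phrase command of amplitude Ap at onset T0 adds a critically damped response Gp(t) = α²·t·exp(−α·t) to ln F0 on top of a baseline Fb. The sketch below computes only the phrase part; accent components are omitted, and the default alpha = 3.0 per second and the helper names phrase_response and f0_from_phrases are illustrative choices.

```python
import math
from typing import List, Tuple


def phrase_response(t: float, alpha: float) -> float:
    """Phrase-control impulse response Gp(t) = alpha^2 * t * exp(-alpha * t),
    zero before the command onset."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0


def f0_from_phrases(t: float, fb: float, commands: List[Tuple[float, float]],
                    alpha: float = 3.0) -> float:
    """F0 in Hz at time t (s) from baseline fb (Hz) and phrase commands
    given as (onset T0 in s, amplitude Ap). Accent components are omitted."""
    log_f0 = math.log(fb)
    for t0, ap in commands:
        log_f0 += ap * phrase_response(t - t0, alpha)
    return math.exp(log_f0)


if __name__ == "__main__":
    cmds = [(0.0, 0.5)]  # one hypothetical phrase command at t = 0
    for ms in range(0, 2001, 250):
        print(ms, round(f0_from_phrases(ms / 1000.0, 100.0, cmds), 1))
```

The response peaks at t = 1/α and then decays back toward the baseline, which gives the gradual declination of F0 over a phrase.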
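A Klatt-type formant synthesizer shapes its spectrum with second-order digital resonators. The Python sketch below uses the standard resonator difference equation from Klatt's design, y[n] = A·x[n] + B·y[n−1] + C·y[n−2], with C = −exp(−2π·BW·T), B = 2·exp(−π·BW·T)·cos(2π·F·T), and A = 1 − B − C, and chains several resonators in cascade. It is not the authors' hardware: the formant frequencies, bandwidths, sampling rate, and the names Resonator and cascade are hypothetical, and the voicing source, parallel branch, and radiation characteristic are left out.

```python
import math
from typing import List, Tuple


class Resonator:
    """Second-order digital resonator (Klatt-style) with unity gain at 0 Hz."""

    def __init__(self, freq_hz: float, bw_hz: float, fs_hz: float) -> None:
        t = 1.0 / fs_hz
        self.c = -math.exp(-2.0 * math.pi * bw_hz * t)
        self.b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
        self.a = 1.0 - self.b - self.c  # normalizes the DC gain to 1
        self.y1 = 0.0  # y[n-1]
        self.y2 = 0.0  # y[n-2]

    def step(self, x: float) -> float:
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def cascade(signal: List[float], formants: List[Tuple[float, float]],
            fs_hz: float) -> List[float]:
    """Filter signal through one resonator per (frequency, bandwidth) pair,
    connected in series."""
    stages = [Resonator(f, bw, fs_hz) for f, bw in formants]
    out: List[float] = []
    for x in signal:
        for r in stages:
            x = r.step(x)
        out.append(x)
    return out


if __name__ == "__main__":
    fs = 10000.0
    # Hypothetical 100 Hz impulse train, 50 ms long, as a crude voicing source.
    source = [1.0 if n % 100 == 0 else 0.0 for n in range(500)]
    vowel = cascade(source, [(700.0, 60.0), (1200.0, 90.0), (2600.0, 150.0)], fs)
    print(len(vowel), max(vowel))
```

Because each stage is normalized to unity gain at 0 Hz, the overall level of the cascade does not change as formant frequencies move, which is one reason this normalization is commonly used.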