Fractal Modeling of Fluctuations in the Steady Part of Sustained Vowels for High Quality Speech Synthesis

Naofumi AOKI, Tohru IFUKUBE

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E81-A, No.9, pp.1803-1810
Publication Date: 1998/09/25
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Nonlinear Theory and Its Applications)
Category: Chaos, Bifurcation and Fractal
Keyword: naturalness of sustained vowels, speech synthesis, random fractal, 1/f^β fluctuation

Summary:
The naturalness of normal sustained vowels is considered attributable to the fluctuations observed in the steady part, where the speech signal appears almost periodic. Two kinds of involuntary fluctuation are always present in the steady part of sustained vowels, even when the vowels are phonated as steadily as possible: pitch period fluctuation and waveform fluctuation. In this study, frequency analyses of these fluctuations were conducted to investigate their general characteristics. The results suggested that the frequency characteristics of the fluctuations can be approximated as 1/f^β-like, which is regarded as the characteristic feature of random fractals. A procedure based on random fractal generation methods was therefore proposed to produce these fluctuations and improve the voice quality of synthesized sustained vowels. A series of psychoacoustic experiments was also conducted to evaluate the proposed technique. The experimental results indicated that the technique was effective in making synthesized sustained vowels sound human-like. Unlike sustained vowels synthesized with neither pitch period fluctuation nor waveform fluctuation, synthesized sustained vowels that contained the fluctuations were not perceived as buzzer-like, which is the major voice quality problem of synthesized sustained vowels. However, it was also found that the two fluctuations were not always acoustic cues for the naturalness of normal sustained vowels. Synthesized sustained vowels containing fluctuations whose frequency characteristics were the same as those of white noise were perceived as noise-like, which is not at all the voice quality of normal sustained vowels. The results of the psychoacoustic experiments indicated that the frequency characteristics of the fluctuations, which can be modeled as 1/f^β-like, were significant factors in the naturalness of normal sustained vowels.
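
As an illustration of how a 1/f^β-like fluctuation might be generated, the following Python sketch uses spectral synthesis with random phases, which is one common random fractal generation method. The summary does not say which generation method the authors used, so this is an assumption and not their procedure. The function names, the parameters mean_period_ms and relative_depth, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def power_law_fluctuation(n, beta, seed=0):
    """Return n samples whose power spectrum falls off roughly as 1/f**beta.

    Spectral synthesis with random phases: harmonic k gets amplitude
    k**(-beta/2), so its power is proportional to 1/k**beta.
    The output is scaled to zero mean and unit standard deviation.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)
    n_harmonics = n // 2
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_harmonics)]
    samples = []
    for t in range(n):
        value = 0.0
        for k in range(1, n_harmonics + 1):
            amplitude = k ** (-beta / 2.0)
            value += amplitude * math.cos(2.0 * math.pi * k * t / n + phases[k - 1])
        samples.append(value)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    return [(s - mean) / std for s in samples]


def pitch_periods(n_periods, mean_period_ms, relative_depth, beta, seed=0):
    """Perturb a constant pitch period with a 1/f**beta-like fluctuation.

    mean_period_ms and relative_depth are hypothetical parameters for this
    sketch; they are not values taken from the paper.
    """
    fluctuation = power_law_fluctuation(n_periods, beta, seed)
    return [mean_period_ms * (1.0 + relative_depth * f) for f in fluctuation]


if __name__ == "__main__":
    # Illustrative values only: 8 ms mean period, 1 % relative fluctuation.
    for period in pitch_periods(8, 8.0, 0.01, beta=1.0, seed=1):
        print(f"{period:.4f} ms")
```

Setting beta to 0 gives a flat, white-noise-like spectrum, while larger values of beta concentrate power at low frequencies and yield a smoother sequence. This corresponds to the contrast drawn in the summary between white-noise-like and 1/f^β-like fluctuations.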