Auditory Artifacts due to Switching Head-Related Transfer Functions of a Dynamic Virtual Auditory Display

Makoto OTANI  Tatsuya HIRAHARA  

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E91-A   No.6   pp.1320-1328
Publication Date: 2008/06/01
Online ISSN: 1745-1337
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Acoustic Scene Analysis and Reproduction)
Keyword: 
virtual auditory display, head motion, head-related transfer functions, wave discontinuity


Summary: 
Auditory artifacts due to switching head-related transfer functions (HRTFs) are investigated using a software-implemented dynamic virtual auditory display (DVAD) developed by the authors. The DVAD tracks the listener's head rotation with a head-tracking device and switches HRTFs accordingly to present a highly realistic 3D virtual auditory space. The DVAD runs on Windows XP and does not require a high-performance computer. Total system latency (TSL), the delay between a head motion and the corresponding change in the ear input signal, is a significant factor in DVADs. The measured TSL of our DVAD is about 50 ms, which is sufficient for practical applications and localization experiments. Another matter of concern is the auditory artifact caused by switching HRTFs. Switching HRTFs gives rise to wave discontinuities in the synthesized binaural signals, which can be perceived as click noises that degrade the quality of the presented sound image. A subjective test and an excitation pattern (EPN) analysis using an auditory filter are performed with various source signals and HRTF spatial resolutions. The results of the subjective test reveal that click noise perception depends on the source signal and the HRTF spatial resolution. Furthermore, the EPN analysis reveals that switching HRTFs significantly distorts the EPNs at off-signal frequencies. Such distortions are perceptually masked by broad-bandwidth source signals but not by narrow-bandwidth source signals, which makes the click noise more detectable for the latter. A higher HRTF spatial resolution leads to smaller distortions. However, depending on the source signal, perceivable click noises remain even at a 0.5-degree spatial resolution, which is finer than the minimum audible angle (1 degree in front).
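
The wave discontinuity described in the summary can be illustrated with a minimal sketch. It is not the authors' DVAD implementation: the two impulse responses below are hypothetical stand-ins for measured HRIRs (pure delays with different gains), and the sampling rate, tone frequency, and switch position are likewise assumed values chosen only for illustration. The sketch filters a narrow-band tone with one response, hard-switches to the other, and compares the sample-to-sample jump at the switch with the jump at the same position when no switch occurs.

```python
import numpy as np

def render_with_switch(source, hrir_a, hrir_b, switch_index):
    """Hard-switch from hrir_a to hrir_b at switch_index, with no crossfade."""
    out_a = np.convolve(source, hrir_a)[:len(source)]
    out_b = np.convolve(source, hrir_b)[:len(source)]
    return np.concatenate([out_a[:switch_index], out_b[switch_index:]])

def jump_at(signal, index):
    """Absolute difference between the sample at index and the one before it."""
    return abs(signal[index] - signal[index - 1])

# Assumed values, not taken from the paper.
fs = 48000                       # sampling rate in Hz
tone_hz = 1000                   # narrow-band source: a pure tone
t = np.arange(fs // 10) / fs     # 100 ms of signal
source = np.sin(2 * np.pi * tone_hz * t)

# Hypothetical HRIRs for two adjacent directions: delay plus gain only.
hrir_a = np.zeros(32)
hrir_a[4] = 1.0
hrir_b = np.zeros(32)
hrir_b[28] = 0.8

switch_index = 2400
switched = render_with_switch(source, hrir_a, hrir_b, switch_index)
unswitched = np.convolve(source, hrir_a)[:len(source)]

jump_switched = jump_at(switched, switch_index)
jump_unswitched = jump_at(unswitched, switch_index)

print("jump at switch point, with HRTF switch:   ", jump_switched)
print("jump at same point, without HRTF switch:  ", jump_unswitched)
```

For a unit-amplitude tone, the difference between adjacent samples cannot exceed 2·sin(π·tone_hz/fs), so a jump well above that bound at the switch point indicates a discontinuity of the kind that may be heard as a click. Real HRIRs for neighbouring directions differ far less than these two stand-ins, which is consistent with the summary's observation that finer spatial resolution yields smaller distortions.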