DOA Estimation of Multiple Speech Sources from a Stereophonic Mixture in Underdetermined Case
Ning DING, Nozomu HAMADA
Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences
Vol.E95-A
No.4
pp.735-744
Publication Date: 2012/04/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E95.A.735
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: Engineering Acoustics
Keyword: direction of arrival, time-frequency representation, reliability index, statistical model, kernel density estimator
Summary:
This paper proposes a direction-of-arrival (DOA) estimation method for multiple speech sources from a stereophonic mixture in the underdetermined case, where the number of sources exceeds the number of sensors. The method relies on the sparseness of speech signals in the time-frequency (TF) domain, which means that the signals of multiple independent speakers overlap only slightly. First, TF cells bearing reliable spatial information are selected using a newly introduced reliability index, which is defined from the estimated interaural phase difference at each TF cell. Then, a statistical error propagation model relating the phase difference at a TF cell to the resulting DOA is introduced. By employing this model and the sparseness in the TF domain, the DOA estimation problem is recast as finding the local peaks of the probability density function of the DOA. Finally, a kernel density estimator based on the proposed statistical model is applied. The performance of the proposed method is assessed experimentally. The method outperforms others both in accuracy on real observed data and in robustness in simulations with added diffuse noise.
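The chain from phase difference to DOA peaks can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not give the model equations, so the sketch assumes a far-field two-microphone model, phi = 2*pi*f*d*sin(theta)/c, a constant phase-noise standard deviation, first-order error propagation, and a Gaussian kernel whose width is the propagated DOA error. The function names (phase_to_doa, doa_std, kde_peaks), the parameters d, sigma_phi and min_sep, and the speed of sound are all hypothetical choices made for illustration. The reliability-index selection step is omitted because its definition is not stated in the summary.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed value)


def phase_to_doa(phi, freq, d):
    """Map phase differences to DOAs under an assumed far-field model.

    Model (an assumption, not taken from the paper):
        phi = 2*pi*freq*d*sin(theta)/C
    Returns theta in radians and a mask of cells where |sin(theta)| < 1.
    """
    s = C * phi / (2.0 * np.pi * freq * d)
    valid = np.abs(s) < 1.0
    theta = np.arcsin(np.clip(s, -1.0, 1.0))
    return theta, valid


def doa_std(theta, freq, d, sigma_phi):
    """First-order error propagation: sigma_theta = |d theta / d phi| * sigma_phi."""
    return C * sigma_phi / (2.0 * np.pi * freq * d * np.cos(theta))


def kde_peaks(theta, sigma, n_sources, min_sep=np.deg2rad(10.0), grid=None):
    """Gaussian KDE with one bandwidth per sample, followed by peak picking.

    Peaks are taken greedily from the highest local maximum downwards,
    skipping any candidate closer than min_sep to a peak already picked.
    """
    if grid is None:
        grid = np.linspace(-np.pi / 2, np.pi / 2, 721)  # 0.25 degree steps
    z = (grid[:, None] - theta[None, :]) / sigma[None, :]
    kernels = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma[None, :])
    dens = kernels.mean(axis=1)

    # Interior local maxima of the density on the grid.
    is_peak = (dens[1:-1] > dens[:-2]) & (dens[1:-1] >= dens[2:])
    idx = np.where(is_peak)[0] + 1
    order = idx[np.argsort(dens[idx])[::-1]]

    picked = []
    for i in order:
        if all(abs(grid[i] - grid[j]) >= min_sep for j in picked):
            picked.append(i)
        if len(picked) == n_sources:
            break
    return np.sort(grid[np.array(picked, dtype=int)]), dens
```

With synthetic phase differences generated from the same assumed model (for example three sources, a 4 cm microphone spacing, and frequencies between 500 Hz and 3 kHz so that the phase does not wrap), kde_peaks returns one angle per source. Low-frequency cells and cells near endfire receive wide kernels because the propagated error is large there, so they contribute little to the peak locations.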

