An Improved Speech / Nonspeech Classification Based on Feature Combination for Audio Indexing

Ji-Soo KEUM, Hyon-Soo LEE, Masafumi HAGIWARA

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E93-A, No.4, pp.830-832
Publication Date: 2010/04/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E93.A.830
Print ISSN: 0916-8508
Type of Manuscript: LETTER
Category: Speech and Hearing
Keywords: speech/nonspeech classification, spectral duration analysis, feature combination, audio indexing


In this letter, we propose an improved speech/nonspeech classification method for effectively classifying multimedia sources. To improve performance, we introduce a feature based on spectral duration analysis and combine it with recently proposed features: the high zero-crossing rate ratio (HZCRR), the low short-time energy ratio (LSTER), and the pitch ratio (PR). In experiments on speech, music, and environmental sounds, the proposed method achieved higher classification accuracy than conventional approaches.
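HZCRR and LSTER are statistics taken over the frames of an analysis window (commonly about one second). The abstract does not restate their definitions, so the minimal sketch below assumes the forms usual in the audio-classification literature: HZCRR is the fraction of frames whose zero-crossing rate (ZCR) exceeds 1.5 times the window's mean ZCR, and LSTER is the fraction of frames whose short-time energy (STE) falls below 0.5 times the window's mean STE. Because speech alternates voiced, unvoiced, and silent segments, both ratios tend to be larger for speech than for music. The frame length, hop, thresholds, and helper names (`frame_signal`, `hzcrr`, `lster`, etc.) are hypothetical choices for illustration, not the authors' settings; the spectral duration and pitch ratio features are not reproduced here.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame.

    Zero-valued samples are treated as positive (np.signbit(0.0) is False).
    """
    neg = np.signbit(frames)
    return np.mean(neg[:, 1:] != neg[:, :-1], axis=1)


def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)


def hzcrr(zcr, factor=1.5):
    """Assumed HZCRR: share of frames with ZCR strictly above factor * mean ZCR."""
    return float(np.mean(zcr > factor * np.mean(zcr)))


def lster(ste, factor=0.5):
    """Assumed LSTER: share of frames with STE strictly below factor * mean STE."""
    return float(np.mean(ste < factor * np.mean(ste)))


if __name__ == "__main__":
    fs = 16000                      # hypothetical sample rate (Hz)
    t = np.arange(fs) / fs          # one 1-second analysis window
    tone = np.sin(2 * np.pi * 440 * t)

    # A steady tone: ZCR and energy barely vary, so both ratios stay low.
    frames = frame_signal(tone, frame_len=320, hop=160)  # 20 ms frames, 10 ms hop
    print("tone:", hzcrr(zero_crossing_rate(frames)), lster(short_time_energy(frames)))

    # The same tone with a silent second half: the silent frames raise LSTER.
    gap = tone.copy()
    gap[fs // 2:] = 0.0
    gap_frames = frame_signal(gap, frame_len=320, hop=160)
    print("tone + pause:", lster(short_time_energy(gap_frames)))
```

Strict inequalities are used here for simplicity; a sign-function formulation that gives tied frames half weight would differ only when a frame lands exactly on the threshold.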