Deep Learning Approaches for Pathological Voice Detection Using Heterogeneous Parameters

JiYeoun LEE  Hee-Jin CHOI  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E103-D   No.8   pp.1920-1923
Publication Date: 2020/08/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2020EDL8031
Type of Manuscript: LETTER
Category: Speech and Hearing
Keyword: pathological voice detection, feedforward neural network, convolutional neural network, higher-order statistics, deep learning method

Summary: 
We propose a deep learning-based model for classifying pathological voices using a convolutional neural network (CNN) and a feedforward neural network (FNN). The model uses combinations of heterogeneous parameters, including mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), and higher-order statistics (HOS). We validate the model on the Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database and the Saarbruecken Voice Database (SVD). The model achieved an accuracy of 99.3% on MEEI and 75.18% on SVD. This accuracy is 7.18% higher than that of competing models in previous studies.
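
As a rough illustration of what combining heterogeneous parameters can look like, the sketch below computes two higher-order statistics (skewness and kurtosis) per frame and appends them to a per-frame cepstral vector. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the frame length, the hop size, and the choice of population moments are all hypothetical, and the cepstral vectors are assumed to be computed elsewhere with the same framing.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D sequence into overlapping frames (no padding)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def skewness_kurtosis(frame):
    """Return (skewness, kurtosis) of one frame from population moments."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((x - mean) ** 2 for x in frame) / n
    if m2 == 0.0:
        # Constant frame: the ratios are undefined. Returning zeros is an
        # assumption made for this sketch, not a convention from the paper.
        return 0.0, 0.0
    m3 = sum((x - mean) ** 3 for x in frame) / n
    m4 = sum((x - mean) ** 4 for x in frame) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def combine_features(cepstral_frames, signal, frame_len, hop):
    """Append per-frame (skewness, kurtosis) to each cepstral vector.

    cepstral_frames is assumed to hold one vector per frame, produced by
    any cepstral front end that uses the same frame_len and hop.
    """
    hos = [skewness_kurtosis(f) for f in frame_signal(signal, frame_len, hop)]
    n = min(len(cepstral_frames), len(hos))  # guard against off-by-one framing
    return [list(cepstral_frames[i]) + list(hos[i]) for i in range(n)]


if __name__ == "__main__":
    # Synthetic 100 Hz tone sampled at 8 kHz, standing in for a voice recording.
    sig = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(800)]
    frames = frame_signal(sig, 200, 100)
    # Placeholder cepstral vectors (all zeros), 12 coefficients per frame.
    placeholder_cepstra = [[0.0] * 12 for _ in frames]
    combined = combine_features(placeholder_cepstra, sig, 200, 100)
    print(len(combined), len(combined[0]))
```

Each resulting row could then be passed to a classifier; how the original model arranges these parameters as input to each network is not specified in this summary.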