A Non-Intrusive Speech Intelligibility Estimation Method Based on Deep Learning Using Autoencoder Features

Yoonhee KIM, Deokgyu YUN, Hannah LEE, Seung Ho CHOI

IEICE TRANSACTIONS on Information and Systems, Vol.E103-D, No.3, pp.714-715
Publication Date: 2020/03/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019EDL8150
Type of Manuscript: LETTER
Category: Speech and Hearing
Keywords: autoencoder, bottleneck feature, STOI, deep learning, long short-term memory (LSTM)

This paper presents a deep learning-based non-intrusive speech intelligibility estimation method that uses the bottleneck features of an autoencoder. The conventional standard non-intrusive method, ITU-T P.563, does not estimate intelligibility accurately across various noise environments. We propose a more accurate estimation method based on a long short-term memory (LSTM) neural network whose input is the autoencoder bottleneck features and whose output is a short-time objective intelligibility (STOI) score, where STOI is a standard intrusive intelligibility measure that requires the clean reference speech signal. Experiments on speech signals in various noise environments show that the proposed method outperforms both the conventional standard P.563 and mel-frequency cepstral coefficient (MFCC) feature-based intelligibility estimation.
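The data flow described above (autoencoder bottleneck features in, an LSTM over time, a single STOI estimate out) can be illustrated with a minimal, dependency-free Python sketch. Everything concrete in it is an assumption rather than the authors' configuration: the per-frame input features, the sizes `FEAT_DIM`, `BOTTLENECK_DIM` and `HIDDEN_DIM`, the single tanh encoder layer, the sigmoid output, and the hypothetical names `Encoder`, `LSTMRegressor` and `estimate_stoi`. The weights are random and untrained, so the printed number carries no meaning; the sketch only shows how a score is produced from degraded-speech frames alone, with no reference signal.

```python
import math
import random

# Hypothetical sizes; the abstract does not specify them.
FEAT_DIM = 40        # per-frame input feature size (assumed, e.g. spectral bins)
BOTTLENECK_DIM = 8   # autoencoder bottleneck size (assumed)
HIDDEN_DIM = 16      # LSTM hidden size (assumed)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class Encoder:
    """Hypothetical encoder half of an autoencoder: one tanh layer down to
    the bottleneck. A real one would be trained with a decoder to
    reconstruct its input; these weights are random placeholders."""

    def __init__(self, rng):
        self.w = rand_matrix(BOTTLENECK_DIM, FEAT_DIM, rng)

    def __call__(self, frame):
        return [math.tanh(z) for z in matvec(self.w, frame)]


class LSTMRegressor:
    """Hypothetical single-layer LSTM over bottleneck frames, followed by a
    linear layer and a sigmoid so the estimate lies in (0, 1), the nominal
    STOI range (the output mapping is an assumption)."""

    def __init__(self, rng):
        n_in = BOTTLENECK_DIM + HIDDEN_DIM
        # One weight matrix and bias per gate: input, forget, cell candidate, output.
        self.w = {g: rand_matrix(HIDDEN_DIM, n_in, rng) for g in "ifgo"}
        self.b = {g: [0.0] * HIDDEN_DIM for g in "ifgo"}
        self.w_out = rand_matrix(1, HIDDEN_DIM, rng)[0]

    def __call__(self, seq):
        h = [0.0] * HIDDEN_DIM
        c = [0.0] * HIDDEN_DIM
        for x in seq:
            xh = x + h  # concatenate current input with previous hidden state
            pre = {g: [a + b for a, b in zip(matvec(self.w[g], xh), self.b[g])]
                   for g in "ifgo"}
            i = [sigmoid(z) for z in pre["i"]]
            f = [sigmoid(z) for z in pre["f"]]
            cand = [math.tanh(z) for z in pre["g"]]
            o = [sigmoid(z) for z in pre["o"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Regress one utterance-level score from the final hidden state.
        return sigmoid(sum(w * hk for w, hk in zip(self.w_out, h)))


def estimate_stoi(frames, encoder, regressor):
    """Non-intrusive: only frames of the degraded signal are used."""
    return regressor([encoder(frame) for frame in frames])


rng = random.Random(0)
encoder = Encoder(rng)
regressor = LSTMRegressor(rng)
# Stand-in features for 50 frames of degraded speech (random, for illustration).
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(50)]
score = estimate_stoi(frames, encoder, regressor)
print(f"untrained estimate: {score:.3f}")
```

In a trained system, the encoder would first be fitted together with a decoder to reconstruct its input, and the LSTM would then be trained on degraded speech labeled with STOI scores computed intrusively from the matching clean references; at estimation time only the degraded signal is required.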