Detecting Depression from Speech through an Attentive LSTM Network
Yan ZHAO, Yue XIE, Ruiyu LIANG, Li ZHANG, Li ZHAO, Chengyu LIU
Publication: IEICE TRANSACTIONS on Information and Systems, Vol.E104-D, No.11, pp.2019-2023
Publication Date: 2021/11/01
Publicized: 2021/08/24
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2020EDL8132
Type of Manuscript: LETTER
Category: Speech and Hearing
Keyword: depression detection, LSTM, attention mechanism, vocal expression
Summary:
Depression is a mental disorder that endangers people's health and affects social order. As an efficient aid to diagnosis, automatic depression detection has attracted considerable research interest. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection that exploits the differences between depressive and non-depressive speech across time frames. The proposed model uses frame-level features, which capture the temporal information of depressive speech, in place of traditional statistical features as the input to the LSTM layers. To obtain richer, multi-dimensional deep feature representations, the LSTM output is then passed to attention layers operating on both the time and feature dimensions. The outputs of the attention layers are concatenated, and the fused feature representation is fed into a fully connected layer. Finally, the output of the fully connected layer is passed to a softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy of 90.2% and outperforms the traditional LSTM network and the LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.
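The data flow described in the summary (LSTM output, attention over time and over features, concatenation, fully connected layer, softmax) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary does not give the attention formulas, so the dot-product scoring, the use of the time-averaged LSTM output in the feature branch, the function name attentive_head, the parameter names, and all sizes below are assumptions chosen for illustration. The matrix H stands in for the LSTM output and is filled with random values.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attentive_head(H, w_time, w_feat, W_fc, b_fc):
    """Hypothetical attention head applied to LSTM outputs.

    H      : (T, D) LSTM outputs, one row per time frame
    w_time : (D,)   scoring vector for time attention (assumed form)
    w_feat : (T,)   scoring vector for feature attention (assumed form)
    W_fc   : (C, 2*D) fully connected weights, C = number of classes
    b_fc   : (C,)   fully connected bias
    """
    # Time attention: one weight per frame, then a weighted sum of frames.
    alpha = softmax(H @ w_time)          # (T,)
    c_time = alpha @ H                   # (D,)

    # Feature attention: one weight per feature dimension, applied to
    # the time-averaged LSTM output (an assumption of this sketch).
    beta = softmax(w_feat @ H)           # (D,)
    c_feat = beta * H.mean(axis=0)       # (D,)

    # Concatenate both attention outputs and classify.
    fused = np.concatenate([c_time, c_feat])   # (2*D,)
    probs = softmax(W_fc @ fused + b_fc)       # (C,)
    return probs, alpha, beta


# Illustrative sizes only: 50 frames, 16 hidden units, 2 classes.
rng = np.random.default_rng(0)
T, D, C = 50, 16, 2
H = rng.standard_normal((T, D))          # stand-in for real LSTM output
w_time = rng.standard_normal(D)
w_feat = rng.standard_normal(T)
W_fc = rng.standard_normal((C, 2 * D))
b_fc = np.zeros(C)

probs, alpha, beta = attentive_head(H, w_time, w_feat, W_fc, b_fc)
print(probs.shape, alpha.shape, beta.shape)
```

In a trained model the scoring vectors and the fully connected weights would be learned jointly with the LSTM; here they are random, so the printed shapes are the only meaningful output.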