Effects of Automated Transcripts on Non-Native Speakers' Listening Comprehension

Xun CAO, Naomi YAMASHITA, Toru ISHIDA

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.3   pp.730-739
Publication Date: 2018/03/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017EDP7255
Type of Manuscript: PAPER
Category: Human-computer Interaction
Keyword:
listening comprehension problems, automatic speech recognition (ASR) transcripts, non-native speakers (NNSs), eye gaze

Summary: 
Previous research has shown that transcripts generated by automatic speech recognition (ASR) technologies can improve the listening comprehension of non-native speakers (NNSs). However, we still lack a detailed understanding of how ASR transcripts affect NNSs' listening comprehension. To explore this issue, we conducted two studies. The first study examined how the current presentation of ASR transcripts affected NNSs' listening comprehension. Twenty NNSs engaged in two listening tasks, each under a different condition: C1) audio only and C2) audio+ASR transcripts. The participants pressed a button whenever they encountered a comprehension problem and explained each problem in subsequent interviews. Our data analysis showed that NNSs adopted different strategies when using the ASR transcripts: some followed the transcripts throughout the listening task, while others checked them only when necessary. NNSs also appeared to have difficulty following imperfect and slightly delayed transcripts while listening to speech; many reported difficulty concentrating on listening or reading, or shifting between the two. The second study explored how different display methods of ASR transcripts affected NNSs' listening experiences. We focused on two display methods: 1) an accuracy-oriented display, which shows transcripts only after the analysis of the speech input is complete, and 2) a speed-oriented display, which shows the interim results of the speech input analysis. We conducted a laboratory experiment with 22 NNSs, who engaged in two listening tasks with ASR transcripts presented via the two display methods. We found that the more attention the NNSs paid to listening to the audio, the more they tended to prefer the speed-oriented transcripts, and vice versa. Mismatched transcripts were found to have negative effects on NNSs' listening comprehension. Our findings have implications for improving how ASR transcripts are presented so that they support NNSs more effectively.