Label-Adversarial Jointly Trained Acoustic Word Embedding

Zhaoqi LI
Qingwei ZHAO
Pengyuan ZHANG

IEICE TRANSACTIONS on Information and Systems, Vol.E105-D, No.8, pp.1501-1505
Publication Date: 2022/08/01
Publicized: 2022/05/20
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2022EDL8012
Type of Manuscript: LETTER
Category: Speech and Hearing
Keywords: query-by-example, spoken term detection, acoustic word embeddings, gradient reversal layer

Query-by-example spoken term detection (QbE-STD) is the task of using a spoken query to find matching utterances. Acoustic word embeddings (AWEs), which represent speech segments as fixed-length vectors, have shown high performance and efficiency on this task in recent work. We propose an AWE training method that uses a label-adversarial network to reduce the interfering information learned during AWE training. Experiments demonstrate that our method achieves significant improvements on multilingual and zero-resource test sets.
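
The keywords name a gradient reversal layer, the usual building block of adversarial training of this kind. The abstract does not describe the network itself, so the following is only a generic sketch of such a layer, not the authors' implementation. The class name `GradientReversal`, the scaling factor `lam`, and the toy values are assumptions made for illustration. The layer passes features through unchanged in the forward pass and flips the sign of the gradient in the backward pass, so that the encoder below it is pushed to make the adversarial classifier's task harder.

```python
class GradientReversal:
    """Generic gradient reversal layer (illustrative; not taken from the paper).

    Forward pass: identity.
    Backward pass: multiplies the incoming gradient by -lam.
    """

    def __init__(self, lam=1.0):
        # lam is a hypothetical scaling factor for the reversed gradient.
        self.lam = lam

    def forward(self, x):
        # Features pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The gradient flowing back to the encoder is negated and scaled.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]              # toy embedding values
out = grl.forward(features)              # same values as the input
grad_in = grl.backward([1.0, -2.0, 4.0]) # sign-flipped, scaled by 0.5
print(out, grad_in)
```

In a deep learning framework the same behavior would normally be written as a custom autograd function, with the adversarial classifier placed above the layer and the embedding network below it.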
