Leveraging Neural Caption Translation with Visually Grounded Paraphrase Augmentation

Johanes EFFENDI  Sakriani SAKTI  Katsuhito SUDOH  Satoshi NAKAMURA  

IEICE TRANSACTIONS on Information and Systems   Vol.E103-D   No.3   pp.674-683
Publication Date: 2020/03/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019EDP7065
Type of Manuscript: PAPER
Category: Natural Language Processing
Keywords: visually grounded paraphrase, data augmentation, neural machine translation

Since a concept can be expressed with different vocabularies, styles, and levels of detail, a translation task resembles a many-to-many mapping from a distribution of sentences in the source language to a distribution of sentences in the target language. This viewpoint, however, is not fully realized in current neural machine translation (NMT), which performs one-to-one sentence mapping. In this study, we represent the distribution itself as multiple paraphrase sentences, which enriches the model's understanding of context and encourages it to produce numerous hypotheses. We use visually grounded paraphrases (VGPs), which use images as a constraint on the concept being paraphrased, to ensure that the created paraphrases stay within the intended distribution. In this way, our method can also be regarded as incorporating image information into NMT without using the image itself. We implement this idea by crowdsourcing a paraphrasing corpus that realizes VGP and by constructing neural paraphrasing models that behave as expert models in an NMT system. Our experimental results show that the proposed VGP augmentation strategies improve over a vanilla NMT baseline.
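
The many-to-many view described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each caption already comes with a set of source-side and target-side paraphrases (for example, several captions written for the same image), and it simply crosses the two sets to enlarge a parallel corpus. The function names `augment_pairs` and `augment_corpus` and the example sentences are hypothetical.

```python
from itertools import product


def augment_pairs(src_paraphrases, tgt_paraphrases):
    """Pair every source paraphrase with every target paraphrase of one concept."""
    return list(product(src_paraphrases, tgt_paraphrases))


def augment_corpus(corpus):
    """Expand a corpus of (source paraphrase set, target paraphrase set) entries.

    Each entry is assumed to describe a single image, so all sentences in an
    entry are treated as paraphrases of the same underlying concept.
    """
    augmented = []
    for src_set, tgt_set in corpus:
        augmented.extend(augment_pairs(src_set, tgt_set))
    return augmented


# Hypothetical toy data: English source captions, German target captions.
corpus = [
    (["a man rides a bike", "a person is cycling"], ["ein Mann fährt Fahrrad"]),
    (["two dogs play"], ["zwei Hunde spielen", "zwei Hunde toben"]),
]

pairs = augment_corpus(corpus)
for src, tgt in pairs:
    print(f"{src}\t{tgt}")
```

Under these assumptions, an entry with m source paraphrases and n target paraphrases yields m × n training pairs, so the one-to-one corpus becomes a sample of the many-to-many mapping. How the paraphrases are obtained (crowdsourced or generated by a neural paraphrasing model) is left outside the sketch.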