Modeling Storylines in Lyrics

Kento WATANABE  Yuichiroh MATSUBAYASHI  Kentaro INUI  Satoru FUKAYAMA  Tomoyasu NAKANO  Masataka GOTO  

IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.4   pp.1167-1179
Publication Date: 2018/04/01
Publicized: 2017/12/22
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017EDP7188
Type of Manuscript: PAPER
Category: Natural Language Processing
bayesian model,  generative model,  lyrics structure,  lyrics understanding,  natural language processing,  

Full Text: PDF>>
Buy this Article

This paper addresses the issue of modeling the discourse nature of lyrics and presented the first study aiming at capturing the two common discourse-related notions: storylines and themes. We assume that a storyline is a chain of transitions over topics of segments and a song has at least one entire theme. We then hypothesize that transitions over topics of lyric segments can be captured by a probabilistic topic model which incorporates a distribution over transitions of latent topics and that such a distribution of topic transitions is affected by the theme of lyrics. Aiming to test those hypotheses, this study conducts experiments on the word prediction and segment order prediction tasks exploiting a large-scale corpus of popular music lyrics for both English and Japanese (around 100 thousand songs). The findings we gained from these experiments can be summarized into two respects. First, the models with topic transitions significantly outperformed the model without topic transitions in word prediction. This result indicates that typical storylines included in our lyrics datasets were effectively captured as a probabilistic distribution of transitions over latent topics of segments. Second, the model incorporating a latent theme variable on top of topic transitions outperformed the models without such variables in both word prediction and segment order prediction. From this result, we can conclude that considering the notion of theme does contribute to the modeling of storylines of lyrics.