Entity Network Prediction Using Multitype Topic Models

Hitohiro SHIOZAKI  Koji EGUCHI  Takenao OHKAWA  

IEICE TRANSACTIONS on Information and Systems   Vol.E91-D   No.11   pp.2589-2598
Publication Date: 2008/11/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e91-d.11.2589
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Knowledge, Information and Creativity Support System)
Category: Knowledge Discovery and Data Mining
Keyword: statistical topic models, multitype topic models, link prediction, entity networks

Errata: uploaded on December 1, 2008

Conveying information about who, what, when and where is a primary purpose of some genres of documents, typically news articles. Statistical models that capture dependencies between named entities and topics can play an important role in handling such information. Although such a document can be expected to mention some relationships between who and where, no statistical topic model explicitly addresses the textual interactions between a who-entity and a where-entity. This paper presents a statistical model that directly captures the dependencies between an arbitrary number of word types, such as who-entities, where-entities and topics, mentioned in each document. Through experiments on predicting who-entities and the links between them, we show that this multitype topic model performs better at making predictions on entity networks, in which each vertex represents an entity and each edge weight represents how closely the pair of entities at its incident vertices is related. We also demonstrate the scale-free property in the weighted networks of entities extracted from written mentions.
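
As a rough illustration of the entity network described in the abstract, the following Python sketch builds a weighted, undirected network in which vertices are entities and each edge carries a weight. It rests on assumptions that do not come from the paper: the edge weight here is a plain document co-occurrence count, used only as a stand-in for the model-based relatedness, and the function names `build_entity_network` and `weighted_degree` and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_entity_network(docs):
    """Return {(entity_a, entity_b): weight} from per-document entity lists.

    Assumption (not the paper's method): the weight of an edge is the
    number of documents that mention both entities.
    """
    weights = Counter()
    for entities in docs:
        # Sort the unique entities so each undirected edge has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(network):
    """Sum the incident edge weights (vertex strength) for each entity."""
    strength = Counter()
    for (a, b), w in network.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


# Hypothetical toy documents, each reduced to its who-entity mentions.
docs = [
    ["Alice", "Bob"],
    ["Alice", "Bob", "Carol"],
    ["Carol", "Dave"],
]
network = build_entity_network(docs)
strength = weighted_degree(network)
```

The distribution of the per-vertex sums returned by `weighted_degree` is the kind of quantity one might inspect when examining a weighted network for a scale-free property; a toy input like this one is far too small to show any such behavior.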