Spectral Methods for Thesaurus Construction

Nobuyuki SHIMIZU  Masashi SUGIYAMA  Hiroshi NAKAGAWA  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E93-D   No.6   pp.1378-1385
Publication Date: 2010/06/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E93.D.1378
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Info-Plosion)
Category: Natural Language Processing
Keyword: 
synonym acquisition, synonym extraction, thesaurus, spectral clustering, graph Laplacian

Summary: 
Traditionally, popular synonym acquisition methods are based on the distributional hypothesis, and a metric such as the Jaccard coefficient is used to evaluate the similarity between the contexts of words to obtain synonyms for a query. On the other hand, when one tries to compile and clean a thesaurus, one often already has a modest number of synonym relations at hand. Could something be done with a half-built thesaurus alone? We propose the use of spectral methods and discuss their relation to other network-based algorithms in natural language processing (NLP), such as PageRank and bootstrapping. Since compiling a thesaurus is very laborious, we believe that adding the proposed method to the toolkit of thesaurus constructors would significantly ease this task.
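
To make the idea of applying spectral methods to a half-built thesaurus concrete, the following is a minimal sketch and not the algorithm from the paper, whose details are not given in this summary. It assumes the known synonym relations form an undirected graph, builds the unnormalized graph Laplacian L = D - A, and embeds each word using the eigenvectors with the smallest eigenvalues; words that lie close together in this embedding are candidate synonyms. The word list, the synonym pairs, and the helper names `spectral_embedding` and `nearest` are all hypothetical, chosen only for illustration.

```python
import numpy as np

def spectral_embedding(words, synonym_pairs, k=2):
    """Embed words using the k smallest eigenvectors of the graph Laplacian.

    Assumption: synonym_pairs are undirected, unweighted edges.
    """
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    adjacency = np.zeros((n, n))
    for a, b in synonym_pairs:
        i, j = index[a], index[b]
        adjacency[i, j] = adjacency[j, i] = 1.0
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency          # unnormalized Laplacian L = D - A
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, :k]                   # one k-dimensional row per word

def nearest(words, embedding, query, top=2):
    """Return the `top` words closest to `query` in the embedding."""
    q = words.index(query)
    dist = np.linalg.norm(embedding - embedding[q], axis=1)
    order = [i for i in np.argsort(dist) if i != q]
    return [words[i] for i in order[:top]]

# Toy half-built thesaurus (hypothetical data): "car" and "auto" are not
# linked directly, only through "automobile".
words = ["car", "automobile", "auto", "big", "large", "huge"]
pairs = [("car", "automobile"), ("automobile", "auto"),
         ("big", "large"), ("large", "huge")]

embedding = spectral_embedding(words, pairs, k=2)
print(nearest(words, embedding, "car"))
```

In this toy graph the two groups of words are disconnected, so the two smallest eigenvectors are constant within each group and the nearest neighbours of "car" are "automobile" and "auto", even though "car" and "auto" were never listed as a pair. A real thesaurus graph would be larger and connected, and the choice of Laplacian normalization and of k would need to be tuned.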