Topic Keyword Identification for Text Summarization Using Lexical Clustering

Youngjoong KO  Kono KIM  Jungyun SEO  

IEICE TRANSACTIONS on Information and Systems   Vol.E86-D   No.9   pp.1695-1701
Publication Date: 2003/09/01
Online ISSN: 
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Text Processing for Information Access)
Keywords: text summarization, lexical clustering, k-means algorithm, topic keyword identification

Automatic text summarization aims to reduce the size of a document while preserving its content. Extractive summaries are generally produced by including only the sentences that are most closely related to the topic. DOCUSUM is our summarization system based on a new topic keyword identification method. It works as follows. First, DOCUSUM converts the content words of a document into elements of a context vector space. It then constructs lexical clusters from the context vector space and identifies core clusters. Next, it selects topic keywords from the core clusters. Finally, it generates a summary of the document using the topic keywords. In experiments at several compression settings (30% compression, 10% compression, and extraction of a fixed number of sentences, either 4 or 8), DOCUSUM performed better than other methods.
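
The four steps above can be illustrated with a minimal sketch. It is not the DOCUSUM implementation: the abstract does not specify how context vectors are built, how k-means is initialized, how core clusters are chosen, or how sentences are scored, so every one of those choices below is an assumption made for illustration. Here context vectors are sentence-level co-occurrence counts, core clusters are those with the highest total term frequency, and a sentence is scored by the number of topic keywords it contains. All function names, parameters, and the toy stopword list are hypothetical.

```python
import re
from collections import Counter

# Toy stopword list; an assumed stand-in for real content-word filtering.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "for", "are", "was"}

def tokenize(sentence):
    """Return the lowercase content words of a sentence."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def context_vectors(sentences):
    """Step 1 (assumed form): map each word to sentence-level co-occurrence counts.
    A word also counts as its own context, so the diagonal is its sentence frequency."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        for w in set(s):
            for c in set(s):
                vectors[w][index[c]] += 1.0
    return vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Step 2: plain k-means over the word vectors; returns non-empty clusters of words.
    Centroids start at the first k words in sorted order (an arbitrary, deterministic choice)."""
    words = sorted(vectors)
    k = min(k, len(words))
    centroids = [vectors[w][:] for w in words[:k]]
    assignment = {}
    for _ in range(iterations):
        assignment = {w: min(range(k), key=lambda j: sq_dist(vectors[w], centroids[j]))
                      for w in words}
        for j in range(k):
            members = [vectors[w] for w in words if assignment[w] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    clusters = [[w for w in words if assignment[w] == j] for j in range(k)]
    return [c for c in clusters if c]

def summarize(text, ratio=0.3, k=3, n_core=1, n_keywords=3):
    """Steps 3 and 4: pick topic keywords from core clusters, then extract sentences.
    `ratio` is the fraction of sentences kept (0.3 corresponds to 30% compression)."""
    raw = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    sents = [tokenize(s) for s in raw]
    freq = Counter(w for s in sents for w in s)
    clusters = kmeans(context_vectors(sents), k)
    # Assumed criterion: core clusters have the highest total term frequency.
    core = sorted(clusters, key=lambda c: -sum(freq[w] for w in c))[:n_core]
    keywords = sorted((w for c in core for w in c), key=lambda w: (-freq[w], w))[:n_keywords]
    # Assumed scoring: number of topic-keyword occurrences in the sentence.
    scores = [sum(1 for w in s if w in keywords) for s in sents]
    n_keep = max(1, round(ratio * len(raw)))
    ranked = sorted(range(len(raw)), key=lambda i: (-scores[i], i))[:n_keep]
    return keywords, [raw[i] for i in sorted(ranked)]  # keep original sentence order

if __name__ == "__main__":
    doc = "Cats chase mice. Cats sleep all day. Dogs chase cats. The weather is rainy today."
    topic_keywords, summary = summarize(doc, ratio=0.5)
    print(topic_keywords)
    print(summary)
```

The fixed-length setting mentioned in the experiments (4 or 8 sentences) would correspond to replacing the `ratio`-based `n_keep` with a constant; a real system would also need a proper tokenizer and part-of-speech filtering in place of the toy stopword list.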