Extraction of Semantic Text Portion Related to Anchor Link

Bui Quang HUNG, Masanori OTSUBO, Yoshinori HIJIKATA, Shogo NISHIDA

Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E89-D, No.6, pp.1834-1847
Publication Date: 2006/06/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.6.1834
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Human Communication II)
Category: Language
Keyword: text mining, web mining, semantic text portion, link structure, anchor, user experiment

Summary:
Semantic text portions (STPs) have recently been attracting attention in the field of Web mining. An STP is a text portion in the original page that is semantically related to the anchor pointing to the target page. STPs may include facts and people's opinions about the target page. They can be used in various upper-level applications such as automatic summarization and document categorization. In this paper, we concentrate on extracting STPs. We conduct a survey of STPs to see where they are positioned in original pages and to find the HTML tags that separate STPs from the other text portions in those pages. We then develop a method for extracting STPs based on the results of the survey. The experimental results show that our method achieves high performance.
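
To make the idea of tag-delimited text portions concrete, the following is a minimal illustrative sketch and not the method developed in the paper. It assumes that a candidate STP is simply the text block that contains the anchor, where blocks are delimited by a hypothetical set of HTML tags. The names DIVIDER_TAGS, BlockSplitter, and candidate_portions are invented for this illustration; the paper derives its own dividing tags from a survey, which this summary does not list.

```python
from html.parser import HTMLParser

# Hypothetical divider tags, chosen only for illustration.
# The paper determines its own set of dividing tags from a survey.
DIVIDER_TAGS = {
    "p", "div", "li", "ul", "ol", "table", "tr", "td", "th",
    "br", "hr", "h1", "h2", "h3", "h4", "h5", "h6",
}


class BlockSplitter(HTMLParser):
    """Split a page into text blocks at divider tags and record,
    for each block, the hrefs of the anchors found inside it."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished (text, hrefs) pairs
        self._text = []    # text fragments of the current block
        self._hrefs = []   # anchor targets seen in the current block

    def _flush(self):
        # Join fragments and collapse runs of whitespace.
        text = " ".join("".join(self._text).split())
        if text or self._hrefs:
            self.blocks.append((text, self._hrefs))
        self._text, self._hrefs = [], []

    def handle_starttag(self, tag, attrs):
        if tag in DIVIDER_TAGS:
            self._flush()
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._hrefs.append(href)

    def handle_endtag(self, tag):
        if tag in DIVIDER_TAGS:
            self._flush()

    def handle_data(self, data):
        self._text.append(data)

    def close(self):
        super().close()
        self._flush()


def candidate_portions(html):
    """Map each anchor href to the text of the first block containing it."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    result = {}
    for text, hrefs in parser.blocks:
        for href in hrefs:
            result.setdefault(href, text)
    return result


page = (
    "<p>Intro text.</p>"
    '<p>See <a href="http://example.com/x">this tool</a>, which is fast.</p>'
    "<p>Other.</p>"
)
portions = candidate_portions(page)
```

Here portions maps the anchor's target URL to the surrounding paragraph text. A real extractor would additionally need the positional findings of the survey to decide which neighboring blocks, if any, also belong to the STP.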