TL-Rank: A Blend of Text and Link Information for Measuring Similarity in Scientific Literature Databases

Seok-Ho YOON  Ji-Su KIM  Sang-Wook KIM  Choonhwa LEE  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E95-D   No.10   pp.2556-2559
Publication Date: 2012/10/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E95.D.2556
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Artificial Intelligence, Data Mining
Keyword: similarity score, text-based measure, link-based measure, keyword set expansion



Summary: 
This paper presents a novel measure for computing similarity scores among scientific research papers. The text of a paper in an online scientific literature database is often too incomplete to support comparison with other papers, which is likely to produce inaccurate results. Our solution uses both the text and the link information of the paper in question: the text used for comparison is strengthened by adding the text of related papers. Specifically, the input is reinforced with the papers that cite the paper as well as the papers cited within it, which allows more accurate similarity scores to be computed. The efficacy of the proposed measure is validated through an extensive performance evaluation, which demonstrates a substantial gain.
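
The idea of strengthening a paper's text with the text of its link neighbors can be illustrated with a minimal sketch. This is not the TL-Rank formula itself, which the summary does not give. It assumes that each paper is reduced to a list of keywords, that neighbor keywords are merged in with equal weight, and that the text-based measure is cosine similarity over keyword counts. The names `expand_terms`, `cosine`, `blended_similarity`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def expand_terms(paper_id, terms, cites):
    """Return the paper's keyword counts plus those of its link neighbors.

    Assumption: neighbors are merged with equal weight; the actual
    weighting used by TL-Rank is not stated in the summary.
    """
    # Out-links: papers that paper_id cites.
    neighbors = set(cites.get(paper_id, ()))
    # In-links: papers that cite paper_id.
    neighbors |= {p for p, refs in cites.items() if paper_id in refs}
    neighbors.discard(paper_id)

    expanded = Counter(terms[paper_id])
    for neighbor in sorted(neighbors):
        expanded.update(terms.get(neighbor, ()))
    return expanded


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def blended_similarity(p, q, terms, cites):
    """Text similarity computed on link-expanded keyword sets."""
    return cosine(expand_terms(p, terms, cites), expand_terms(q, terms, cites))


# Toy data (hypothetical): A and B share no keywords but both cite C.
terms = {
    "A": ["graph", "ranking"],
    "B": ["text", "similarity"],
    "C": ["graph", "ranking", "text", "similarity"],
}
cites = {"A": ["C"], "B": ["C"]}

text_only = cosine(Counter(terms["A"]), Counter(terms["B"]))
blended = blended_similarity("A", "B", terms, cites)
```

In this toy case the text-only score for A and B is zero because their keyword lists do not overlap, while the expanded score is positive because both papers inherit keywords from the paper they cite. This is the kind of incompleteness in a paper's own text that the summary describes.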