Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval

Young-In SONG  Kyoung-Soo HAN  So-Young PARK  Sang-Bum KIM  Hae-Chang RIM  

IEICE TRANSACTIONS on Information and Systems   Vol.E90-D   No.11   pp.1873-1876
Publication Date: 2007/11/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e90-d.11.1873
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Contents Technology and Web Information Systems
Keywords: query expansion, biomedical terminology, biomedical document retrieval, biomedical terminology weighting

In this paper, we propose two weighting techniques that improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model ranks a document containing the longer term more highly, because a longer term has more chances to match the query. Such a preference is clearly inappropriate and often yields unsatisfactory results. To alleviate this weighting bias, we devise a method of normalizing the weights of the query terms within a long multi-word biomedical term, and a method of discriminating terms by using inverse terminology frequency, a novel statistic estimated in a query domain. Experimental results on the MEDLINE corpus show that our two simple techniques improve retrieval performance by adjusting the inadequate preference for long multi-word terms in an expanded query.
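
The two ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the function names are hypothetical, length normalization is taken to mean that each word of an n-word term receives weight 1/n, and inverse terminology frequency is modeled by analogy with inverse document frequency, computed over a small list of terms rather than over documents. The method in the letter may differ.

```python
import math
from collections import Counter


def length_normalized_weights(term):
    """Spread a total weight of 1.0 evenly over the words of one term.

    Assumption: an n-word term gives each word weight 1/n, so a long
    synonym contributes no more total weight than a one-word synonym.
    """
    words = term.lower().split()
    weights = Counter()
    for word in words:
        weights[word] += 1.0 / len(words)
    return dict(weights)


def inverse_terminology_frequency(terminology):
    """IDF-style statistic over a list of terms (an assumed formulation).

    A word that occurs in many terms (e.g. "syndrome") gets a low value,
    while a word that occurs in few terms gets a high value.
    """
    n_terms = len(terminology)
    containing = Counter()
    for term in terminology:
        # Count each word at most once per term.
        containing.update(set(term.lower().split()))
    return {word: math.log(n_terms / count) for word, count in containing.items()}


# Hypothetical example: a short term and its multi-word synonym.
short_form = length_normalized_weights("AIDS")
long_form = length_normalized_weights("acquired immunodeficiency syndrome")

# Hypothetical terminology list standing in for a domain term collection.
itf = inverse_terminology_frequency(
    ["acquired immunodeficiency syndrome", "down syndrome", "AIDS"]
)
```

Under these assumptions, both the short form and the long form carry the same total query weight, and a word shared by many terms, such as "syndrome", is treated as less discriminative than a word such as "acquired" that appears in only one term.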