Quality Evaluation for Document Relation Discovery Using Citation Information


IEICE TRANSACTIONS on Information and Systems   Vol.E90-D   No.8   pp.1225-1234
Publication Date: 2007/08/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e90-d.8.1225
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Data Mining
Keywords: document relations, frequent itemset mining, citation matrix, quality evaluation, document relation evaluation

Assessing discovered patterns is an important issue in knowledge discovery. This paper presents an evaluation method that uses citation (reference) information to assess the quality of discovered document relations. Applying the concept of transitivity to citations, so that both direct and indirect citations are considered, a series of evaluation criteria is introduced to define the validity of discovered relations. Two kinds of validity, soft validity and hard validity, are proposed to express the quality of the discovered relations. For impartial comparison, the expected validity is statistically estimated from the generative probability of each relation pattern. The proposed evaluation is investigated using more than 10,000 documents obtained from a research publication database. With frequent itemset mining used to discover document relations, the proposed method is shown to be a powerful way to evaluate the relations in four aspects: soft/hard scoring, direct/indirect citation, relative quality over the expected value, and comparison to human judgment.
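
The abstract does not give the formal definitions of soft and hard validity, so the following Python sketch is only one illustrative reading and not the paper's own formulation. It assumes that citations are treated as undirected links, that an "order" of 1 means direct citation and higher orders allow indirect citation through intermediate documents, that soft validity is the largest fraction of other set members any single document reaches within that order, and that hard validity is 1 only when some document reaches all the others. All function names and the toy data are hypothetical.

```python
from collections import deque


def symmetric_links(citations):
    """Turn a {doc: set_of_cited_docs} map into undirected links (assumption)."""
    links = {}
    for src, targets in citations.items():
        for dst in targets:
            links.setdefault(src, set()).add(dst)
            links.setdefault(dst, set()).add(src)
    return links


def within_order(links, start, order):
    """Documents reachable from `start` in at most `order` citation hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        doc, depth = frontier.popleft()
        if depth == order:
            continue  # do not expand beyond the allowed order
        for nxt in links.get(doc, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


def soft_validity(links, docset, order):
    """Assumed soft score: best fraction of other members reached by one document."""
    docset = set(docset)
    best = max(len(within_order(links, d, order) & docset) for d in docset)
    return best / (len(docset) - 1)


def hard_validity(links, docset, order):
    """Assumed hard score: 1 only if some document reaches every other member."""
    return 1 if soft_validity(links, docset, order) == 1.0 else 0


# Toy data (hypothetical): A cites B, B cites C, D cites nothing.
citations = {"A": {"B"}, "B": {"C"}, "D": set()}
links = symmetric_links(citations)

print(soft_validity(links, {"A", "C"}, 1))       # direct citations only
print(soft_validity(links, {"A", "C"}, 2))       # indirect citation via B allowed
print(soft_validity(links, {"A", "C", "D"}, 2))  # D is unconnected
print(hard_validity(links, {"A", "C", "D"}, 2))
```

Under these assumptions, raising the order from 1 to 2 lets the pair A and C count as related through B, while a set containing the unconnected document D receives only partial soft credit and fails the hard criterion. A discovered relation here would be any document set returned by the mining step, for example a frequent itemset.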