Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework

Yasuhito ASANO  Takao NISHIZEKI  Masashi TOYODA  Masaru KITSUREGAWA  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E89-D   No.10   pp.2606-2615
Publication Date: 2006/10/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.10.2606
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Data Mining
Keyword: Web, data mining, site, max-flow, site-oriented framework

Summary: 
Several methods mine communities on the Web using hyperlinks. One of the best known is the max-flow based method proposed by Flake et al. That method adopts a page-oriented framework: like other methods, including HITS and trawling, it uses a Web page as the unit of information. Recently, Asano et al. built a site-oriented framework, which uses a site as the unit of information, and showed experimentally that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it has not been known whether the site-oriented framework is also effective for mining communities with the max-flow based method. In this paper, we first point out several problems of the max-flow based method, most of which stem from the page-oriented framework, and then propose solutions that exploit several advantages of the site-oriented framework. Computational experiments show that our max-flow based method on the site-oriented framework is very effective at mining communities related to the topics of the given pages, compared with the original max-flow based method on the page-oriented framework.