
For FullText PDF, please login, if you are a member of IEICE,
or go to Pay Per View on menu list, if you are a nonmember of IEICE.

Computational Complexity of Finding Highly Cooccurrent Itemsets in Market Basket Databases
YeonDae KWON Yasunori ISHIHARA Shougo SHIMIZU Minoru ITO
Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences
Vol.E83A
No.12
pp.27232735 Publication Date: 2000/12/25 Online ISSN:
DOI: Print ISSN: 09168508 Type of Manuscript: PAPER Category: General Fundamentals and Boundaries Keyword: data mining, large itemset, highly cooccurrent itemset, computational complexity,
Full Text: PDF(300.4KB)>>
Summary:
Data mining is to analyze all the data in a huge database and to obtain useful information for database users. One of the wellstudied problems in data mining is the search for meaningful association rules in a market basket database which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NPcomplete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called ksparse databases, for which we can efficiently find all the large itemsets. Intuitively, ksparsity of a database means that the supports of itemsets of size k or more are sufficiently low in the database. In this paper, we introduce the notion of (k,c)sparsity, which is strictly weaker than the ksparsity in our previous work. The value of c represents a degree of sparsity. Using (k,c)sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly cooccurrent if the value indicating the correlation among the items exceeds a given threshold. In this paper, we define the highly cooccurrent itemset problem formally as deciding whether there exists a highly cooccurrent itemset with a given size, and show that the problem is NPcomplete under whichever measure. Furthermore, based on the notion of (k,c)sparsity, we propose subclasses of databases for which we can efficiently find all the highly cooccurrent itemsets.

