Probabilistic Frequent Itemset Mining on a GPU Cluster

Yusuke KOZAWA, Toshiyuki AMAGASA, Hiroyuki KITAGAWA

IEICE TRANSACTIONS on Information and Systems   Vol.E97-D   No.4   pp.779-789
Publication Date: 2014/04/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E97.D.779
Type of Manuscript: Special Section PAPER (Special Section on Data Engineering and Information Management)
Keyword: GPU, uncertain databases, probabilistic frequent itemsets


Probabilistic frequent itemset mining, which discovers frequent itemsets from uncertain data, has attracted much attention because uncertainty is inherent in real-world data. Many algorithms have been proposed to tackle this problem, but their performance is not satisfactory because handling uncertainty incurs a high processing cost. To accelerate this computation, we utilize GPUs (Graphics Processing Units). Our previous work accelerated an existing algorithm with a single GPU. In this paper, we extend that work to employ multiple GPUs. The proposed methods minimize the amount of data that needs to be communicated among GPUs and also achieve load balancing. Based on these methods, we also present algorithms for a GPU cluster. Experiments show that the single-node methods realize near-linear speedups, and that the methods on a GPU cluster of eight nodes achieve speedups of up to 7.1 times.