
For FullText PDF, please login, if you are a member of IEICE,
or go to Pay Per View on menu list, if you are a nonmember of IEICE.

VarianceBased kClustering Algorithms by Voronoi Diagrams and Randomization
Mary INABA Naoki KATOH Hiroshi IMAI
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E83D
No.6
pp.11991206 Publication Date: 2000/06/25 Online ISSN:
DOI: Print ISSN: 09168532 Type of Manuscript: PAPER Category: Algorithms Keyword: geometric clustering, Voronoi diagram, randomization,
Full Text: PDF(391.7KB)>>
Summary:
In this paper we consider the kclustering problem for a set S of n points p_{i}=(x_{i}) in the ddimensional space with variancebased errors as clustering criteria, motivated from the color quantization problem of computing a color lookup table for frame buffer display. As the intercluster criterion to minimize, the sum of intracluster errors over every cluster is used, and as the intracluster criterion of a cluster S_{j}, S_{j}^{α1} Σ_{pi Sj} x_{i}  ^{} x (S_{j})^{2} is considered, where  is the L_{2} norm and ^{} x (S_{j}) is the centroid of points in S_{j}, i. e. , (1/S_{j})Σ_{pi Sj} x_{i}. The cases of α=1,2 correspond to the sum of squared errors and the allpairs sum of squared errors, respectively. The kclustering problem under the criterion with α = 1,2 are treated in a unified manner by characterizing the optimum solution to the kclustering problem by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights. With this framework, the problem is related to the generalized primary shatter function for the Voronoi diagrams. The primary shatter function is shown to be O(n^{O(kd)}), which implies that, for fixed k, this clustering problem can be solved in a polynomial time. For the problem with the most typical intracluster criterion of the sum of squared errors, we also present an efficient randomized algorithm which, roughly speaking, finds an εapproximate 2clustering in O(n(1/ε)^{d}) time, which is quite practical and may be used to real largescale problems such as the color quantization problem.

