Variance-Based k-Clustering Algorithms by Voronoi Diagrams and Randomization

Mary INABA  Naoki KATOH  Hiroshi IMAI  

IEICE TRANSACTIONS on Information and Systems   Vol.E83-D   No.6   pp.1199-1206
Publication Date: 2000/06/25
Online ISSN: 
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Algorithms
geometric clustering,  Voronoi diagram,  randomization,  

Full Text: PDF(391.7KB)>>
Buy this Article

In this paper we consider the k-clustering problem for a set S of n points pi=(xi) in the d-dimensional space with variance-based errors as clustering criteria, motivated from the color quantization problem of computing a color lookup table for frame buffer display. As the inter-cluster criterion to minimize, the sum of intra-cluster errors over every cluster is used, and as the intra-cluster criterion of a cluster Sj, |Sj|α-1 Σpi Sj ||xi - - x (Sj)||2 is considered, where |||| is the L2 norm and - x (Sj) is the centroid of points in Sj, i. e. , (1/|Sj|)Σpi Sj xi. The cases of α=1,2 correspond to the sum of squared errors and the all-pairs sum of squared errors, respectively. The k-clustering problem under the criterion with α = 1,2 are treated in a unified manner by characterizing the optimum solution to the k-clustering problem by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights. With this framework, the problem is related to the generalized primary shatter function for the Voronoi diagrams. The primary shatter function is shown to be O(nO(kd)), which implies that, for fixed k, this clustering problem can be solved in a polynomial time. For the problem with the most typical intra-cluster criterion of the sum of squared errors, we also present an efficient randomized algorithm which, roughly speaking, finds an ε-approximate 2-clustering in O(n(1/ε)d) time, which is quite practical and may be used to real large-scale problems such as the color quantization problem.