A Rate-Distortion Theoretic View of Dirichlet Process Means Clustering

Masahiro KOBAYASHI, Kazuho WATANABE

Publication: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (Japanese Edition), Vol. J100-A, No. 12, pp. 475-486
Publication Date: 2017/12/01
Online ISSN: 1881-0195
Type of Manuscript: PAPER
Keywords: clustering, Dirichlet process, rate-distortion curve, lossy compression

Full text (in Japanese): PDF, 1017.4 KB

Summary: 
DP-means clustering was devised as an extension of K-means clustering. Given a penalty parameter, it automatically estimates the number of clusters from the data. However, it has been unknown how the estimated number of clusters varies with the penalty parameter and how to choose an appropriate value for it. This study examines the relationship between DP-means and the rate-distortion curve and shows that the profile of the number of clusters approaches the rate-distortion curve in the high-dimensional limit. Numerical experiments confirm that the penalty parameter behaves like the maximum distortion in the training data.
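To make the role of the penalty parameter concrete, the following is a minimal NumPy sketch of batch DP-means as originally proposed by Kulis and Jordan (2012); it is not code from this paper. It minimizes the sum of squared distances to cluster centers plus λK, where K is the number of clusters: a point whose squared distance to every current center exceeds λ opens a new cluster. The function names `dp_means` and `objective`, the initialization with a single global mean, and the synthetic Gaussian data in the sweep are illustrative assumptions. At convergence every point lies within squared distance λ of its center, which is at least consistent with the summary's observation that λ behaves like the maximum distortion.

```python
import numpy as np


def dp_means(x, lam, max_iter=100):
    """Batch DP-means on data x of shape (n, d) (sketch, assumed interface).

    lam: penalty per cluster; a point farther than lam (squared Euclidean
    distance) from every current center starts a new cluster.
    Returns (centers, labels).
    """
    x = np.asarray(x, dtype=float)
    centers = [x.mean(axis=0)]  # assumption: start from one global cluster
    labels = np.zeros(len(x), dtype=int)
    for _ in range(max_iter):
        old_labels = labels.copy()
        # Assignment step: nearest center, or open a new cluster if too far.
        for i, xi in enumerate(x):
            d2 = np.sum((np.asarray(centers) - xi) ** 2, axis=1)
            j = int(np.argmin(d2))
            if d2[j] > lam:
                centers.append(xi.copy())
                j = len(centers) - 1
            labels[i] = j
        # Update step: recompute means, dropping clusters that became empty,
        # and relabel clusters as 0..K-1.
        used = np.unique(labels)
        centers = [x[labels == k].mean(axis=0) for k in used]
        labels = np.searchsorted(used, labels)
        if np.array_equal(labels, old_labels):
            break  # partition unchanged: converged
    return np.asarray(centers), labels


def objective(x, centers, labels, lam):
    """DP-means cost: total squared distortion plus lam times K."""
    return np.sum((x - centers[labels]) ** 2) + lam * len(centers)


# Illustrative sweep on synthetic standard-normal data (an assumption, not
# the paper's experimental setup): K and the maximum distortion versus lam.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
for lam in [0.5, 1.0, 2.0, 4.0, 8.0]:
    centers, labels = dp_means(x, lam)
    dist = np.sum((x - centers[labels]) ** 2, axis=1)
    print(f"lambda={lam:4.1f}  K={len(centers):3d}  "
          f"max distortion={dist.max():.3f}")
```

Increasing λ makes opening a cluster more expensive, so the sweep typically reports fewer clusters and a larger maximum distortion as λ grows.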