4.5 Article

On the performance of clustering in Hilbert spaces

Journal

IEEE TRANSACTIONS ON INFORMATION THEORY
Volume 54, Issue 2, Pages 781-790

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIT.2007.913516

Keywords

clustering; empirical risk minimization; Hilbert space; k-means; random projections; vector quantization

Funding

  1. ICREA Funding Source: Custom

Ask authors/readers for more resources

Based on n randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is O(root 1/n). Since clustering in high (or even infinite)-dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available