On the performance of clustering in Hilbert spaces

IEEE TRANSACTIONS ON INFORMATION THEORY (2008)

Journal

IEEE TRANSACTIONS ON INFORMATION THEORY

Volume 54, Issue 2, Pages 781-790

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TIT.2007.913516

Keywords

clustering; empirical risk minimization; Hilbert space; k-means; random projections; vector quantization

Funding

ICREA Funding Source: Custom

Abstract

Based on n randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is $O(\sqrt{1/n})$. Since clustering in high (or even infinite)-dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
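
The pipeline described in the abstract (project the data to a low-dimensional space, run k-means there, then measure the squared-error risk in the original space) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' construction: Lloyd's algorithm stands in as a heuristic for the exact empirical squared-error minimizer, a Gaussian matrix is used as one common Johnson-Lindenstrauss-type projection, and all function names (`random_projection`, `lloyd_kmeans`, `empirical_risk`, `cluster_via_projection`) are hypothetical.

```python
import numpy as np


def random_projection(X, target_dim, rng):
    """Project the rows of X to target_dim dimensions with a Gaussian matrix.

    Assumption: entries are N(0, 1/target_dim), so squared norms are
    preserved in expectation (one common JL-type choice).
    """
    d = X.shape[1]
    P = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(d, target_dim))
    return X @ P


def squared_distances(X, centers):
    """Matrix of squared Euclidean distances, shape (n_points, n_centers)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def lloyd_kmeans(X, k, rng, n_iter=50):
    """Lloyd's heuristic: a local search for low empirical squared error.

    Assumption: this is a stand-in; it does not guarantee the global
    empirical minimizer that the abstract refers to.
    """
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = squared_distances(X, centers).argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    labels = squared_distances(X, centers).argmin(axis=1)
    return centers, labels


def empirical_risk(X, centers):
    """Average squared distance from each point to its nearest center."""
    return squared_distances(X, centers).min(axis=1).mean()


def cluster_via_projection(X, k, target_dim, seed=0):
    """Cluster in the projected space, then form centers in the original space."""
    rng = np.random.default_rng(seed)
    Z = random_projection(X, target_dim, rng)
    _, labels = lloyd_kmeans(Z, k, rng)
    centers = np.empty((k, X.shape[1]))
    for j in range(k):
        members = X[labels == j]
        # Fall back to a random data point if a cluster is empty.
        centers[j] = members.mean(axis=0) if len(members) > 0 else X[rng.integers(len(X))]
    return centers, labels
```

Comparing `empirical_risk(X, centers)` for several values of `target_dim` gives a rough, sample-based feel for the accuracy versus computation tradeoff mentioned above: a smaller `target_dim` makes the clustering step cheaper but may yield centers with higher risk in the original space.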
