SimClus: an effective algorithm for clustering with a lower bound on similarity

KNOWLEDGE AND INFORMATION SYSTEMS (2011)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 28, Issue 3, Pages 665-685

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-010-0360-6

Keywords

Dominating set; Overlapping clustering; Set cover; Star clustering

Funding

NIH, National Center for Research Resources [P20 RR016471]

Abstract

Clustering algorithms generally accept a parameter k from the user, which determines the number of clusters sought. However, in many application domains, such as document categorization, social network clustering, and frequent pattern summarization, the proper value of k is difficult to guess. An alternative clustering formulation that does not require k is to impose a lower bound on the similarity between an object and its corresponding cluster representative. Such a formulation chooses exactly one representative for every cluster and minimizes the number of representatives. It has additional benefits. For instance, it supports overlapping clusters in a natural way. Moreover, for every cluster it selects a representative object, which can be used effectively in summarization or semi-supervised classification tasks. In this work, we propose an algorithm, SimClus, for clustering with a lower bound on similarity. It achieves an O(log n) approximation bound on the number of clusters, whereas for the best previous algorithm the bound can be as poor as O(n). Experiments on real and synthetic data sets show that our algorithm produces more than 40% fewer representative objects, yet offers the same or better clustering quality. We also propose a dynamic variant of the algorithm, which can be used effectively in an on-line setting.
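
To make the formulation concrete, the following is a minimal sketch of the textbook greedy set-cover heuristic applied to it. It is not the authors' SimClus algorithm, whose internals the abstract does not describe. It only illustrates how a similarity lower bound replaces k and how overlapping clusters arise. The function name `greedy_representatives`, the parameter names `sim` and `beta`, and the toy similarity matrix are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def greedy_representatives(
    sim: Sequence[Sequence[float]], beta: float
) -> Dict[int, List[int]]:
    """Map each chosen representative to the objects it covers.

    An object j is covered by representative i when sim[i][j] >= beta.
    Hypothetical helper: a plain greedy set-cover heuristic, not SimClus.
    """
    n = len(sim)
    # Each object covers itself and every object at least beta-similar to it.
    covers = [
        {j for j in range(n) if j == i or sim[i][j] >= beta} for i in range(n)
    ]
    uncovered = set(range(n))
    clusters: Dict[int, List[int]] = {}
    while uncovered:
        # Greedy step: pick the object covering the most uncovered objects,
        # breaking ties in favour of the lowest index.
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        clusters[best] = sorted(covers[best])  # clusters are allowed to overlap
        uncovered -= covers[best]
    return clusters


# Toy symmetric similarity matrix over five objects (made-up values).
sim = [
    [1.0, 0.8, 0.2, 0.1, 0.0],
    [0.8, 1.0, 0.7, 0.1, 0.0],
    [0.2, 0.7, 1.0, 0.6, 0.1],
    [0.1, 0.1, 0.6, 1.0, 0.9],
    [0.0, 0.0, 0.1, 0.9, 1.0],
]
clusters = greedy_representatives(sim, beta=0.6)
print(clusters)  # {1: [0, 1, 2], 3: [2, 3, 4]}
```

In this toy run no value of k is supplied: the threshold alone yields two representatives (objects 1 and 3), and object 2 belongs to both clusters because it is at least 0.6-similar to each representative.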
