An empirical comparison and characterisation of nine popular clustering methods

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2022)

Journal

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Volume 16, Issue 1, Pages 201-229

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s11634-021-00478-z

Keywords

Cluster benchmarking; Internal cluster validation; External cluster validation; Mixed effects model

Automated Summary

This study applies nine popular clustering methods to 42 real data sets and characterises the methods in detail through multiple cluster validation indexes. For the 30 data sets that come with a true clustering, it explores how similar the clusterings produced by the methods are to the true clusterings. It also uses a mixed effects regression to relate the observable characteristics of the clusters to their similarity with the true clusterings. The study offers insight into the ability of the methods to discover true clusterings and into the properties of clusterings that can be expected from each method.

Abstract

Nine popular clustering methods are applied to 42 real data sets. The aim is to give a detailed characterisation of the methods by means of several cluster validation indexes that measure various individual aspects of the resulting clusters, such as small within-cluster distances, separation of clusters, and closeness to a Gaussian distribution, as introduced in Hennig (in: Data analysis and applications 1: clustering and regression, modeling-estimating, forecasting and data mining, ISTE Ltd., London, 2019). Thirty of the data sets come with a true clustering. On these data sets, the similarity of the clusterings from the nine methods to the true clusterings is explored. Furthermore, a mixed effects regression relates the observable individual aspects of the clusters to the similarity with the true clusterings, which is unobservable in real clustering problems. The study gives new insight not only into the ability of the methods to discover true clusterings, but also into the properties of clusterings that can be expected from the methods, which is crucial for choosing a method in a real situation without a given true clustering.
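To make the idea of "similarity to the true clustering" concrete, the following is a minimal sketch of one external validation measure, the adjusted Rand index. The abstract does not name the similarity measure used in the study, so the choice of this index is an assumption made for illustration, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects.

    Assumption: this index stands in for the (unnamed) similarity
    measure mentioned in the abstract.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n < 2:
        # No pairs of objects exist, so the partitions trivially agree.
        return 1.0
    # Contingency counts: objects in cluster i of A and cluster j of B.
    cell_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Pair agreement expected when cluster sizes are fixed but
    # assignments are random.
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions consist of one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical toy data: a "true" clustering and two method outputs.
    truth = [0, 0, 0, 1, 1, 1]
    method_outputs = {
        "method_1": [1, 1, 1, 0, 0, 0],  # same partition, relabelled
        "method_2": [0, 0, 1, 1, 2, 2],  # splits the clusters differently
    }
    for name, labels in method_outputs.items():
        print(name, round(adjusted_rand_index(truth, labels), 3))
```

The index depends only on which objects are grouped together, not on the label values, so a method that recovers the true partition under different cluster names still scores 1. In a real problem without a true clustering this quantity cannot be computed, which is why the abstract relates it to observable cluster characteristics instead.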
