An empirical comparison and characterisation of nine popular clustering methods

Journal

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Volume 16, Issue 1, Pages 201-229

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11634-021-00478-z

Keywords

Cluster benchmarking; Internal cluster validation; External cluster validation; Mixed effects model

This study applies nine popular clustering methods to 42 real data sets and characterises the methods in detail through multiple cluster validation indexes. For the 30 data sets that come with a true clustering, it explores how similar the clusterings produced by the methods are to the true clusterings. It also uses a mixed effects regression to relate observable characteristics of the clusters to their similarity with the true clusterings. The study offers insight into the ability of the methods to discover true clusterings and into the properties of the clusterings that can be expected from each method.
Nine popular clustering methods are applied to 42 real data sets. The aim is to give a detailed characterisation of the methods by means of several cluster validation indexes that measure various individual aspects of the resulting clusters, such as small within-cluster distances, separation of clusters, and closeness to a Gaussian distribution, as introduced in Hennig (in: Data analysis and applications 1: clustering and regression, modeling-estimating, forecasting and data mining, ISTE Ltd., London, 2019). Thirty of the data sets come with a true clustering. On these data sets, the similarity of the clusterings from the nine methods to the true clusterings is explored. Furthermore, a mixed effects regression relates the observable individual aspects of the clusters to the similarity with the true clusterings, which is unobservable in real clustering problems. The study gives new insight not only into the ability of the methods to discover true clusterings, but also into the properties of clusterings that can be expected from the methods, which is crucial for choosing a method in a real situation without a given true clustering.
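
To make the idea of "similarity to the true clustering" concrete, the following is a minimal sketch of one common external validation measure, the adjusted Rand index. The abstract does not state which similarity measure the paper uses, so the choice of index here is an assumption, and the function name `adjusted_rand_index` and the toy labels are purely illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points.

    Illustrative helper (not taken from the paper). Returns 1.0 for
    identical partitions and values near 0 for chance-level agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Contingency table cells and its row/column margins.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of point pairs placed together in both, in a, and in b.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_a = sum(comb(c, 2) for c in a_counts.values())
    together_b = sum(comb(c, 2) for c in b_counts.values())

    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (together_both - expected) / (maximum - expected)


# Toy example: the same partition with the cluster names swapped.
true_labels = [0, 0, 1, 1]
found_labels = [1, 1, 0, 0]
print(adjusted_rand_index(true_labels, found_labels))  # prints 1.0
```

Because the index depends only on which points are grouped together, it is unaffected by how the clusters are named, which is what makes it usable for comparing the output of different clustering methods against a given true clustering.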