Journal
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Volume 16, Issue 1, Pages 201-229
Publisher
SPRINGER HEIDELBERG
DOI: 10.1007/s11634-021-00478-z
Keywords
Cluster benchmarking; Internal cluster validation; External cluster validation; Mixed effects model
This study applies nine popular clustering methods to 42 real data sets and characterizes the methods in detail through multiple cluster validation indexes. For the 30 data sets that come with a true clustering, it explores how similar the clusterings generated by the methods are to the true clusterings. It also uses a mixed effects regression to relate the observable characteristics of the clusters to their similarity with the true clusterings. The study offers insight into the ability of the methods to discover true clusterings and into the properties of the clusterings that can be expected from each method.
Nine popular clustering methods are applied to 42 real data sets. The aim is to give a detailed characterisation of the methods by means of several cluster validation indexes that measure various individual aspects of the resulting clusters, such as small within-cluster distances, separation of clusters, and closeness to a Gaussian distribution, as introduced in Hennig (in: Data analysis and applications 1: clustering and regression, modeling-estimating, forecasting and data mining, ISTE Ltd., London, 2019). Thirty of the data sets come with a true clustering. On these data sets, the similarity of the clusterings from the nine methods to the true clusterings is explored. Furthermore, a mixed effects regression relates the observable individual aspects of the clusters to the similarity with the true clusterings, which is unobservable in real clustering problems. The study gives new insight not only into the ability of the methods to discover true clusterings, but also into the properties of clusterings that can be expected from the methods, which is crucial for choosing a method in a real situation where no true clustering is given.
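To make the notion of "similarity to the true clustering" concrete, the sketch below computes the adjusted Rand index (ARI) between two label vectors in plain Python. The abstract does not say which similarity measure the study uses, so the choice of the ARI here is an assumption made for illustration, and the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same objects.

    Illustrative only: the ARI is an assumed choice of similarity measure,
    not necessarily the one used in the study.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two objects: the clusterings agree trivially.
        return 1.0

    # Pairs of objects placed together in both clusterings (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both clusterings put everything in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Made-up toy labels, not data from the study.
true_labels = [0, 0, 0, 1, 1, 1]
found_labels = [1, 1, 1, 0, 0, 0]  # same partition, different label names
score = adjusted_rand_index(true_labels, found_labels)
```

The index depends only on which objects are grouped together, not on the label names, so a clustering that recovers the true partition scores 1 even if its cluster numbers differ.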