Article

Object-based cluster validation with densities

Journal

PATTERN RECOGNITION
Volume 121

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2021.108223

Keywords

Clustering; Clustering validity index; Internal index; Density-based cluster validation; Unsupervised

Clustering validity indices are used to determine the correct number of clusters and to evaluate the quality of the clusters formed by clustering algorithms. OCVD is an internal validity index that captures the separation and compactness of clusters by considering the density of the data objects. It is a single number that averages the density-based contribution of individual data objects, and it performs well in detecting the correct number of clusters, particularly in data sets with clusters of arbitrary shapes.
Clustering validity indices are typically used as tools to find the correct number of clusters in a data set and/or to evaluate the quality of the clusters formed by clustering algorithms. Clustering validity indices measure separation and compactness of clusters. Typically, when applying a clustering algorithm, the input includes the number of clusters. After applying the algorithm with several different numbers of clusters, we determine the number of clusters to be the one with the best validity index. There are two types of clustering validity indices: external indices, which are supervised, and internal indices, which are unsupervised. The focus of this paper is on internal validity indices. Some existing internal validity indices capture the properties of the clusters by using representative statistics such as mean, variance, diameter, etc.; however, these do not perform well when clusters have arbitrary shapes. One approach to overcome this issue is to use the density of the data objects in each cluster. This provides the advantage of capturing the full characteristics of the cluster, which is most beneficial when there are clusters with arbitrary shapes. In the literature, a few density-based clustering validity indices have been proposed. However, some of them show poor performance when the clusters are not perfectly separated. Others perform poorly because they use only representative objects from each cluster instead of all objects. The contribution of this paper is an internal validity index named the object-based clustering validity index with densities (OCVD). OCVD is a single number that averages the density-based contribution of individual data objects to both separation and compactness of clusters. The methodology behind calculating the density-based contributions of the objects is kernel density estimation. We show through several experiments that OCVD performs well in detecting the correct number of clusters in data sets with different cluster shapes, including arbitrary shapes. © 2021 Elsevier Ltd. All rights reserved.
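
The abstract names kernel density estimation as the tool behind each object's density-based contribution, but it does not give the OCVD formula. The sketch below is therefore not OCVD. It only illustrates the underlying idea, assuming a Gaussian kernel with a fixed bandwidth: the density of one object is estimated once from its own cluster and once from another cluster. The function name `gaussian_kde_at`, the toy points, and the bandwidth `h` are all hypothetical choices made for illustration.

```python
import math


def gaussian_kde_at(x, sample, bandwidth):
    """Gaussian kernel density estimate at point x, built from `sample`.

    x is a tuple of d coordinates and sample is a list of such tuples.
    A single isotropic bandwidth is assumed for every dimension.
    """
    d = len(x)
    # Normalising constant of a d-dimensional isotropic Gaussian kernel.
    norm = (2 * math.pi) ** (d / 2) * bandwidth ** d
    total = 0.0
    for s in sample:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / (len(sample) * norm)


# Toy data (hypothetical): two small, well-separated 2-D clusters.
cluster_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]
cluster_b = [(3.0, 3.0), (3.1, 2.8), (2.9, 3.2)]
h = 0.5  # assumed bandwidth, chosen by hand for this example

obj = cluster_a[0]
density_own = gaussian_kde_at(obj, cluster_a, h)    # relates to compactness
density_other = gaussian_kde_at(obj, cluster_b, h)  # relates to separation

print(f"density w.r.t. own cluster:   {density_own:.4f}")
print(f"density w.r.t. other cluster: {density_other:.6f}")
```

For an object that sits well inside its cluster, the own-cluster density is much larger than the density estimated from any other cluster. An object-based index can aggregate such per-object comparisons over all objects, which is what lets it follow clusters of arbitrary shape instead of relying on a mean or a diameter.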
