Article

Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs

Journal

JOURNAL OF CLASSIFICATION
Volume 39, Issue 3, Pages 487-509

Publisher

SPRINGER
DOI: 10.1007/s00357-022-09413-z

Keywords

Clustering comparison; External validity indices; Reference standard partition; Trial partition; Wallace indices; Cluster size imbalance

This study investigates the adjusted Rand index and other partition comparison indices based on counting object pairs, showing that overall indices can be decomposed into agreement measures for individual clusters. It also shows how cluster size imbalance affects the overall indices: they primarily reflect agreement on large clusters and provide less information about smaller ones.
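As a hedged illustration of what "counting object pairs" means, the sketch below tallies the four pair types from two label vectors via their contingency table and computes the Rand index and the (Hubert-Arabie) adjusted Rand index from them. The function names and the letters a, b, c, d are our own labels for illustration, not necessarily the paper's notation.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Return (a, b, c, d) pair counts for two partitions of the same objects.

    a: pairs joined in both partitions
    b: pairs joined in A but not in B
    c: pairs joined in B but not in A
    d: pairs joined in neither partition
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))  # contingency table cells n_ij
    rows = Counter(labels_a)                  # cluster sizes in A
    cols = Counter(labels_b)                  # cluster sizes in B
    a = sum(comb(nij, 2) for nij in cells.values())
    joined_a = sum(comb(ni, 2) for ni in rows.values())
    joined_b = sum(comb(nj, 2) for nj in cols.values())
    b = joined_a - a
    c = joined_b - a
    d = comb(n, 2) - a - b - c
    return a, b, c, d


def rand_index(labels_a, labels_b):
    # Proportion of all pairs on which the two partitions agree (a + d).
    a, b, c, d = pair_counts(labels_a, labels_b)
    return (a + d) / (a + b + c + d)


def adjusted_rand_index(labels_a, labels_b):
    # Chance correction: (a - expected) / (maximum - expected).
    a, b, c, d = pair_counts(labels_a, labels_b)
    total = a + b + c + d
    joined_a, joined_b = a + b, a + c
    expected = joined_a * joined_b / total
    maximum = (joined_a + joined_b) / 2
    if maximum == expected:
        # Only when both partitions are all singletons or both a single
        # cluster, i.e. the partitions are identical.
        return 1.0
    return (a - expected) / (maximum - expected)


if __name__ == "__main__":
    ref = [0, 0, 0, 1, 1, 1]
    trial = [0, 0, 1, 1, 2, 2]
    print(pair_counts(ref, trial))                    # (2, 4, 1, 8)
    print(round(rand_index(ref, trial), 4))           # 0.6667
    print(round(adjusted_rand_index(ref, trial), 4))  # 0.2424
```

In this toy example 8 of the 15 pairs are joined in neither partition, so d supplies 8 of the 10 agreeing pairs behind a Rand index of about 0.67, while the chance-corrected value is only about 0.24.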
In unsupervised machine learning, agreement between partitions is commonly assessed with so-called external validity indices. Researchers tend to use and report indices that quantify agreement between two partitions for all clusters simultaneously. Commonly used examples are the Rand index and the adjusted Rand index. Although these overall measures give a general notion of what is going on, their values are usually hard to interpret. The goal of this study is to provide a thorough understanding of the adjusted Rand index as well as many other partition comparison indices based on counting object pairs. It is shown that many overall indices based on the pair-counting approach can be decomposed into indices that reflect the degree of agreement at the level of individual clusters. The decompositions (1) show that the overall indices can be interpreted as summary statistics of the agreement at the cluster level, (2) specify how these overall indices are related to the indices for individual clusters, and (3) show that the overall indices are affected by cluster size imbalance: if cluster sizes are unbalanced, these overall measures will primarily reflect the degree of agreement between the partitions on the large clusters and will provide much less information on the agreement on smaller clusters. Furthermore, the value of Rand-like indices is determined to a large extent by the number of pairs of objects that are not joined in either of the partitions.
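The decomposition and cluster size imbalance claims can be checked on one of the simplest pair-counting indices. Assuming a Wallace index defined as a / (a + b), the proportion of pairs joined in the reference standard partition that the trial partition also joins, the overall value is exactly a weighted average of cluster-level indices, with weights proportional to C(n_i, 2), the number of within-cluster pairs of reference cluster i. The sketch below is our own illustration of that identity, not the paper's notation or its full set of decompositions; `wallace_decomposition` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from math import comb


def wallace_decomposition(reference, trial):
    """Decompose the Wallace index a / (a + b) into per-cluster terms.

    For each reference cluster i, the cluster-level index is the share of its
    within-cluster pairs that the trial partition also joins. The overall
    index equals the average of these, weighted by C(n_i, 2).
    """
    cells = Counter(zip(reference, trial))
    sizes = Counter(reference)
    joined_both = defaultdict(int)  # pairs inside cluster i also joined by trial
    for (i, _j), nij in cells.items():
        joined_both[i] += comb(nij, 2)
    weights = {i: comb(ni, 2) for i, ni in sizes.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("reference partition has no within-cluster pairs")
    # Singleton clusters have no pairs, so their cluster-level index is undefined.
    per_cluster = {i: (joined_both[i] / w if w else None) for i, w in weights.items()}
    overall = sum(joined_both.values()) / total_weight
    shares = {i: w / total_weight for i, w in weights.items()}
    return overall, per_cluster, shares


if __name__ == "__main__":
    reference = ["big"] * 8 + ["small"] * 3
    trial = [0] * 8 + [1, 2, 3]  # keeps the big cluster, shatters the small one
    overall, per_cluster, shares = wallace_decomposition(reference, trial)
    print(round(overall, 4))                              # 0.9032
    print(per_cluster)                                    # {'big': 1.0, 'small': 0.0}
    print({i: round(s, 3) for i, s in shares.items()})    # {'big': 0.903, 'small': 0.097}
```

Although the trial partition fails completely on the small cluster, the overall index is about 0.90, because the big cluster carries 28 of the 31 within-cluster pairs. Since the weights grow roughly with the square of cluster size, the overall value mostly reports agreement on the large clusters.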
