Journal
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 28, Issue 8, Pages 2173-2186Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2016.2551240
Keywords
Clustering; external validity index; cluster validation; comparing clusterings; normalization; correction for chance; adjustment for chance
Categories
Funding
- MOPIS
- Nokia Foundation
Ask authors/readers for more resources
Comparing two clustering results of a data set is a challenging task in cluster analysis. Many external validity measures have been proposed in the literature. A good measure should be invariant to the changes of data size, cluster size, and number of clusters. We give an overview of existing set matching indexes and analyze their properties. Set matching measures are based on matching clusters from two clusterings. We analyze the measures in three parts: 1) cluster similarity, 2) matching, and 3) overall measurement. Correction for chance is also investigated and we prove that normalized mutual information and variation of information are intrinsically corrected. We propose a new scheme of experiments based on synthetic data for evaluation of an external validity index. Accordingly, popular external indexes are evaluated and compared when applied to clusterings of different data size, cluster size, and number of clusters. The experiments show that set matching measures are clearly better than the other tested. Based on the analytical comparisons, we introduce a new index called Pair Sets Index (PSI).
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available