Article

Cross-Study Replicability in Cluster Analysis

Journal

STATISTICAL SCIENCE
Volume 38, Issue 2, Pages 303-316

Publisher

INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/22-STS871

Keywords

Clustering; replicability; multiple studies

In cancer research, clustering techniques are widely used for exploratory analyses, playing a critical role in the identification of novel cancer subtypes and patient management. Our paper reviews methods for replicability of clustering analyses and proposes a novel framework for evaluating cross-study clustering replicability. The approach can be applied to any clustering algorithm and can quantify replicability using different measures of similarity between partitions.
In cancer research, clustering techniques are widely used for exploratory analyses, playing a critical role in the identification of novel cancer subtypes and patient management. As data collected by multiple research groups grows, it is increasingly feasible to investigate the replicability of clustering procedures, that is, their ability to consistently recover biologically meaningful clusters across several data sets. In this paper, we review methods for replicability of clustering analyses, and discuss a novel framework for evaluating cross-study clustering replicability, useful when two or more studies are available. Our approach can be applied to any clustering algorithm and can employ different measures of similarity between partitions to quantify replicability, globally (i.e., for the whole sample) as well as locally (i.e., for individual clusters). Using experiments on synthetic and real gene expression data, we illustrate the usefulness of our procedure to evaluate if the same clusters are identified consistently across a collection of data sets.
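
To make the idea of comparing partitions globally and locally concrete, here is a minimal Python sketch. It is not the procedure proposed in the paper, whose details are not given in this abstract. It assumes two label vectors for the same samples are already available (for example, one from clustering a study directly and one from applying a clustering rule learned on another study), and it uses two common similarity measures chosen for illustration: the adjusted Rand index as a global score and a best-match Jaccard overlap as a per-cluster score. The function names `adjusted_rand_index` and `best_match_jaccard` are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Global agreement between two partitions of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same samples")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples to compare partitions")

    # Count sample pairs that are co-clustered in both partitions, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / comb(n, 2)  # agreement expected by chance
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (both - expected) / (maximum - expected)


def best_match_jaccard(labels_a, labels_b):
    """Local agreement: for each cluster of A, its best Jaccard overlap with a cluster of B."""
    members_a, members_b = {}, {}
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        members_a.setdefault(a, set()).add(i)
        members_b.setdefault(b, set()).add(i)
    return {
        a: max(len(sa & sb) / len(sa | sb) for sb in members_b.values())
        for a, sa in members_a.items()
    }


# Illustrative labels only (not data from the paper).
direct = [0, 0, 0, 1, 1, 1]       # assumed: clusters found in the study itself
transferred = [0, 0, 1, 1, 1, 1]  # assumed: clusters imposed by a rule from another study

global_score = adjusted_rand_index(direct, transferred)
local_scores = best_match_jaccard(direct, transferred)
print(round(global_score, 3), local_scores)
```

A global score near 1 would suggest the two partitions largely agree, while the per-cluster scores can show which individual clusters are recovered well and which are not. Any other partition similarity measure could be substituted for the two used here.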
