Article

A k-mean-directions Algorithm for Fast Clustering of Data on the Sphere

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2010)

Journal

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS

Volume 19, Issue 2, Pages 377-396

Publisher

TAYLOR & FRANCIS INC

DOI: 10.1198/jcgs.2009.08155

Keywords

Directional; Information retrieval; Langevin/von-Mises distribution; MC toolkit; Microarrays; spkmeans

Category

Statistics & Probability

Funding

National Science Foundation CAREER [DMS-0437555]
National Institutes of Health [DC-0006740]

Abstract

A k-means-type algorithm is proposed for efficiently clustering data constrained to lie on the surface of a p-dimensional unit sphere, or data standardized to zero mean and unit variance, such as arise when time-series gene expression data are clustered with a correlation metric (for standardized data, this is equivalent to clustering with Euclidean distance). We also provide methodology to initialize the algorithm and to estimate the number of clusters in the dataset. Results from a detailed series of experiments show excellent performance, even with very large datasets. The methodology is applied to the budding yeast mitotic cell division cycle dataset of Cho et al. [Molecular Cell (1998), 2, 65-73]. The entire dataset had not previously been analyzed, so our analysis provides an understanding of how the complete set of genes acts in concert and differentially. We also apply our methodology to the submitted abstracts of oral presentations at the 2008 Joint Statistical Meetings (JSM) to identify similar topics. The identified groups are both interpretable and distinct, and the methodology provides a possible automated tool for efficient parallel scheduling of presentations at professional meetings. The supplemental materials described in the article are available online.
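The abstract does not spell out the algorithm's update rules. As a hedged illustration only, the sketch below implements the standard Lloyd-style spherical k-means (the "spkmeans" procedure named among the keywords). It is not the authors' k-mean-directions algorithm, and it does not reproduce their initialization method, their estimate of the number of clusters, or any of their speed-ups. All function names (`normalize`, `standardize_to_sphere`, `spherical_kmeans`) and the random default initialization are hypothetical. The helper `standardize_to_sphere` shows why correlation-based clustering fits this setting: after centering and scaling to unit norm, the dot product of two profiles is their Pearson correlation r, and their squared Euclidean distance is 2(1 - r).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit Euclidean norm (project it onto the unit sphere)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return [u / norm for u in v]


def standardize_to_sphere(x: Sequence[float]) -> Vector:
    """Center x to mean zero, then scale it to unit norm.

    For two profiles prepared this way, their dot product equals their Pearson
    correlation r, so their squared Euclidean distance is 2 * (1 - r).
    """
    mean = sum(x) / len(x)
    return normalize([u - mean for u in x])


def spherical_kmeans(
    data: List[Vector],
    k: int,
    init: Optional[List[Vector]] = None,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Lloyd-style spherical k-means; every row of data must be unit-norm.

    Each point joins the cluster whose mean direction has the largest dot
    product with it (for unit vectors, the smallest Euclidean distance), and
    each mean direction is recomputed as the normalized sum of its members.
    """
    if init is None:
        # Hypothetical default: k distinct data points chosen at random.
        init = random.Random(seed).sample(data, k)
    centers = [list(c) for c in init]
    labels: List[int] = [-1] * len(data)
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(x, centers[j])) for x in data]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            summed = [sum(col) for col in zip(*members)]
            # Keep the old direction if the cluster emptied or its sum vanished.
            if members and dot(summed, summed) > 0.0:
                centers[j] = normalize(summed)
    return centers, labels


if __name__ == "__main__":
    raw = [[1, 0.1, 0], [1, -0.1, 0.05], [0.9, 0, 0.1],
           [0.1, 1, 0], [0, 1, 0.1], [-0.05, 0.9, 0]]
    points = [normalize(v) for v in raw]
    centers, labels = spherical_kmeans(points, 2, init=[points[0], points[3]])
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because every point and every mean direction has unit norm, maximizing the dot product and minimizing Euclidean distance pick the same cluster. That is why a k-means-type procedure carries over to the sphere once the centroid step renormalizes. This pure-Python sketch favors clarity and makes no attempt at the efficiency on very large datasets that the abstract reports.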
