Article

A k-means based co-clustering (kCC) algorithm for sparse, high dimensional data

EXPERT SYSTEMS WITH APPLICATIONS (2019)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 118, Pages 20-34

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2018.09.006

Keywords

Clustering; K-means; Centroid initialization; Co-clustering; Semantic similarity

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science

Abstract

The k-means algorithm is a widely used method that starts with an initial partitioning of the data and then iteratively converges towards a local solution by reducing the Sum of Squared Errors (SSE). It is known to suffer from the cluster center initialization problem, and the iterative step simply (re-)labels the data points based on the initial partition. Most improvements to k-means proposed in the literature focus on the initialization step alone and make no attempt to guide the iterative convergence by exploiting statistical information from the data. Using higher-order statistics (such as paths from random walks in a graph) and the duality in the data (as in co-clustering) are, for instance, known ways to improve clustering results. What is unique and significant in our proposed approach is that we embed these concepts into the k-means algorithm, rather than just using them as an external distance measure, and present a unified framework called the k-means based co-clustering (kCC) algorithm. The initialization step is modified to use multiple points to represent each cluster center, such that points within a cluster are close together but far from points representing other clusters. Moreover, neighborhood walk statistics are proposed as a semantic similarity technique for both cluster assignment and center re-estimation in the iterative process. The effectiveness of the combined approach is evaluated on several standard data sets. Our results show that kCC performs better than the baseline k-means and other state-of-the-art improvements. © 2018 Elsevier Ltd. All rights reserved.
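For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of standard k-means (Lloyd's iteration) in plain Python. It is not the kCC algorithm: it uses random initialization and Euclidean distance, and it includes neither the multi-point centers nor the neighborhood walk similarity described above. The function names `sq_dist`, `sse`, and `kmeans`, and the toy data, are illustrative assumptions and do not come from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centers, labels):
    """Sum of Squared Errors: squared distance of each point to its assigned center."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))


def kmeans(points, k, seed=0, max_iter=100):
    """Baseline k-means with random initialization (illustrative sketch, not kCC)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # no label changed, so a local solution is reached
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        history.append(sse(points, centers, labels))
    return centers, labels, history


if __name__ == "__main__":
    # Hypothetical toy data: two small groups of 2-D points.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    # The result depends on the initial centers, so a common workaround
    # is to try several seeds and keep the run with the lowest final SSE.
    best = min((kmeans(data, 2, seed=s) for s in range(5)), key=lambda r: r[2][-1])
    print("final SSE:", best[2][-1], "labels:", best[1])
```

The SSE recorded in `history` never increases from one iteration to the next, which is why the procedure settles on a local solution determined by where it started. This is the sensitivity to initialization that the abstract identifies as the motivation for kCC.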
