☆ 4.6 Article

A parallel text document clustering algorithm based on neighbors

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2015)

Journal

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS

Volume 18, Issue 2, Pages 933-948

Publisher

SPRINGER

DOI: 10.1007/s10586-015-0450-z

Keywords

Document clustering; Text mining; k-Means; Parallel algorithm; Cluster computing; Performance analysis

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

In this paper, we propose a new parallel algorithm for text document clustering based on the concept of neighbor (Guha et al. in Inf Syst 25(5): 345-366, 2000). If two documents are similar enough, they are considered as neighbors of each other. The new algorithm is named parallel k-means based on neighbors (PKBN), and it is a parallel version of sequential k-means based on neighbors (SKBN) that we proposed in Luo et al. (Data Knowl Eng 68(11): 1271-1288, 2009). PKBN fully exploits the data-parallelism of SKBN and adopts a new parallel pair-generating method to build the neighbor matrix. Our new parallel pair-generating method causes less communication overhead between processors than existing methods. PKBN is designed for message-passing multiprocessor systems and is implemented on a cluster of Linux workstations to analyze its performance. Our experimental results on real-life data sets demonstrate that PKBN is very efficient and has good scalability with respect to the number of processors and the size of data set.

A parallel text document clustering algorithm based on neighbors

Journal

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS

Publisher

SPRINGER

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

A parallel text document clustering algorithm based on neighbors

Journal

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS

Publisher

SPRINGER

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper