Journal
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
Volume 18, Issue 2, Pages 933-948
Publisher
SPRINGER
DOI: 10.1007/s10586-015-0450-z
Keywords
Document clustering; Text mining; k-Means; Parallel algorithm; Cluster computing; Performance analysis
In this paper, we propose a new parallel algorithm for text document clustering based on the concept of neighbors (Guha et al. in Inf Syst 25(5): 345-366, 2000). If two documents are similar enough, they are considered neighbors of each other. The new algorithm, parallel k-means based on neighbors (PKBN), is a parallel version of the sequential k-means based on neighbors (SKBN) that we proposed in Luo et al. (Data Knowl Eng 68(11): 1271-1288, 2009). PKBN fully exploits the data parallelism of SKBN and adopts a new parallel pair-generating method to build the neighbor matrix. This pair-generating method incurs less communication overhead between processors than existing methods. PKBN is designed for message-passing multiprocessor systems, and we implemented it on a cluster of Linux workstations to analyze its performance. Our experimental results on real-life data sets demonstrate that PKBN is efficient and scales well with respect to both the number of processors and the size of the data set.
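As a rough illustration of the neighbor concept mentioned in the abstract, the sequential sketch below builds a binary neighbor matrix from document vectors. It is not the PKBN pair-generating method, which the abstract does not detail. The use of cosine similarity, the threshold name `theta`, the helper names `cosine` and `neighbor_matrix`, and the convention that a document is its own neighbor are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat an all-zero vector as similar to nothing.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def neighbor_matrix(docs, theta):
    """Return an n x n 0/1 matrix; entry (i, j) is 1 when documents i and j
    have similarity >= theta (hypothetical threshold parameter)."""
    n = len(docs)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1  # assumption: each document counts as its own neighbor
        # Visit each unordered pair (i, j) once and fill both symmetric entries.
        for j in range(i + 1, n):
            if cosine(docs[i], docs[j]) >= theta:
                m[i][j] = m[j][i] = 1
    return m

# Toy term-weight vectors: the first two documents share vocabulary,
# the third does not.
docs = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]
M = neighbor_matrix(docs, theta=0.8)
```

The inner loop enumerates n(n-1)/2 document pairs, which is presumably the work that a parallel pair-generating scheme would divide among processors.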