☆ 4.6 Article

A parallel feature selection method study for text classification

NEURAL COMPUTING & APPLICATIONS (2017)

Journal

NEURAL COMPUTING & APPLICATIONS

Volume 28, Issue -, Pages S513-S524

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s00521-016-2351-3

Keywords

Text classification; Feature selection; MapReduce; Mutual information

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

Text classification is a popular research topic in data mining. Many classification methods have been proposed. Feature selection is an important technique for text classification since it is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. In recent years, data have become increasingly larger in both the number of instances and the number of features in many applications. As a result, classical feature selection methods do not work well in processing large-scale dataset due to the expensive computational cost. To address this issue, in this paper, a parallel feature selection method based on MapReduce is proposed. Specifically, mutual information based on Renyi entropy is used to measure the relationship between feature variables and class variables. Maximum mutual information theory is then employed to choose the most informative combination of feature variables. We implemented the selection process based on MapReduce, which is efficient and scalable for large-scale problems. At last, a practical example well demonstrates the efficiency of the proposed method.Y

A parallel feature selection method study for text classification

Journal

NEURAL COMPUTING & APPLICATIONS

Publisher

SPRINGER LONDON LTD

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

A parallel feature selection method study for text classification

Journal

NEURAL COMPUTING & APPLICATIONS

Publisher

SPRINGER LONDON LTD

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper