A two-stage feature selection method for text categorization

COMPUTERS & MATHEMATICS WITH APPLICATIONS (2011)

Journal

COMPUTERS & MATHEMATICS WITH APPLICATIONS

Volume 62, Issue 7, Pages 2793-2800

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.camwa.2011.07.045

Keywords

Feature selection; Text categorization; Latent semantic indexing; Support vector machine

Abstract

Feature selection for text categorization is a well-studied problem whose goal is to improve the effectiveness of categorization, the efficiency of computation, or both. Text categorization systems based on traditional term matching represent each document as a vector in the vector space model; however, this representation requires a high-dimensional space and does not take into account the semantic relationships between terms, which leads to poor categorization accuracy. The latent semantic indexing (LSI) method can overcome this problem by replacing individual terms with statistically derived conceptual indices. To improve both the accuracy and the efficiency of categorization, in this paper we propose a two-stage feature selection method. First, we apply a novel feature selection method to reduce the dimension of the term space; then we use LSI to construct a new semantic space between terms. Applications to spam database categorization show that our two-stage feature selection method performs better. © 2011 Elsevier Ltd. All rights reserved.
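The two-stage idea (first prune the term space, then map the surviving terms into a latent semantic space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify its "novel feature selection method", so stage 1 here substitutes a standard chi-square term score as a plainly labeled stand-in. Stage 2 is LSI via a truncated singular value decomposition of the reduced document-term matrix using raw counts (no tf-idf weighting). The function names `chi2_term_scores`, `select_terms`, `fit_lsi`, and `to_concepts`, the vocabulary, and the toy spam/ham counts are all hypothetical and for illustration only.

```python
import numpy as np


def chi2_term_scores(X, y):
    """Stage 1 stand-in (not the paper's method): chi-square association
    between each term and the positive class, on term presence/absence."""
    present = (X > 0).astype(float)          # docs x terms, 1 if the term occurs
    pos = np.asarray(y) == 1
    n = float(len(pos))
    A = present[pos].sum(axis=0)             # positive docs containing the term
    B = present[~pos].sum(axis=0)            # negative docs containing the term
    C = pos.sum() - A                        # positive docs lacking the term
    D = (~pos).sum() - B                     # negative docs lacking the term
    num = n * (A * D - B * C) ** 2
    den = (A + C) * (B + D) * (A + B) * (C + D)
    # A term present in every document (or in none) gets a score of 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def select_terms(X, y, m):
    """Return the column indices of the m highest-scoring terms."""
    scores = chi2_term_scores(X, y)
    return np.argsort(scores)[::-1][:m]


def fit_lsi(X_sel, k):
    """Stage 2: LSI via SVD of the reduced docs x terms matrix.
    Returns a k x m basis made of the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(X_sel, full_matrices=False)
    return Vt[:k]


def to_concepts(X_sel, basis):
    """Project documents (rows) onto the k latent concept axes."""
    return X_sel @ basis.T


# Hypothetical toy corpus: rows are documents, columns are term counts.
vocab = ["free", "win", "money", "offer", "meeting", "report", "project", "the"]
X = np.array([
    [2, 1, 1, 0, 0, 0, 0, 3],   # spam
    [1, 0, 2, 1, 0, 0, 0, 2],   # spam
    [0, 1, 1, 1, 0, 0, 0, 4],   # spam
    [0, 0, 0, 0, 2, 1, 1, 3],   # ham
    [0, 0, 0, 0, 1, 2, 0, 2],   # ham
    [0, 0, 0, 0, 0, 1, 2, 5],   # ham
])
y = [1, 1, 1, 0, 0, 0]

idx = select_terms(X, y, m=6)            # stage 1: 8 terms -> 6 terms
X_sel = X[:, idx].astype(float)
basis = fit_lsi(X_sel, k=2)              # stage 2: 6 terms -> 2 concepts
Z = to_concepts(X_sel, basis)
print([vocab[i] for i in idx])
print(Z.shape)                           # (6, 2)

# A new document is folded in with the same term indices and basis.
q = np.array([[1, 1, 0, 0, 0, 0, 0, 1]], dtype=float)[:, idx]
print(to_concepts(q, basis))
```

In this toy corpus the term "the" occurs in every document, so its chi-square score is 0 and stage 1 discards it before the SVD is computed. The resulting low-dimensional vectors `Z` would then be passed to a classifier. The paper's keywords mention a support vector machine, which is not reproduced here.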