4.7 Article

A semantic term weighting scheme for text categorization

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 38, Issue 10, Pages 12708-12716

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2011.04.058

Keywords

Text categorization; Semantic term weighting; WordNet; TF-IDF

Funding

  1. MOE-MS Key Laboratory of Multimedia Computing and Communication of USTC [07122807]
  2. Natural Science Foundation of China [61073110, 60775037]
  3. National Natural Science Foundation of China [60933013]
  4. Research Fund for the Doctoral Program of Higher Education of China [20093402110017]

Traditional term weighting schemes for text categorization, such as TF-IDF, exploit only the statistical information of terms in documents. In this paper, we instead propose a novel term weighting scheme that exploits the semantics of categories and indexing terms. Specifically, the semantics of a category are represented by the senses of the terms appearing in its label, together with WordNet's interpretation of those senses. The weight of a term is then correlated with its semantic similarity to a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF when the amount of training data is small or when the content of documents is focused on well-defined categories. In addition, the proposed approach compares favorably with two previous studies. © 2011 Elsevier Ltd. All rights reserved.
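
The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' method. It contrasts a plain TF-IDF weight with a hypothetical weight that scales term frequency by a term-to-category similarity score. The names `tf_idf`, `semantic_weight`, and `TOY_SIMILARITY` are assumptions made for illustration, and the similarity values are invented placeholders rather than scores computed from WordNet.

```python
import math
from collections import Counter

# Hypothetical similarity scores in [0, 1] between a term and a category label.
# In the paper these would come from WordNet senses; here they are made up.
TOY_SIMILARITY = {
    ("wheat", "grain"): 0.9,
    ("harvest", "grain"): 0.6,
    ("bank", "grain"): 0.1,
}


def tf_idf(term, doc, corpus):
    """Statistical baseline: raw term frequency times log inverse document frequency."""
    tf = Counter(doc)[term]
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0


def semantic_weight(term, doc, category, similarity=TOY_SIMILARITY, default=0.0):
    """Assumed semantic variant: term frequency scaled by term-category similarity."""
    tf = Counter(doc)[term]
    return tf * similarity.get((term, category), default)


if __name__ == "__main__":
    corpus = [
        ["wheat", "harvest", "wheat"],
        ["bank", "loan"],
        ["wheat", "bank"],
    ]
    doc = corpus[0]
    for term in ("wheat", "harvest"):
        print(term, tf_idf(term, doc, corpus), semantic_weight(term, doc, "grain"))
```

Unlike the TF-IDF baseline, the semantic weight in this sketch needs no corpus-wide statistics. That is one way to read the abstract's claim that the approach helps when training data is scarce.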
