4.6 Article

Using unsupervised clustering approach to train the Support Vector Machine for text classification

Journal

NEUROCOMPUTING
Volume 211, Pages 4-10

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2015.10.137

Keywords

Unsupervised learning; Classification; Support Vector Machines

Learning algorithms for text classification assume the availability of a large number of documents that human experts have organized and labeled correctly for use in the training phase. Unless the text documents in question have been in existence for some time, using an expert system is inevitable because manually organizing and labeling thousands of groups of text documents can be a very labor-intensive and intellectually challenging activity. Also, in some new domains, the knowledge needed to organize and label the different classes might not be available. Unsupervised learning schemes that automatically cluster the data in the training phase are therefore needed. Furthermore, even when such knowledge exists, variation is high when the subject under classification depends on personal opinion and is open to different interpretations. This paper describes a methodology that performs the automatic clustering using Self-Organizing Maps (SOM) or, alternatively, the Correlation Coefficient (CorrCoef). The resulting clusters are then used as the labels to train a Support Vector Machine (SVM). The methodology is applied to some standard text datasets to verify the accuracy of the proposed scheme. Results are also presented that evaluate the effect of dimensionality reduction and of changes in the clustering scheme on the accuracy of the SVM. The results show that the proposed combination achieves better accuracy than training the learning machine using the expert knowledge. (C) 2016 Elsevier B.V. All rights reserved.
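
The cluster-then-train idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the CorrCoef clustering is performed, so the sketch assumes each document is assigned to whichever of two seed documents it has the highest Pearson correlation with. The SVM is a bare-bones linear one, trained by subgradient descent on the regularized hinge loss with no bias term. The function names, the toy vocabulary, the seed choice, and the hyperparameters (lam, lr, epochs) are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


def cluster_by_correlation(vectors, seeds):
    """Assign each vector to the index of the seed it correlates with most."""
    return [
        max(range(len(seeds)), key=lambda k: pearson(v, seeds[k]))
        for v in vectors
    ]


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (no bias term).

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            margin = y_i * sum(w_j * x_j for w_j, x_j in zip(w, x_i))
            # Regularization shrinks w on every step.
            w = [(1 - lr * lam) * w_j for w_j in w]
            # The hinge loss only contributes when the margin is violated.
            if margin < 1:
                w = [w_j + lr * y_i * x_j for w_j, x_j in zip(w, x_i)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) >= 0 else -1


# Toy term-count vectors over a hypothetical vocabulary [ball, goal, vote, law].
docs = [
    [3, 2, 0, 0], [2, 3, 0, 0], [4, 1, 0, 0],
    [0, 0, 3, 2], [0, 0, 2, 3], [0, 0, 1, 4],
]

# Step 1: cluster without labels, using two documents as seeds (an assumption).
clusters = cluster_by_correlation(docs, seeds=[docs[0], docs[3]])

# Step 2: treat cluster ids as training labels (cluster 0 -> +1, cluster 1 -> -1).
labels = [1 if c == 0 else -1 for c in clusters]
w = train_linear_svm(docs, labels)

# Step 3: classify unseen documents with the trained SVM.
print(clusters)  # [0, 0, 0, 1, 1, 1]
print(predict(w, [5, 1, 0, 0]), predict(w, [0, 0, 2, 5]))  # 1 -1
```

The sketch covers only the two-cluster case. A multi-class setting like the one the abstract implies would need one such classifier per cluster (one-vs-rest) or a library SVM, and a SOM could replace the correlation step without changing the rest of the pipeline.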
