Article

Hybrid supervised clustering based ensemble scheme for text classification

KYBERNETES (2017)

Journal

KYBERNETES

Volume 46, Issue 2, Pages 330-348

Publisher

EMERALD GROUP PUBLISHING LTD

DOI: 10.1108/K-10-2016-0300

Keywords

Diversity; Text classification; Classifier ensemble; Supervised clustering

Category

Computer Science, Cybernetics


Abstract

Purpose - The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification is an essential task for many information retrieval purposes, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied as a way to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classifiers of an ensemble, which is a key issue in ensemble design.

Design/methodology/approach - An ensemble scheme based on hybrid supervised clustering is presented for text classification. In this scheme, supervised hybrid clustering, based on the cuckoo search algorithm and k-means, partitions the data samples of each class into clusters so that more diverse training subsets can be obtained. Each classifier is trained on the diversified training subsets, and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared with conventional classification algorithms (such as Naive Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) on 11 text benchmarks.

Findings - The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value - The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.
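The abstract does not spell out how the per-class clusters become training subsets, so the following is only a minimal sketch under stated assumptions, not the authors' method. Plain Lloyd's k-means replaces the cuckoo search/k-means hybrid; ensemble member j is trained on cluster j of every class (a hypothetical pairing rule); and a toy nearest-centroid learner stands in for the base classifiers the paper evaluates. Features are assumed to be dense numeric vectors (for example, TF-IDF rows). Predictions are combined by majority vote, as the abstract describes. All function and class names are hypothetical.

```python
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (stand-in for the cuckoo search/k-means hybrid)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


class NearestCentroid:
    """Toy base learner standing in for Naive Bayes, SVM, C4.5, etc."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        return min(
            self.centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])),
        )


def build_subsets(X, y, k, seed=0):
    """Cluster each class separately; member j gets cluster j of every class (hypothetical rule)."""
    subsets = [([], []) for _ in range(k)]
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        for x, c in zip(rows, kmeans(rows, k, seed=seed)):
            subsets[c][0].append(x)
            subsets[c][1].append(label)
    return subsets


def train_ensemble(X, y, k=3, seed=0):
    # Skip any subset left empty by the clustering step.
    return [NearestCentroid().fit(Xs, ys) for Xs, ys in build_subsets(X, y, k, seed) if Xs]


def predict(ensemble, x):
    """Combine the members' predictions by majority vote."""
    votes = Counter(m.predict_one(x) for m in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
    y = ["a"] * 4 + ["b"] * 4
    ensemble = train_ensemble(X, y, k=2)
    print(predict(ensemble, [0.5, 0.5]), predict(ensemble, [5.5, 5.5]))
```

Clustering each class separately means each member sees a different region of every class rather than a random resample; this is one plausible reading of how supervised clustering yields the "training subsets with higher diversities" the abstract mentions.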


Uncertainty-driven ensemble classification exploiting unlabeled data

Samia Boukir

Summary: This study investigates the use of margin and diversity, two key concepts in ensemble learning, to develop a versatile uncertainty-driven ensemble classifier under the scarcity of labeled data. New semi-supervised definitions are proposed for both margin and diversity, and new robust ensemble metrics are introduced to strengthen the semi-supervised classification scheme. The relevance of these new criteria is examined in change detection experiments, and the underlying fusion rule significantly improves the change detection performance.
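The summary relies on the notion of ensemble margin without defining it. For reference, here is a small sketch of two classic vote-based margins from the ensemble literature. These are not the new semi-supervised definitions the paper proposes, whose exact form the summary does not give. The supervised margin needs the true label; the unsupervised variant compares the two most-voted classes and can therefore be computed on unlabeled samples. Function names are illustrative.

```python
from collections import Counter


def supervised_margin(votes, true_label):
    """(votes for the true class - max votes for any other class) / number of voters."""
    counts = Counter(votes)
    others = max((n for c, n in counts.items() if c != true_label), default=0)
    return (counts[true_label] - others) / len(votes)


def unsupervised_margin(votes):
    """(votes for the most-voted class - votes for the runner-up) / number of voters; no label needed."""
    top = Counter(votes).most_common(2)
    second = top[1][1] if len(top) > 1 else 0
    return (top[0][1] - second) / len(votes)


if __name__ == "__main__":
    votes = ["a", "a", "b", "a", "c"]  # predictions of five ensemble members
    print(supervised_margin(votes, "a"), unsupervised_margin(votes))
```

In both cases, values near 0 flag samples on which the ensemble is uncertain, and a negative supervised margin means the majority vote disagrees with the true label.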

KNOWLEDGE-BASED SYSTEMS (2023)