4.7 Article

Ordering-based pruning for improving the performance of ensembles of classifiers in the framework of imbalanced datasets

Journal

INFORMATION SCIENCES
Volume 354, Pages 178-196

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2016.02.056

Keywords

Imbalanced datasets; Tree-based ensembles; Ordering-based pruning; Bagging; Boosting

Funding

  1. Spanish Ministry of Science and Technology [TIN2011-28488, TIN2013-40765-P, TIN2014-57251-P]
  2. Andalusian Research Plans [P11-TIC-7765, P10-TIC-6858]
  3. University of Jaen [UJA2014/06/15]
  4. Caja Rural Provincial de Jaen [UJA2014/06/15]

Abstract

Classification with imbalanced datasets has gained notable significance in recent years, because many real-world problems exhibit highly skewed class distributions that degrade the overall performance of the learning system. A great number of approaches have been developed to address this problem, traditionally from three perspectives: data-level treatment, adaptation of algorithms, and cost-sensitive learning. Ensemble-based classification models extend these solutions: they combine a pool of classifiers and can, in turn, integrate any of the former proposals. Several studies in the specialized literature have shown the quality and performance of this type of methodology over baseline solutions. The goal of this work is to improve the capabilities of tree-based ensemble solutions specifically designed for imbalanced classification, focusing on the bagging- and boosting-based ensembles that behave best in this scenario. To this end, this paper proposes several new metrics for ordering-based pruning that are properly adapted to the skewed class distribution. Our experimental study shows two main results: on the one hand, the new metrics make pruning a very successful approach in this scenario; on the other hand, the Under-Bagging model excels, achieving the highest gain from pruning, since the random undersampled sets that best complement each other can be selected. Accordingly, this scheme is capable of outperforming previous state-of-the-art ensemble models. (C) 2016 Elsevier Inc. All rights reserved.
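
The core idea described in the abstract, ordering-based pruning of an Under-Bagging pool with an imbalance-aware metric, can be illustrated with a short sketch. This is a minimal, hypothetical reconstruction, not the authors' implementation: it assumes scikit-learn decision trees, uses the geometric mean of per-class recalls (gmean below) as a stand-in for the specific ordering metrics proposed in the paper, evaluates candidates on a held-out validation split, and the helper names (undersample, gmean) are illustrative only.

    # Sketch: Under-Bagging pool + greedy ordering-based pruning (assumptions noted above).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import recall_score

    rng = np.random.default_rng(0)

    # Imbalanced toy problem (roughly 9:1 majority/minority ratio).
    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

    def undersample(X, y):
        """Random undersampling of the majority class to obtain a balanced subset."""
        minority = np.flatnonzero(y == 1)
        majority = np.flatnonzero(y == 0)
        keep = rng.choice(majority, size=minority.size, replace=False)
        idx = np.concatenate([minority, keep])
        return X[idx], y[idx]

    # 1) Build the Under-Bagging pool: each tree is trained on a different balanced sample.
    pool = []
    for _ in range(40):
        Xs, ys = undersample(X_tr, y_tr)
        pool.append(DecisionTreeClassifier().fit(Xs, ys))

    def gmean(y_true, votes):
        """Geometric mean of per-class recalls for the majority-vote prediction."""
        y_pred = (votes >= 0.5).astype(int)
        r1 = recall_score(y_true, y_pred, pos_label=1)
        r0 = recall_score(y_true, y_pred, pos_label=0)
        return np.sqrt(r0 * r1)

    # 2) Ordering-based pruning: greedily append the classifier whose inclusion
    #    maximises the ensemble-level metric, then keep the best-scoring prefix.
    preds = np.array([clf.predict(X_val) for clf in pool])   # shape: (pool_size, n_val)
    selected, remaining = [], list(range(len(pool)))
    best_subset, best_score = [], -np.inf
    while remaining:
        scores = [gmean(y_val, preds[selected + [j]].mean(axis=0)) for j in remaining]
        j = remaining.pop(int(np.argmax(scores)))
        selected.append(j)
        if max(scores) > best_score:
            best_score, best_subset = max(scores), list(selected)

    pruned_ensemble = [pool[i] for i in best_subset]
    print(f"kept {len(pruned_ensemble)}/{len(pool)} trees, validation G-mean = {best_score:.3f}")

The greedy ordering re-ranks the pool so that the first k classifiers, evaluated jointly, maximise the chosen metric; the pruned ensemble keeps the best-scoring prefix, which is one way the undersampled sets that best complement each other can be selected, as the abstract describes.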
