4.5 Article

Redundancy-driven modified Tomek-link based undersampling: A solution to class imbalance

Journal

PATTERN RECOGNITION LETTERS
Volume 93, Issue -, Pages 3-12

Publisher

ELSEVIER
DOI: 10.1016/j.patrec.2016.10.006

Keywords

Data mining; Class imbalance; Tomek-link; Redundancy; Outlier; Contribution factor

Class imbalance is a problem spanning the data mining, machine learning and pattern recognition domains: learning from a data space with an unequal class distribution. Common classifiers trained on imbalanced data tend to be biased towards the class holding the bulk of the instances, causing misclassification of new patterns/instances. The study reveals that the presence of redundant borderline instances and outliers in the data space severely aggravates the effect of class imbalance. The Condensed Nearest Neighbor and Tomek-link undersampling techniques are used as the baseline systems for the present study, and an improved undersampling algorithm is proposed for the preprocessing stage that adds outlier and redundancy detection to the baseline system. The proposed scheme detects outlier, redundant and noisy instances that contribute least to estimating accurate class labels. Thus, a data-level solution to the problem is offered, whose novelty lies in the effective elimination of majority instances without losing valuable information. The proposed scheme is implemented and validated with Back Propagation Neural Network (BPNN), K-Nearest-Neighbor (K-NN), Support Vector Machine (SVM) and Naive Bayes classifiers on 10 real-life datasets. The experimental results show the superiority of the proposed scheme over the baseline schemes. (C) 2016 Elsevier B.V. All rights reserved.
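
For orientation, a Tomek link is a pair of instances from different classes that are each other's nearest neighbour, and the classic Tomek-link baseline removes the majority-class member of every such pair. The sketch below illustrates only that baseline, assuming Euclidean distance; it is not the redundancy-driven modification proposed in the paper, whose details the abstract does not give. The function name `tomek_link_undersample` and the toy data are hypothetical, chosen for illustration.

```python
import numpy as np


def tomek_link_undersample(X, y, majority_label):
    """Drop the majority-class member of every Tomek link (classic baseline)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise squared Euclidean distances; mask the diagonal so that
    # no point is selected as its own nearest neighbour.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)

    nn = dist.argmin(axis=1)      # nearest-neighbour index of each point
    idx = np.arange(len(X))

    # A Tomek link is a mutual nearest-neighbour pair with different labels.
    is_link = (nn[nn] == idx) & (y != y[nn])

    # Only the majority-class member of each link is removed.
    drop = is_link & (y == majority_label)
    return X[~drop], y[~drop], idx[drop]


# Toy data (assumed): five majority points (label 0), two minority points (label 1).
X = np.array([[0.0, 0.0], [1.1, 0.0], [2.3, 0.0], [3.6, 0.0],
              [5.0, 0.0], [5.4, 0.0], [9.0, 0.0]])
y = np.array([0, 0, 0, 0, 0, 1, 1])

X_res, y_res, removed = tomek_link_undersample(X, y, majority_label=0)
print("removed indices:", removed.tolist())
```

In this toy set the only cross-class mutual nearest-neighbour pair is the majority point at x = 5.0 and the minority point at x = 5.4, so only the former is dropped. The dense distance matrix keeps the sketch short but uses memory quadratic in the number of instances; a tree-based neighbour search would be the usual choice for larger datasets.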
