Article

An approach for classification of highly imbalanced data using weighting and undersampling

Journal

AMINO ACIDS
Volume 39, Issue 5, Pages 1385-1391

Publisher

SPRINGER WIEN
DOI: 10.1007/s00726-010-0595-2

Keywords

Imbalanced datasets; SVM; Undersampling technique

Funding

  1. Agency for Science, Technology, and Research, Singapore (A*Star) [052 101 0020]


Real-world datasets are often imbalanced. Several approaches, such as weighting, sub-sampling, and data modeling, exist for handling such data. Learning in the presence of class imbalance presents a great challenge to machine learning. Techniques such as support-vector machines perform very well on balanced data but may fail when applied to imbalanced datasets. In this paper, we propose a new undersampling technique for selecting instances from the majority class. The performance of this approach was evaluated on several real imbalanced biological datasets, in which the ratio of negative to positive samples varies from approximately 9:1 to approximately 100:1. Useful classifiers have both high sensitivity and high specificity. Our results demonstrate that the proposed selection technique improves sensitivity compared with a weighted support-vector machine and with results available in the literature for the same datasets.
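The abstract does not describe the proposed selection technique itself, so the following is only a minimal sketch of the two baseline ideas it names, weighting and undersampling, together with the sensitivity and specificity measures used to judge a classifier. Plain random undersampling stands in for the paper's method, and inverse-frequency weights stand in for whatever weighting the authors used. All function names and the toy 9:1 label set are assumptions made for illustration.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count).

    Assumption: a common heuristic, not necessarily the paper's weighting.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_undersample(samples, labels, majority=0, ratio=1.0, seed=0):
    """Keep every minority sample and a random subset of the majority class.

    `ratio` is the number of majority samples kept per minority sample.
    Assumption: random selection is a stand-in for the paper's technique.
    """
    rng = random.Random(seed)
    maj = [i for i, y in enumerate(labels) if y == majority]
    mino = [i for i, y in enumerate(labels) if y != majority]
    keep = min(len(maj), int(round(ratio * len(mino))))
    chosen = sorted(mino + rng.sample(maj, keep))
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Toy data with a 9:1 negative-to-positive ratio (hypothetical).
labels = [0] * 90 + [1] * 10
samples = list(range(len(labels)))

weights = balanced_class_weights(labels)        # positive class weight is 5.0
xs, ys = random_undersample(samples, labels)    # 10 negatives + 10 positives

# A classifier that always predicts "negative" is 90% accurate here,
# yet it detects no positives at all.
sens, spec = sensitivity_specificity(labels, [0] * len(labels))
```

The last lines show why accuracy alone is a poor measure on imbalanced data and why the abstract asks for both high sensitivity and high specificity: the always-negative predictor reaches a specificity of 1.0 with a sensitivity of 0.0.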
