Article

BRACID: a comprehensive approach to learning rules from imbalanced data

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS (2012)

Journal

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS

Volume 39, Issue 2, Pages 335-373

Publisher

SPRINGER

DOI: 10.1007/s10844-011-0193-0

Keywords

Rule induction; Imbalanced data; Classifiers; Nearest neighbour paradigm; Nearest rules

Funding

Ministry of Science and Higher Education [N N519 441939]

Abstract

In this paper we consider the induction of rule-based classifiers from imbalanced data, where one class (the minority class) is under-represented in comparison to the remaining majority classes. The minority class is usually of primary interest. However, most rule-based classifiers are biased towards the majority classes and have difficulty correctly recognizing the minority class. We discuss the sources of these difficulties, which relate either to data characteristics or to the algorithm itself. Among the problems related to the data distribution, we focus on the role of small disjuncts, class overlapping and the presence of noisy examples. We then show that standard techniques for inducing rule-based classifiers, such as sequential covering, top-down induction of rules and classification strategies, were designed under the assumption of a balanced data distribution, and we explain why they are biased towards the majority classes. Some modifications of rule-based classifiers have already been introduced, but they usually address individual problems. Therefore, we propose a novel algorithm, BRACID, which addresses the issues associated with imbalanced data more comprehensively. Its main characteristics include a hybrid representation of rules and single examples, bottom-up learning of rules, and a local classification strategy using nearest rules. The usefulness of BRACID has been evaluated in experiments on several imbalanced datasets. The results show that BRACID significantly outperforms the well-known rule-based classifiers C4.5rules, RIPPER, PART, CN2 and MODLEM, as well as related classifiers such as RISE and K-NN. Moreover, it is comparable to or better than the studied approaches specialized for imbalanced data, such as generalizations of rule algorithms or combinations of SMOTE + ENN preprocessing with PART. Finally, it improves the support of minority class rules, leading to better recognition of minority class examples.
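The combination of a hybrid rule/example representation, bottom-up generalization and nearest-rule classification can be illustrated with a minimal sketch. It assumes numeric attributes only and a RISE-style distance (zero on attributes whose interval covers the example, otherwise the range-normalized gap to the nearest bound). The names `Rule`, `rule_distance`, `generalize` and `classify` are hypothetical, and this is not BRACID's actual procedure: the paper's own distance, generalization and classification criteria may differ.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


# Hypothetical rule type: a conjunction of [low, high] interval conditions
# on numeric attributes, plus a class label. A single training example is
# stored as a maximally specific rule (low == high on every attribute).
@dataclass
class Rule:
    lows: list[float]
    highs: list[float]
    label: str

    @staticmethod
    def from_example(x: list[float], label: str) -> Rule:
        return Rule(list(x), list(x), label)


def rule_distance(rule: Rule, x: list[float], ranges: list[float]) -> float:
    """RISE-style distance (assumed): 0 on attributes whose interval covers x,
    otherwise the gap to the nearest bound divided by the attribute's range."""
    total = 0.0
    for v, lo, hi, r in zip(x, rule.lows, rule.highs, ranges):
        if v < lo:
            gap = lo - v
        elif v > hi:
            gap = v - hi
        else:
            gap = 0.0
        if r > 0:
            total += (gap / r) ** 2
    return math.sqrt(total)


def generalize(rule: Rule, x: list[float]) -> Rule:
    """Bottom-up step: widen each interval just enough to cover x."""
    return Rule([min(lo, v) for lo, v in zip(rule.lows, x)],
                [max(hi, v) for hi, v in zip(rule.highs, x)],
                rule.label)


def classify(rules: list[Rule], x: list[float], ranges: list[float], k: int = 1) -> str:
    """Majority label among the k nearest rules. Because sorting is stable and
    dicts keep insertion order, ties go to the label whose closest rule is nearer."""
    nearest = sorted(rules, key=lambda r: rule_distance(r, x, ranges))[:k]
    votes: dict[str, int] = {}
    for r in nearest:
        votes[r.label] = votes.get(r.label, 0) + 1
    return max(votes, key=votes.get)


# Usage: grow a minority-class example into a rule, then classify two points.
ranges = [10.0, 10.0]
minority = generalize(Rule.from_example([1.0, 1.0], "min"), [2.0, 3.0])
rules = [minority, Rule([5.0, 5.0], [9.0, 9.0], "maj")]
print(minority)                              # Rule(lows=[1.0, 1.0], highs=[2.0, 3.0], label='min')
print(classify(rules, [1.5, 2.0], ranges))   # min  (covered by the minority rule)
print(classify(rules, [6.0, 4.0], ranges))   # maj  (closer to the majority rule)
```

Treating a stored example as a rule whose lower and upper bounds coincide lets one distance function serve both generalized rules and single examples, so an uncovered example can still be classified by whichever rule or example lies nearest.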

k-Nearest Neighbour Classifiers - A Tutorial

Padraig Cunningham, Sarah Jane Delany

Summary: The article provides an overview of Nearest Neighbour classification techniques, focusing on similarity assessment mechanisms, computational issues in identifying nearest neighbours, and methods for reducing the dimension of the data. New sections on similarity measures for time-series, retrieval speedup, and intrinsic dimensionality have been added, along with an Appendix containing Python code for key methods.

ACM COMPUTING SURVEYS (2021)