Article

BRACID: a comprehensive approach to learning rules from imbalanced data

Journal

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
Volume 39, Issue 2, Pages 335-373

Publisher

SPRINGER
DOI: 10.1007/s10844-011-0193-0

Keywords

Rule induction; Imbalanced data; Classifiers; Nearest neighbour paradigm; Nearest rules

Funding

  1. Ministry of Science and Higher Education [N N519 441939]

In this paper we consider induction of rule-based classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining majority classes. The minority class is usually of primary interest. However, most rule-based classifiers are biased towards the majority classes and have difficulty recognizing the minority class correctly. We discuss the sources of these difficulties, which relate either to data characteristics or to the algorithm itself. Among the problems related to the data distribution, we focus on the role of small disjuncts, overlapping of classes and the presence of noisy examples. Then, we show that standard techniques for induction of rule-based classifiers, such as sequential covering, top-down induction of rules or classification strategies, were created with the assumption of a balanced data distribution, and we explain why they are biased towards the majority classes. Some modifications of rule-based classifiers have already been introduced, but they usually concentrate on individual problems. Therefore, we propose a novel algorithm, BRACID, which addresses the issues associated with imbalanced data more comprehensively. Its main characteristics include a hybrid representation of rules and single examples, bottom-up learning of rules and a local classification strategy using nearest rules. The usefulness of BRACID has been evaluated in experiments on several imbalanced datasets. The results show that BRACID significantly outperforms the well-known rule-based classifiers C4.5rules, RIPPER, PART, CN2 and MODLEM, as well as other related classifiers such as RISE or K-NN. Moreover, it is comparable to or better than the studied approaches specialized for imbalanced data, such as generalizations of rule algorithms or combinations of SMOTE + ENN preprocessing with PART. Finally, it improves the support of minority class rules, leading to better recognition of the minority class examples.
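The three characteristics named in the abstract (hybrid rules and examples, bottom-up learning, nearest-rule classification) can be illustrated with a minimal sketch. This is not the BRACID algorithm as published: the `Rule` class, the unnormalized Euclidean distance, the single-nearest-rule decision and the toy data are all assumptions made for illustration, and the abstract does not specify how BRACID measures distance, selects examples to generalize towards, or resolves conflicts between rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Example = Dict[str, float]


@dataclass
class Rule:
    """Interval conditions per numeric attribute plus a class label.

    A single training example is the degenerate rule whose intervals
    have equal lower and upper bounds (the hybrid representation).
    """
    bounds: Dict[str, Tuple[float, float]]
    label: str

    @classmethod
    def from_example(cls, x: Example, label: str) -> "Rule":
        return cls({a: (v, v) for a, v in x.items()}, label)

    def distance(self, x: Example) -> float:
        """Euclidean distance from x to the nearest point the rule covers.

        Assumption: attributes are on comparable scales (no normalization).
        """
        total = 0.0
        for a, (lo, hi) in self.bounds.items():
            v = x[a]
            gap = lo - v if v < lo else v - hi if v > hi else 0.0
            total += gap * gap
        return total ** 0.5

    def generalize(self, x: Example) -> "Rule":
        """Bottom-up step: minimally widen intervals so the rule covers x."""
        widened = {
            a: (min(lo, x[a]), max(hi, x[a]))
            for a, (lo, hi) in self.bounds.items()
        }
        return Rule(widened, self.label)


def classify(rules: List[Rule], x: Example) -> str:
    """Predict the label of the rule nearest to x (ties: first in list)."""
    return min(rules, key=lambda r: r.distance(x)).label


# Hypothetical toy data: one generalized minority rule, one majority example.
seed = Rule.from_example({"x1": 1.0, "x2": 1.0}, "minority")
minority_rule = seed.generalize({"x1": 2.0, "x2": 3.0})
majority_point = Rule.from_example({"x1": 6.0, "x2": 1.0}, "majority")
rules = [minority_rule, majority_point]
```

In this sketch an example that no rule covers is still assigned to the closest rule, so a widened minority rule can claim nearby examples that a purely covering-based strategy would hand to a majority default.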

Recommended

Article Computer Science, Artificial Intelligence

Multi-class and feature selection extensions of Roughly Balanced Bagging for imbalanced data

Mateusz Lango, Jerzy Stefanowski

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS (2018)

Article Computer Science, Information Systems

Visual-based analysis of classification measures and their properties for class imbalanced problems

Dariusz Brzezinski, Jerzy Stefanowski, Robert Susmaga, Izabela Szczech

INFORMATION SCIENCES (2018)

Article Computer Science, Artificial Intelligence

The impact of data difficulty factors on classification of imbalanced and concept drifting data streams

Dariusz Brzezinski, Leandro L. Minku, Tomasz Pewinski, Jerzy Stefanowski, Artur Szumaczuk

Summary: Class imbalance poses additional challenges when learning classifiers from concept drifting data streams. Existing work primarily focuses on addressing global imbalance ratio, while neglecting other data complexities. Independent research on static imbalanced data has emphasized the influential role of local data difficulty factors. Investigating the interactions between concept drifts and local data difficulty factors in concept drifting data streams is crucial, as revealed by our comprehensive study.

KNOWLEDGE AND INFORMATION SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

What makes multi-class imbalanced problems difficult? An experimental study

Mateusz Lango, Jerzy Stefanowski

Summary: This study experimentally investigates how various difficulty factors in multi-class imbalanced data affect the performance of classifiers. The results reveal that class overlapping and class size configurations are important sources of difficulty.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Quality Versus Speed in Energy Demand Prediction Experience Report from an R&D project

Witold Andrzejewski, Jedrzej Potoniec, Maciej Drozdowski, Jerzy Stefanowski, Robert Wrembel, Pawel Stapf

Summary: Effective heat energy demand prediction is crucial in combined heat and power systems. Existing algorithms do not adequately consider computational costs and ease of implementation in industrial systems. This paper proposes two types of algorithms for heat demand prediction and evaluates them experimentally in terms of prediction quality and computational cost.

DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2022, PT I (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Prototypical Convolutional Neural Network for a Phrase-Based Explanation of Sentiment Classification

Kamil Plucinski, Mateusz Lango, Jerzy Stefanowski

Summary: This paper introduces a new prototype-based convolutional neural architecture for text classification, which offers more faithful explanations of its predictions than traditional attention mechanisms. It also demonstrates that dynamic tuning of the number of prototypes can lead to performance gains.

MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021, PT I (2021)

Proceedings Paper Computer Science, Artificial Intelligence

multi-imbalance: Open Source Python Toolbox for Multi-class Imbalanced Classification

Jacek Grycza, Damian Horna, Hanna Klimczak, Mateusz Lango, Kamil Plucinski, Jerzy Stefanowski

Summary: Multi-imbalance is an open-source Python library designed to provide the Python community with tools for handling multi-class imbalanced problems. It includes implementations of binary decomposition techniques, ensembles, and a variety of re-sampling approaches for multi-class imbalanced classification.

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Time Aspect in Making an Actionable Prediction of a Conversation Breakdown

Piotr Janiszewski, Mateusz Lango, Jerzy Stefanowski

Summary: Online harassment is a significant issue in modern societies, often mitigated by the manual work of website moderators and supported by machine learning tools. Previous methods only allow for retrospective detection of online abuse, while proactive approaches have been proposed to help moderators prevent conversation breakdown. This study introduces a new method based on deep neural networks that predicts the likelihood of conversation breakdown and the time remaining until derailment, showing improvement over current state-of-the-art methods.

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: APPLIED DATA SCIENCE TRACK, PT V (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Classification of Multi-class Imbalanced Data: Data Difficulty Factors and Selected Methods for Improving Classifiers

Jerzy Stefanowski

Summary: This paper summarizes the difficulty factors and the state of research on the multi-class imbalanced problem, and presents three new methods for learning classifiers from multi-class imbalanced data.

ROUGH SETS (IJCRS 2021) (2021)

Article Computer Science, Artificial Intelligence

Artificial Intelligence Research Community and Associations in Poland

Grzegorz J. Nalepa, Jerzy Stefanowski

FOUNDATIONS OF COMPUTING AND DECISION SCIENCES (2020)

Article Computer Science, Artificial Intelligence

On the Dynamics of Classification Measures for Imbalanced and Streaming Data

Dariusz Brzezinski, Jerzy Stefanowski, Robert Susmaga, Izabela Szczech

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2020)

Article Automation & Control Systems

USING INFORMATION ON CLASS INTERRELATIONS TO IMPROVE CLASSIFICATION OF MULTICLASS IMBALANCED DATA: A NEW RESAMPLING ALGORITHM

Malgorzata Janicka, Mateusz Lango, Jerzy Stefanowski

INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE (2019)

Proceedings Paper Computer Science, Artificial Intelligence

An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data

Szymon Wojciechowski, Szymon Wilk, Jerzy Stefanowski

PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS CORES 2017 (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Tetrahedron: Barycentric Measure Visualizer

Dariusz Brzezinski, Jerzy Stefanowski, Robert Susmaga, Izabela Szczech

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT III (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Discovering Minority Sub-clusters and Local Difficulty Factors from Imbalance Data

Mateusz Lango, Dariusz Brzezinski, Sebastian Firlik, Jerzy Stefanowski

DISCOVERY SCIENCE, DS 2017 (2017)
