Article

EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification

Journal

APPLIED SOFT COMPUTING
Volume 101

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2020.107033

Keywords

Data preprocessing; Evolutionary undersampling; Surrogate models; Imbalanced classification; Fitness approximation

Funding

  1. School of Computer Science of the University of Nottingham, United Kingdom


This study introduces a clustering-based surrogate model to accelerate evolutionary undersampling, saving significant runtime and expanding its applicability to larger datasets while maintaining performance improvements. By focusing on phenotype rather than genotype representation, the proposed approach offers a novel solution to binary optimisation problems in imbalanced datasets.
Learning from imbalanced datasets is in high demand in real-world applications and is a challenge for standard classifiers, which tend to be biased towards the classes with the majority of the examples. Undersampling approaches reduce the size of the majority class to balance the class distributions. Evolutionary-based approaches are prominent, treating undersampling as a binary optimisation problem that determines which examples are removed. However, their utilisation is limited to small datasets due to fitness evaluation costs. This work proposes a two-stage clustering-based surrogate model that enables evolutionary undersampling to compute fitness values faster. The main novelty lies in the development of a surrogate model for binary optimisation which is based on the meaning (phenotype) of candidate solutions rather than their binary representation (genotype). We conduct an evaluation on 44 imbalanced datasets, showing that in comparison with the original evolutionary undersampling, we can save up to 83% of the runtime without significantly deteriorating the classification performance.
Crown Copyright (C) 2020 Published by Elsevier B.V. All rights reserved.
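To make the binary-optimisation framing in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 1-nearest-neighbour classifier, a geometric-mean fitness, and a simple elitist bit-flip search standing in for the evolutionary algorithm; all function names (`decode`, `fitness`, `evolve`, `make_toy_data`) and the toy data are hypothetical. Each bit of the chromosome (genotype) says whether one majority-class example is kept, and `decode` produces the reduced training set that the bits stand for (phenotype).

```python
import math
import random


def decode(mask, majority, minority):
    """Phenotype of a binary chromosome: the reduced training set it selects."""
    kept = [(x, 0) for bit, x in zip(mask, majority) if bit]
    return kept + [(x, 1) for x in minority]  # minority examples are always kept


def predict_1nn(train, point):
    """Label of the training example closest to `point` (Euclidean distance)."""
    nearest = min(train, key=lambda example: math.dist(example[0], point))
    return nearest[1]


def fitness(mask, majority, minority, validation):
    """Geometric mean of per-class accuracy on `validation` (an assumed measure)."""
    train = decode(mask, majority, minority)
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for point, label in validation:
        totals[label] += 1
        hits[label] += predict_1nn(train, point) == label
    return math.sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))


def evolve(majority, minority, validation, generations=200, seed=0):
    """Elitist bit-flip search; a stand-in for a full evolutionary algorithm."""
    rng = random.Random(seed)
    best = [1] * len(majority)  # start by keeping every majority example
    best_fit = fitness(best, majority, minority, validation)
    for _ in range(generations):
        # Flip each bit with probability 1/n.
        child = [bit ^ (rng.random() < 1 / len(best)) for bit in best]
        # This exact evaluation is the expensive step that a surrogate
        # model would approximate instead of recomputing.
        child_fit = fitness(child, majority, minority, validation)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


def make_toy_data(seed=0, n_major=40, n_minor=6):
    """Two overlapping Gaussian clouds; purely illustrative data."""
    rng = random.Random(seed)

    def cloud(n, centre):
        return [(rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)) for _ in range(n)]

    majority, minority = cloud(n_major, 0.0), cloud(n_minor, 1.5)
    validation = [(x, 0) for x in cloud(n_major, 0.0)]
    validation += [(x, 1) for x in cloud(n_minor, 1.5)]
    return majority, minority, validation


if __name__ == "__main__":
    majority, minority, validation = make_toy_data()
    mask, fit = evolve(majority, minority, validation)
    print("majority examples kept:", sum(mask), "of", len(mask))
    print("fitness of the selected subset:", round(fit, 3))
```

Every call to `fitness` classifies the whole validation set, so the cost grows with both dataset size and the number of candidates evaluated. That is the bottleneck the abstract describes; the surrogate itself is not sketched here.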


Recommended

Article Computer Science, Theory & Methods

d-Choquet integrals: Choquet integrals based on dissimilarities

H. Bustince, R. Mesiar, J. Fernandez, M. Galar, D. Paternain, A. Altalhi, G. P. Dimuro, B. Bedregal, Z. Takac

Summary: The paper introduces a new class of functions called d-Choquet integrals, which are a generalization of the standard Choquet integral by replacing the difference in the definition with a dissimilarity function. Some d-Choquet integrals are aggregation functions, while others are not, and the conditions for this are explored in the study of their properties.

FUZZY SETS AND SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

Attacking Bitcoin anonymity: generative adversarial networks for improving Bitcoin entity classification

Francesco Zola, Lander Segurola-Gil, Jan L. Bruse, Mikel Galar, Raul Orduna-Urrutia

Summary: This work proposes a new method to address the class imbalance problem in Bitcoin entity classification by applying generative adversarial networks (GANs). By generating synthetic data to tackle the imbalance issue, GANs prove to be effective in improving accuracy and performance compared to other data preprocessing techniques.

APPLIED INTELLIGENCE (2022)

Article Computer Science, Information Systems

Network traffic analysis through node behaviour classification: a graph-based approach with temporal dissection and data-level preprocessing

F. Zola, L. Segurola-Gil, J. L. Bruse, M. Galar, R. Orduna-Urrutia

Summary: Network traffic analysis plays a crucial role in cybersecurity by identifying anomalous and potentially dangerous connections. This study proposes a threefold approach, involving temporal dissection, data-level preprocessing, and node behavior classification, to address the challenges of analyzing temporal network traffic data. Experimental results demonstrate that the proposed method effectively reduces class imbalance and improves the performance of supervised node behavior classification, outperforming traditional anomaly detection techniques.

COMPUTERS & SECURITY (2022)

Article Computer Science, Artificial Intelligence

SPMS-ALS: A Single-Point Memetic structure with accelerated local search for instance reduction

Hoang Lam Le, Ferrante Neri, Isaac Triguero

Summary: This paper investigates the optimization of instance reduction, a key stage in data mining, and proposes a Memetic Computing approach called SPMS-ALS. By integrating an Accelerated Local Search within a single-point memetic framework, SPMS-ALS achieves excellent performance while reducing runtime by up to approximately 85%, compared to other algorithms performing the same number of function calls.

SWARM AND EVOLUTIONARY COMPUTATION (2022)

Article Computer Science, Theory & Methods

Discrete IV dG-Choquet integrals with respect to admissible orders

Zdenko Takac, Mikel Uriz, Mikel Galar, Daniel Paternain, Humberto Bustince

Summary: In this work, we introduce the concept of d(G)-Choquet integral, which extends the discrete Choquet integral by incorporating a dissimilarity function to represent input differences and replacing the sum with more general functions. We demonstrate that the discrete Choquet integral and the d-Choquet integral are specific cases of the d(G)-Choquet integral. We also define interval-valued fuzzy measures and show their application in defining a monotonic interval-valued discrete Choquet integral using d(G)-Choquet integrals. The validity of this interval-valued Choquet integral is studied through an illustrative example in a classification problem.

FUZZY SETS AND SYSTEMS (2022)

Article Computer Science, Information Systems

A supervised fuzzy measure learning algorithm for combining classifiers

Mikel Uriz, Daniel Paternain, Humberto Bustince, Mikel Galar

Summary: Fuzzy measure-based aggregations consider interactions among input source coalitions, but defining the fuzzy measure is a challenge. This paper proposes a new algorithm for learning fuzzy measures that can optimize any cost function, using advancements from deep learning frameworks. An experimental study with 58 datasets shows the effectiveness of the proposed method in optimizing cross-entropy cost for binary and multi-class classification problems, compared to other state-of-the-art methods for fuzzy measure learning.

INFORMATION SCIENCES (2023)

Article Physics, Applied

Enhancing the quality of amplitude patterns using time-multiplexed virtual acoustic fields

Sonia Elizondo, Inigo Ezcurdia, Jaime Goni, Mikel Galar, Asier Marzo

Summary: Ultrasonic fields have various functions and limitations in creating dynamic amplitude patterns. This study demonstrates how the average of multiple time-multiplexed amplitude fields improves pattern resolution and optimizes the nonlinear problem of decomposing a target amplitude field. The technique has the potential to enhance the quality of existing setups without modifying the equipment, benefiting bio-printing, haptic devices, and ultrasonic medical treatments.

APPLIED PHYSICS LETTERS (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Gender Stereotyping Impact in Facial Expression Recognition

Iris Dominguez-Catena, Daniel Paternain, Mikel Galar

Summary: Facial Expression Recognition (FER) uses images of faces to identify the emotional state of users, allowing for a closer interaction between humans and autonomous systems. Machine learning-based models have become popular in FER but are prone to demographic bias issues. This study demonstrates the impact of gender bias in FER datasets and highlights the need for a thorough bias analysis, as global demographic balance can hide other harmful biases.

MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I (2023)

Proceedings Paper Computer Science, Software Engineering

Verification system based on long-range iris and Graph Siamese Neural Networks

Francesco Zola, Jose Alvaro Fernandez-Carrasco, Jan Lukas Bruse, Mikel Galar, Zeno Geradts

Summary: This study proposes a novel approach that utilizes long-range images for implementing an iris verification system and uses Graph Siamese Neural Networks to predict whether they belong to the same person. The research not only describes the methodology but also evaluates the application of spectral components in improving graph extraction and classification tasks.

PROCEEDINGS OF 2022 THE 3RD EUROPEAN SYMPOSIUM ON SOFTWARE ENGINEERING, ESSE 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Accelerated Pattern Search with Variable Solution Size for Simultaneous Instance Selection and Generation

Hoang Lam Le, Ferrante Neri, Dario Landa-Silva, Isaac Triguero

Summary: This paper investigates a fast optimization approach for instance reduction in data science, considering both instance selection and instance generation stages. The proposed method, named APS-VSS, uses a variable solution size, accelerated objective function computation, and a single-point memetic structure for instance generation. The experimental results show that APS-VSS outperforms existing algorithms and is competitive in terms of accuracy and reduction rates, while significantly reducing the runtime.

PROCEEDINGS OF THE 2022 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2022 (2022)

Proceedings Paper Geosciences, Multidisciplinary

PUSHING THE LIMITS OF SENTINEL-2 FOR BUILDING FOOTPRINT EXTRACTION

C. Ayala, C. Aranda, M. Galar

Summary: Building footprint maps are important but difficult to maintain. This study proposes a novel deep learning architecture to accurately extract building footprints from high resolution satellite imagery, bridging the gap between satellite and aerial semantic segmentation.

2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022) (2022)

Proceedings Paper Geography, Physical

MULTI-TEMPORAL DATA AUGMENTATION FOR HIGH FREQUENCY SATELLITE IMAGERY: A CASE STUDY IN SENTINEL-1 AND SENTINEL-2 BUILDING AND ROAD SEGMENTATION

C. Ayala, C. Aranda, M. Galar

Summary: Semantic segmentation of remote sensing images is important in various practical applications, but deep learning models require a large amount of labeled data to handle unseen scenarios. This paper proposes a novel realistic multi-temporal color data augmentation technique and evaluates it in building and road semantic segmentation tasks.

XXIV ISPRS CONGRESS: IMAGING TODAY, FORESEEING TOMORROW, COMMISSION III (2022)

Proceedings Paper Computer Science, Artificial Intelligence

A Scalable and Flexible Open Source Big Data Architecture for Small and Medium-Sized Enterprises

Luis Iniguez, Mikel Galar

Summary: The advancements in Big Data, Internet of Things, and Artificial Intelligence are driving the industrial revolution known as Industry 4.0. However, implementing Industry 4.0 in automated factories comes with challenges such as lack of infrastructure, financial limitations, coordination problems, and a low understanding of its implications. Many implementations focus on specific problems, leading to continuous restructuring and increased costs. To make Industry 4.0 affordable for Small and Medium-sized Enterprises (SMEs), it is necessary to create flexible and scalable Big Data architectures that take these difficulties into account.

16TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2021) (2022)

Article Computer Science, Artificial Intelligence

Style linear k-nearest neighbor classification method

Jin Zhang, Zekang Bian, Shitong Wang

Summary: This study proposes a novel style linear k-nearest neighbor method to extract stylistic features using matrix expressions and improve the generalizability of the predictor through style membership vectors.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

A dimensionality reduction method for large-scale group decision-making using TF-IDF feature similarity and information loss entropy

Qifeng Wan, Xuanhua Xu, Jing Han

Summary: In this study, we propose an innovative approach for dimensionality reduction in large-scale group decision-making scenarios that targets linguistic preferences. The method combines TF-IDF feature similarity and information loss entropy to address challenges in decision-making with a large number of decision makers.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

Frequency-based methods for improving the imperceptibility and transferability of adversarial examples

Hegui Zhu, Yuchen Ren, Chong Liu, Xiaoyan Sui, Libo Zhang

Summary: This paper proposes an adversarial attack method based on frequency information, which optimizes the imperceptibility and transferability of adversarial examples in white-box and black-box scenarios respectively. Experimental results validate the superiority of the proposed method, and its application to evaluating real-world online models reveals their vulnerability.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

Consensus-based generalized TODIM approach for occupational health and safety risk analysis with opinion interactions

Jing Tang, Xinwang Liu, Weizhong Wang

Summary: This paper proposes a hybrid generalized TODIM approach in the Fine-Kinney framework to evaluate occupational health and safety hazards. The approach integrates CRP, dynamic SIN, and PLTSs to handle opinion interactions and incomplete opinions among decision makers. The efficiency and rationality of the proposed approach are demonstrated through a numerical example, comparison, and sensitivity studies.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

Deep Q-network-based heuristic intrusion detection against edge-based SIoT zero-day attacks

Shigen Shen, Chenpeng Cai, Zhenwei Li, Yizhou Shen, Guowen Wu, Shui Yu

Summary: To address the damage caused by zero-day attacks on SIoT systems, researchers propose a heuristic learning intrusion detection system named DQN-HIDS. By integrating Deep Q-Networks (DQN) into the system, DQN-HIDS gradually improves its ability to identify malicious traffic and reduces resource workloads. Experiments demonstrate the superior performance of DQN-HIDS in terms of workload, delayed sample queue, rewards, and classifier accuracy.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

A Chinese text classification based on active

Song Deng, Qianliang Li, Renjie Dai, Siming Wei, Di Wu, Yi He, Xindong Wu

Summary: In this paper, we propose a Chinese text classification algorithm based on deep active learning for the power system, which addresses the challenge of specialized text classification. By applying a hierarchical confidence strategy, our model achieves higher classification accuracy with fewer labeled training data.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

Ranking intuitionistic fuzzy sets with hypervolume-based approach: An application for multi-criteria assessment of energy alternatives

Kaan Deveci, Onder Guler

Summary: This study proves the lack of robustness in nonlinear IF distance functions for ranking intuitionistic fuzzy sets (IFS) and proposes an alternative ranking method based on hypervolume metric. Additionally, the suggested method is extended as a new multi-criteria decision making method called HEART, which is applied to evaluate Turkey's energy alternatives.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

Improved energy management of chiller system with AI-based regression

Fu-Wing Yu, Wai-Tung Ho, Chak-Fung Jeff Wong

Summary: This research aims to enhance the energy management in commercial building air-conditioning systems, specifically focusing on chillers. Ridge regression is found to outperform lasso and elastic net regression when optimized with the appropriate hyperparameter, making it the most suitable method for modeling the system coefficient of performance (SCOP). The key variables that strongly influence SCOP include part load ratios, the operating numbers of chillers and pumps, and the temperatures of chilled water and condenser water. Additionally, July is identified as the month with the highest potential for performance improvement. This study introduces a novel approach that balances feature selection, model accuracy, and optimal tuning of hyperparameters, highlighting the significance of a generic and simplified chiller system model in evaluating energy management opportunities for sustainable operation. The findings from this research can guide future efforts towards more energy-efficient and sustainable operations in commercial buildings.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

Three-dimension object detection and forward-looking control strategy for non-destructive grasp of thin-skinned fruits

Xiaoyan Chen, Yilin Sun, Qiuju Zhang, Xuesong Dai, Shen Tian, Yongxin Guo

Summary: In this study, a method for dynamically non-destructive grasping of thin-skinned fruits is proposed. It utilizes a multi-modal depth fusion convolutional neural network for image processing and segmentation, and combines the evaluation mechanism of optimal grasping stability and the forward-looking non-destructive grasp control algorithm. The proposed method greatly improves the comprehensive performance of grasping delicate fruits using flexible hands.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

Siamese learning based on graph differential equation for Next-POI recommendation

Yuxuan Yang, Siyuan Zhou, He Weng, Dongjing Wang, Xin Zhang, Dongjin Yu, Shuiguang Deng

Summary: The study proposes a novel model, POIGDE, which addresses the challenges of data sparsity and elusive motives by solving graph differential equations to capture continuous variation of users' interests. The model learns interest transference dynamics using a time-serial graph and an interval-aware attention mechanism, and applies Siamese learning to directly learn from label representations for predicting future POI visits. The model outperforms state-of-the-art models on real-world datasets, showing potential in the POI recommendation domain.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

An adaptive data compression technique based on optimal thresholding using multi-objective PSO algorithm for power system data

S. Karthika, P. Rathika

Summary: The widespread development of monitoring devices in the power system has generated a large amount of power consumption data. Storing and transmitting this data has become a significant challenge. This paper proposes an adaptive data compression algorithm based on the discrete wavelet transform (DWT) for power system applications. It utilizes multi-objective particle swarm optimization (MO-PSO) to select the optimal threshold. The algorithm has been tested and outperforms other existing algorithms.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

Adaptive SV-Borderline SMOTE-SVM algorithm for imbalanced data classification

Jiaqi Guo, Haiyan Wu, Xiaolei Chen, Weiguo Lin

Summary: In this study, an adaptive SV-Borderline SMOTE-SVM algorithm is proposed to address the challenge of imbalanced data classification. The algorithm maps the data into kernel space using SVM and identifies support vectors, then generates new samples based on the neighbors of these support vectors. Extensive experiments show that this method is more effective than other approaches in imbalanced data classification.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

HilbertSCNet: Self-attention networks for small target segmentation of aerial drone images

Qiumei Zheng, Linkang Xu, Fenghua Wang, Yongqi Xu, Chao Lin, Guoqiang Zhang

Summary: This paper proposes a new semantic segmentation network model called HilbertSCNet, which combines the Hilbert curve traversal and the dual pathway idea to design a new spatial computation module to address the problem of loss of information for small targets in high-resolution images. The experiments show that the proposed network performs well in the segmentation of small targets in high-resolution maps such as drone aerial photography.

APPLIED SOFT COMPUTING (2024)

Article Computer Science, Artificial Intelligence

A comprehensive state-of-the-art survey on the recent modified and hybrid analytic hierarchy process approaches

Mojtaba Ashour, Amir Mahdiyar

Summary: Analytic Hierarchy Process (AHP) is a widely applied technique in multi-criteria decision-making problems, but the sheer number of AHP methods presents challenges for scholars and practitioners in selecting the most suitable method. This paper reviews articles published between 2010 and 2023 proposing hybrid, improved, or modified AHP methods, classifies them based on their contributions, and provides a comprehensive summary table and roadmap to guide the method selection process.

APPLIED SOFT COMPUTING (2024)

Review Computer Science, Artificial Intelligence

A systematic review of metaheuristic algorithms in electric power systems optimization

Gerardo Humberto Valencia-Rivera, Maria Torcoroma Benavides-Robles, Alonso Vela Morales, Ivan Amaya, Jorge M. Cruz-Duarte, Jose Carlos Ortiz-Bayliss, Juan Gabriel Avina-Cervantes

Summary: Electric power system applications are complex optimization problems. Most literature reviews focus on studying electrical paradigms using different optimization techniques, but there is a lack of review on Metaheuristics (MHs) in these applications. Our work provides an overview of the paradigms underlying such applications and analyzes the most commonly used MHs and their search operators. We also discover a strong synergy between the Renewable Energies paradigm and other paradigms, and a significant interest in Load-Forecasting optimization problems. Based on our findings, we provide helpful recommendations for current challenges and potential research paths to support further development in this field.

APPLIED SOFT COMPUTING (2024)