Article
Computer Science, Software Engineering
Lina Gong, Haoxiang Zhang, Jingxuan Zhang, Mingqiang Wei, Zhiqiu Huang
Summary: Software Defect Prediction (SDP) is an important operation to ensure software quality, but class overlap in SDP datasets hinders performance. In this empirical study, we propose an approach to identify overlapping instances and investigate the impact of class overlap on the performance and interpretation of seven SDP models. We find that 70.0% of SDP datasets have overlapping instances and that different levels of class overlap affect SDP model performance and feature ranking. Handling class overlap can significantly improve SDP model performance on datasets with overlap ratios above 12.5%.
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
(2023)
Article
Computer Science, Information Systems
Yuanyuan Ma, Xinquan Yu, Xiangyang Luo, Dong Liu, Yi Zhang
Summary: Feature selection is an essential approach to enhance steganalysis efficiency by removing redundant and useless features. However, it faces bottlenecks of high time cost, poor universality, and dependence on parameter setting due to the diversity of steganalysis features. In this study, an adaptive steganalytic feature selection method based on classification metrics is proposed to address these issues. Experimental results demonstrate that the proposed method achieves competitive performance compared to classic and state-of-the-art feature selection methods in terms of detection accuracy, calculation cost, storage cost, and universality.
INFORMATION SCIENCES
(2023)
Article
Computer Science, Artificial Intelligence
Muhammad Asim, Kashif Javed, Abdur Rehman, Haroon A. Babri
Summary: Pruning frequent or rare terms in texts is not helpful for text classification, as incorrectly set threshold values may result in the loss of useful terms. A new feature ranking metric is proposed to select the most useful terms; it outperforms seven other metrics in text classification performance.
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS
(2021)
Article
Automation & Control Systems
Victor Hamer, Pierre Dupont
Summary: Current feature selection methods may suffer from instability, especially on high-dimensional data. This work proposes a new stability measure that incorporates the importance of the selected features in predictive models. The measure is shown to correct overly optimistic estimates and improve decision-making accuracy.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Chemistry, Multidisciplinary
Hadeel Alsolai, Marc Roper
Summary: This study empirically evaluates the effectiveness of ensemble models, feature selection, and sampling techniques on predicting change-proneness. The results show that ensemble feature selection and sampling techniques improve prediction accuracy, with random forests performing the best among the investigated models.
APPLIED SCIENCES-BASEL
(2022)
Article
Chemistry, Multidisciplinary
Rasmita Panigrahi, Sanjay Kumar Kuanar, Sanjay Misra, Lov Kumar
Summary: The study aims to develop an ensemble-based refactoring prediction model by identifying the appropriate methods or classes that need to be refactored in object-oriented software. The proposed model uses different feature selection techniques and data sampling techniques to distribute the data uniformly. The experimental results show that the Maximum Voting Ensemble (MVE) performs better in the refactoring prediction model at the class level.
APPLIED SCIENCES-BASEL
(2022)
Article
Multidisciplinary Sciences
Bryan A. Dawkins, Trang T. Le, Brett A. McKinney
Summary: The performance of nearest-neighbor feature selection and prediction methods is influenced by neighborhood computation metrics and data distribution properties. Recent work has focused on improving algorithms through new estimation methods and metrics, but little attention has been paid to the distributional properties of pairwise distances. Analytical formulas for mean and variance of pairwise distances for different data types and metrics have been derived, providing insights into the distance properties commonly used in nearest-neighbor methods.
Article
Computer Science, Artificial Intelligence
Deepak Kumar Rakesh, Raj Anwit, Prasanta K. Jana
Summary: This paper introduces a new frequency-based stability measure called rank stability (RSt), which evaluates feature selection algorithms considering both subsets of features and feature rankings. The proposed measure assesses the variation of feature rankings generated by perturbing the training set. Extensive experiments demonstrate that heterogeneous ensemble techniques outperform traditional feature selection algorithms in terms of the proposed measure and other performance metrics.
Article
Computer Science, Information Systems
Lucija Sikic, Petar Afric, Adrian Satja Kurdija, Marin Silic
Summary: To improve software quality, researchers have proposed new predictive models that incorporate time information from the software change process, which can enhance the stability and performance of existing classification models.
Article
Mathematical & Computational Biology
Iqra Yousaf, Fareeha Anwar, Salma Imtiaz, Ahmad S. S. Almadhor, Farruh Ishmanov, Sung Won Kim
Summary: Software plays a crucial role in healthcare, from booking appointments to treatment and patient care. The development of IoT medical devices is a current focus, allowing for better monitoring of patient health conditions.
COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE
(2022)
Article
Mathematical & Computational Biology
Iqra Yousaf, Fareeha Anwar, Salma Imtiaz, Ahmad S. Almadhor, Farruh Ishmanov, Sung Won Kim
Summary: This research proposes a hybrid bug severity prediction model for IoT medical devices that combines a convolutional neural network (CNN) with Harris hawks optimization (HHO). The model achieved an accuracy of 96.21% on the evaluation dataset.
COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE
(2022)
Article
Computer Science, Artificial Intelligence
Luca Ardito, Luca Barbato, Riccardo Coppola, Michele Valsesia
Summary: This study analyzes the characteristics of the Rust programming language using a set of common static software metrics and compares them with those of other popular languages. The findings suggest that Rust has advantages over C and C++ in certain aspects but performs worse than other object-oriented languages. Overall, Rust exhibits average complexity and maintainability relative to the set of popular languages considered.
PEERJ COMPUTER SCIENCE
(2021)
Article
Computer Science, Artificial Intelligence
Md Alamgir Kabir, Jacky Keung, Burak Turhan, Kwabena Ebo Bennin
Summary: The study examines the impact of feature selection on the performance of inter-release defect prediction (IRDP) models and their robustness to concept drift. Experimental results show that feature selection techniques can significantly improve prediction results, and that models trained solely on the most recent release data are not always the best. Training models with carefully selected features can help reduce concept drift.
APPLIED SOFT COMPUTING
(2021)
Review
Computer Science, Information Systems
Utkarsh Mahadeo Khaire, R. Dhanalakshmi
Summary: Feature selection is a tool for understanding problems by analyzing relevant features; it can improve classifier performance and reduce computational load. However, high correlation between features often leads to instability in traditional feature selection algorithms, which reduces confidence in the selected features. Achieving high stability in feature selection algorithms is therefore crucial.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2022)
Article
Computer Science, Artificial Intelligence
Syed Rashid Aziz, Tamim Ahmed Khan, Aamer Nadeem
Summary: The study aims to validate the usefulness of inheritance metrics in classifying unlabeled datasets and proposes a new mechanism to label clusters as faulty or fault-free. Results show a significant impact of inheritance metrics in software fault prediction (SFP), specifically in classifying unlabeled datasets and correctly labeling instances.
PEERJ COMPUTER SCIENCE
(2021)