Article

Examining the performance of kernel methods for software defect prediction based on support vector machine

Journal

SCIENCE OF COMPUTER PROGRAMMING
Volume 226

Publisher

ELSEVIER
DOI: 10.1016/j.scico.2022.102916

Keywords

Software defect prediction; Kernel functions; Support vector machine; Information gain

This study examines the impact and stability of four kernel functions (linear and nonlinear), combined with feature selection, on the performance of SVM for software defect prediction. The findings show that only the RBF kernel outperforms the linear kernel, and that it is more effective on datasets with high imbalance ratios. The choice of feature subset affects the performance of all kernel functions, but using the top 40% of features yields the best results. The study therefore recommends SVM with the RBF kernel for defect datasets.
Support Vector Machine (SVM) has been widely used to build software defect prediction models. Prior studies compared the accuracy of SVM with that of other machine learning algorithms but arrived at contradictory conclusions because they used different kernel functions and metrics. These contradictory conclusions raise an important question about the performance of kernel functions across different experimental conditions. To this end, the present study examines the impact and stability of four kernel functions with feature selection on the performance of SVM for software defect prediction. Specifically, we compare nonlinear kernel functions against the linear kernel function under different experimental parameters: data granularity, imbalance ratio of the dataset, and feature subsets. A large-scale study was conducted using four kernel functions, ten feature subset selection thresholds based on the Information Gain algorithm, 38 public datasets, and one evaluation measure, resulting in 1520 experiments (4 × 10 × 38 × 1). The findings demonstrate that: 1) Not all nonlinear kernel functions significantly outperform the linear kernel; only RBF surpasses the linear and the other nonlinear kernel functions. 2) There is no significant difference between kernel functions with respect to data granularity, except at the 'function' granularity, where RBF differs significantly from the other kernel functions. 3) RBF performs significantly better than the linear and the other nonlinear kernel functions on datasets with high and very high imbalance ratios. 4) The performance of all kernel functions fluctuates across feature subsets; however, using the top 40% of the features works best with all kernel functions. In conclusion, we recommend using SVM with the RBF kernel for defect datasets because the performance of the other kernel functions is limited. © 2022 Elsevier B.V. All rights reserved.
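
The feature-selection step and the size of the experiment grid described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The toy metric names, the dataset and measure names, the two placeholder kernel names, the 10%-to-100% threshold spacing, and the round-up rule for the subset size are all assumptions made for illustration; only the counts (four kernels, ten thresholds, 38 datasets, one measure) and the use of information gain come from the abstract.

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_features(columns, labels, percent):
    """Names of the top `percent`% of features, ranked by information gain.

    Rounding the subset size up is an assumption of this sketch.
    """
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    keep = max(1, (percent * len(ranked) + 99) // 100)  # integer ceiling
    return ranked[:keep]


# Hypothetical toy data: 8 modules, 5 binary metrics, label 1 = defective.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_comments":    [1, 0, 1, 0, 1, 0, 1, 0],
    "complexity_high": [1, 1, 1, 0, 0, 0, 0, 1],
    "loc_high":        [1, 1, 1, 1, 0, 0, 0, 0],
    "is_test":         [0, 0, 0, 0, 0, 0, 0, 0],
    "churn_high":      [1, 1, 1, 0, 0, 0, 0, 0],
}
selected = top_features(columns, labels, 40)

# Experiment grid. Only "linear" and "rbf" are named in the abstract;
# the other two kernel names are placeholders, as are the threshold
# values, dataset names, and measure name.
kernels = ["linear", "rbf", "nonlinear_3", "nonlinear_4"]
thresholds = range(10, 101, 10)
datasets = [f"dataset_{i}" for i in range(1, 39)]
measures = ["measure_1"]
grid = list(product(kernels, thresholds, datasets, measures))
```

On this toy data, `selected` is `["loc_high", "churn_high"]` (two of five features, i.e. the top 40%), and `len(grid)` is 1520, matching the number of experiments reported in the abstract.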

Recommended

Article Computer Science, Software Engineering

Transformed k-nearest neighborhood output distance minimization for predicting the defect density of software projects

Cuauhtemoc Lopez-Martin, Yenny Villuendas-Rey, Mohammad Azzeh, Ali Bou Nassif, Shadi Banitaan

JOURNAL OF SYSTEMS AND SOFTWARE (2020)

Review Computer Science, Software Engineering

Predicting software effort from use case points: A systematic review

Mohammad Azzeh, Ali Bou Nassif, Imtinan Basem Attili

Summary: The Use Case Points (UCP) method for predicting software project effort is gaining popularity. However, the area has not been systematically reviewed, highlighting the need for a literature review to guide and support effort estimation research. The current study aims to classify and analyze UCP effort estimation papers based on various criteria and perspectives, to explore accuracy, estimation context, and the impact of combined techniques on UCP accuracy.

SCIENCE OF COMPUTER PROGRAMMING (2021)

Article Computer Science, Software Engineering

Empirical analysis on productivity prediction and locality for use case points method

Mohammad Azzeh, Ali Bou Nassif, Cuauhtemoc Lopez-Martin

Summary: This paper investigates the impact of data locality approaches on productivity and effort prediction based on multiple UCP variables. It also explores the relationship between productivity and other UCP variables.

SOFTWARE QUALITY JOURNAL (2021)

Article Computer Science, Software Engineering

Locally weighted regression with different kernel smoothers for software effort estimation

Yousef Alqasrawi, Mohammad Azzeh, Yousef Elsheikh

Summary: Estimating software effort has been a challenge, and this paper introduces a more sophisticated locality approach called Locally Weighted Regression (LWR) to learn from local data and build estimation models with multiple local regression models.

SCIENCE OF COMPUTER PROGRAMMING (2022)

Article Computer Science, Software Engineering

On the value of project productivity for early effort estimation

Mohammad Azzeh, Ali Bou Nassif, Yousef Elsheikh, Lefteris Angelis

Summary: Estimating software effort using Use Case Points involves a productivity factor that is difficult to predict and is usually derived from historical data. Learning productivity from historical data is more effective than traditional methods, and environmental factors may not accurately predict productivity.

SCIENCE OF COMPUTER PROGRAMMING (2022)

Article Computer Science, Information Systems

On obtaining a stable vote ranking methodology for implementing e-government strategies

Yousef Elsheikh, Yousef Alqasrawi, Mohammad Azzeh

Summary: This study addresses the issue of selecting and prioritizing strategies for the successful implementation of e-government programs. By using different ranking methods and measurement criteria, the study highlights the importance of obtaining stable rankings in selecting the best strategies.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Computer Science, Information Systems

An optimized case-based software project effort estimation using genetic algorithm

Shaima Hameed, Yousef Elsheikh, Mohammad Azzeh

Summary: Software development companies have faced long-standing challenges in accurately estimating the effort required for software projects. However, research has shown that machine learning techniques, such as case-based reasoning, can improve accuracy. The case-based reasoning technique, though effective, has difficulty in tuning its multiple parameters. This paper proposes the use of a genetic algorithm to find the best combination of parameters and improve accuracy. The results show the effectiveness of this approach, which is beneficial for project managers in financial planning and cost control.

INFORMATION AND SOFTWARE TECHNOLOGY (2023)

Article Biology

Explainable artificial intelligence model for identifying COVID-19 gene biomarkers

Fatma Hilal Yagin, Ipek Balikci Cicek, Abedalrhman Alkhateeb, Burak Yagin, Cemil Colak, Mohammad Azzeh, Sami Akbulut

Summary: This study presents a model that applies explainable artificial intelligence (XAI) methods to assist in diagnosing COVID-19. By using machine learning techniques on COVID-19 metagenomic next-generation sequencing (mNGS) samples and combining LIME and SHAP for explanations, the model successfully predicts COVID-19 and identifies biomarker candidate genes associated with the disease.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Computer Science, Software Engineering

A soft computing approach for software defect density prediction

Mohammad Azzeh, Yousef Alqasrawi, Yousef Elsheikh

Summary: Defect density is crucial for software testing and maintenance, used to distribute limited human resources effectively. We propose a new prediction model that integrates gray system theory and fuzzy logic to handle uncertainty in software measurement. The model's performance was validated against public defect datasets, outperforming other prediction models, especially with high sparsity ratios.

JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS (2023)

Article Multidisciplinary Sciences

Toward Fluent Arabic Poem Generation Based on Fine-tuning AraGPT2 Transformer

Omar Abboushi, Mohammad Azzeh

Summary: This study fine-tuned AraGPT2, the most advanced Arabic pre-trained transformer, on a large poetry corpus to generate Arabic poems with specific meter and rhyme. The results showed high-quality generated poems according to standard evaluation and expert evaluation. However, concerns were raised regarding potential misuse of this technology.

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING (2023)

Article Operations Research & Management Science

Examining stability of machine learning methods for predicting dementia at early phases of the disease

Sinan Faouri, Mahmood AlBashayreh, Mohammad Azzeh

Summary: This study investigates the stability of machine learning algorithms for dementia prediction. Through numerous experiments, it is found that support vector machine and Naive Bayes are the most stable algorithms, and using Information Gain (IG) appears to be more effective than using Principal Component Analysis (PCA) for predicting dementia.

DECISION SCIENCE LETTERS (2022)

Article Computer Science, Information Systems

Software Defect Density Prediction Using Deep Learning

Firas Alghanim, Mohammad Azzeh, Ammar El-Hassan, Hazem Qattous

Summary: Delivering a reliable and high-quality software system to clients is a challenging task, and defect density is a key measure of system quality. However, predicting defect density before testing the modules is time-consuming. To address this issue, managers can build prediction models using deep learning to detect defective modules, thus reducing testing costs and improving resource utilization. Our study demonstrates that deep learning is effective in handling sparse data and outperforms other machine learning methods in datasets with high and very high sparsity ratios, while also being a competitive choice for datasets with medium or low sparsity ratios.

IEEE ACCESS (2022)

Article Automation & Control Systems

An Interactive Automation for Human Biliary Tree Diagnosis Using Computer Vision

Mohammad AL-Oudat, Saleh Alomari, Hazem Qattous, Mohammad Azzeh, Tariq AL-Munaize

Summary: The study introduces a vision-based model for initial diagnosis of the biliary tree, utilizing different image processing methods to segment MRI images and extract features to determine patients' health conditions. The research, using a database of 200 MRI images, demonstrates the effectiveness of extracted features with various classifiers.

INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (2021)

Article Automation & Control Systems

Application of Machine Learning for Online Reputation Systems

Ahmad Alqwadri, Mohammad Azzeh, Fadi Almasalha

Summary: This paper proposes a new reputation system using machine learning to predict consumer reliability and compute product reputation score. The model is evaluated on MovieLens benchmarking datasets and compared to previous rating aggregation models, showing promising results and potential as a solution for reputation systems. The proposed approach can be integrated with online recommendation systems to enhance user experience in online shopping markets.

INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING (2021)

Article Computer Science, Software Engineering

A formal approach for the correct deployment of cloud applications

Amel Mammar, Meriem Belguidoum, Saddam Hocine Hiba

Summary: This paper introduces a formal EVENT-B-based approach for modeling and verifying the deployment of component-based applications. By gradually refining an abstract model, a precise specification is built, and mathematical reasoning is used to prove its correctness. The presented approach validates the deployment in a cloud environment using PROB and ensures the construction of a correct system that meets the constraints.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Enhancing test reuse with GUI events deduplication and adaptive semantic matching

Shuqi Liu, Yu Zhou, Longbing Ji, Tingting Han, Taolue Chen

Summary: In this paper, we propose a framework that combines GUI events deduplication with an adaptive semantic matching strategy to enhance the usability of reused tests. Experimental evaluation demonstrates that the framework improves widget mapping performance, significantly reduces event redundancy, and reduces the manual effort of creating tests for similar applications.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

A method of test case set generation in the commutativity test of reduce functions

Xiangyu Mu, Lei Liu, Peng Zhang, Jingyao Li, Hui Li

Summary: The aim of this study is to reduce the size of the test case set required to detect the commutativity problem of the reduce function. By determining the pattern of the function and selecting corresponding test cases, the proposed test case generation strategy can achieve the same accuracy with a smaller test case set. It has been shown to be effective and has a high recall rate.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

An industrial experience report on model-based, AI-enabled proposal development for an RFP/RFI

Padmalata Nistala, Asha Rajbhoj, Vinay Kulkarni, Sapphire Noronha, Ankit Joshi

Summary: This paper presents an automated proposal development approach using a combination of model-based and AI-enabled techniques, and discusses the successful deployment and user feedback of the system.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Translation certification for smart contracts

Jacco O. G. Krijnen, Manuel M. T. Chakravarty, Gabriele Keller, Wouter Swierstra

Summary: Compiler correctness is a long-standing problem, and it becomes more significant with the rise of smart contracts on blockchains. A translation certification framework can address the trust issue for low-level code on the blockchain, allowing users to have confidence in the compilation process of smart contracts.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

OnTrack: Reflecting on domain specific formal methods for railway designs

Phillip James, Faron Moller, Filippos Pantekis

Summary: OnTrack is a tool that supports railway verification workflows using model driven engineering frameworks, allowing railway engineers to interact with verification procedures through encapsulating formal methods.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Generating C: Heterogeneous metaprogramming system description

Oleg Kiselyov

Summary: Heterogeneous metaprogramming systems leverage higher-level host languages to generate lower-level object language code, enabling faster production of high-performant code with correctness guarantees. This paper presents two systems with OCaml as the host language and C as the object language, discussing their implementation and applications.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Reasoning about logical systems in the Coq proof assistant

Conor Reynolds, Rosemary Monahan

Summary: This paper provides a detailed approach to formalize a fragment of the theory of institutions in the Coq proof assistant. The approach is illustrated and evaluated by instantiating the framework with specific institution examples.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Stochastic formal model of PI3K/mTOR pathway in Alzheimer's disease for drug repurposing: An evaluation of rapamycin, LY294002, and NVP-BEZ235

Herbert Rausch Fernandes, Giovanni Freitas Gomes, Antonio Carlos Pinheiro de Oliveira, Sergio Vale Aguiar Campos

Summary: Alzheimer's disease is a common form of dementia with no effective drug treatment available. In this study, a statistical model checking approach was used to analyze protein and drug interactions and evaluate the effects of different drugs on the components contributing to Alzheimer's disease. The results showed that rapamycin could slow down the biological process causing neuronal death, while LY294002 and NVP-BEZ235 may increase tau phosphorylation. These findings provide important insights for the scientific community and raise awareness about potential side effects of PI3K inhibitor drugs.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Denotational and operational semantics for interaction languages: Application to trace analysis

Erwan Mahe, Christophe Gaston, Pascale Le Gall

Summary: This paper presents an Interaction Language to encode Sequence Diagrams (SD) and associates it with three different formal semantics. This allows for direct formal verification of SD, while preserving traceability of SD concepts and executed actions, and addressing the translation of problematic operators.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

DescribeML: A dataset description tool for machine learning

Joan Giner-Miguelez, Abel Gomez, Jordi Cabot

Summary: Datasets are crucial for training and evaluating machine learning models, but they can also lead to undesirable behaviors like biased predictions. To tackle this issue, the machine learning community suggests adopting consistent guidelines for dataset descriptions. However, these guidelines rely on natural language descriptions, which hinder automated computation and analysis. To overcome this, we present DescribeML, a language engineering tool that provides precise, structured descriptions of machine learning datasets, including their composition, provenance, and social concerns.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

An iterative approach for model-based requirements engineering in large collaborative projects: A detailed experience report

Andrey Sadovykh, Bilal Said, Dragos Truscan, Hugo Bruneliere

Summary: In this paper, the authors report on their 7 years of practical experience with an iterative Model-based Requirements Engineering (MBRE) approach and language in five large European collaborative projects. They demonstrate through significant data sets that this model-based approach provides interesting benefits in terms of scalability, heterogeneity, adaptability, traceability, automation, consistency and quality, and usefulness or usability. Concrete examples from these projects are provided to illustrate the application of the MBRE approach and language, and the authors discuss the general benefits and limitations of using such an approach, as well as the lessons learned over the years.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Exploring complex models with picto web

Alfa Yohannis, Dimitris Kolovos, Antonio Garcia-Dominguez

Summary: Picto Web is a multi-tenant web-based tool that allows exploration of complex models by transforming them into various transient web-based views using rule-based transformations. It uses a lazy view computation approach to efficiently support large models and complex transformations, and includes monitoring and push notification facilities for automatic recomputation of views and updated delivery to clients.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

GaMoVR: Gamification-based UML learning environment in virtual reality

Enes Yigitbas, Maximilian Schmidt, Antonio Bucchiarone, Sebastian Gottschalk, Gregor Engels

Summary: UML has become a popular modeling language used in computer science courses, and various interactive learning applications have been developed to improve student engagement and learning outcomes. However, these applications have not successfully created immersive environments for students. Therefore, this study introduces GaMoVR, a VR-based and gamified learning environment, which provides an interactive and fun learning experience for students learning about UML modeling.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

How accessibility affects other quality attributes of software? A case study of GitHub

Yaxin Zhao, Lina Gong, Wenhua Yang, Yu Zhou

Summary: Accessible design aims to enable as many people as possible to access software products and services. This study investigates the interaction between accessibility issues and other factors affecting software performance. By analyzing a large number of accessibility issues, the study reveals the characteristics of these issues and their relationship with software quality attributes.

SCIENCE OF COMPUTER PROGRAMMING (2024)