Article

Examining the performance of kernel methods for software defect prediction based on support vector machine

SCIENCE OF COMPUTER PROGRAMMING (2023)

Journal

SCIENCE OF COMPUTER PROGRAMMING

Volume 226, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.scico.2022.102916

Keywords

Software defect prediction; Kernel functions; Support vector machine; Information gain

Categories

Computer Science, Software Engineering

Smart summary

This study examines the impact and stability of four kernel functions (both linear and nonlinear), combined with feature selection, on the performance of SVM for software defect prediction. The findings show that only RBF outperforms the linear kernel function, and that it is more effective on datasets with high imbalance ratios. The choice of feature subset affects the performance of all kernel functions, but using the top 40% of features yields the best results. The study therefore recommends SVM with the RBF kernel for defect datasets.

Abstract

Support Vector Machine (SVM) has been widely used to build software defect prediction models. Prior studies compared the accuracy of SVM to other machine learning algorithms but arrived at contradictory conclusions because they used different kernel functions and metrics. These contradictory conclusions raise an important question about the performance of kernel functions across different experimental conditions. To this end, the present study examines the impact and stability of four kernel functions with feature selection on the performance of SVM for software defect prediction. Specifically, we examine the performance of nonlinear kernel functions against the linear kernel function under different experimental parameters: data granularity, imbalance ratio of the dataset, and feature subsets. A large-scale study was conducted using four kernel functions, ten feature subset selection thresholds based on the information gain algorithm, 38 public datasets, and one evaluation measure, resulting in 1520 experiments. The findings demonstrate that: 1) Not all nonlinear kernel functions significantly outperform the linear kernel; only RBF surpasses the linear and the other nonlinear kernel functions. 2) There is no significant difference between kernel functions with respect to data granularity, except between RBF and the other kernel functions at the 'function' granularity. 3) RBF works significantly better than the linear and the other nonlinear kernel functions on datasets with very high and high imbalance ratios. 4) The performance of all kernel functions fluctuates across feature subsets; however, using the top 40% of the features works best with all kernel functions. To conclude, we recommend using SVM with the RBF kernel for defect datasets because the performance of the other kernel functions is limited. (c) 2022 Elsevier B.V. All rights reserved.
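The feature selection step and the experiment count described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes discrete feature values, a toy dataset with made-up feature names, thresholds running from 10% to 100% in steps of 10%, and placeholder names for the two kernels the abstract does not name. All of these are assumptions made only for illustration.

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus label entropy conditioned on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_fraction(columns, labels, fraction):
    """Names of the best-ranked features, keeping `fraction` of them (at least one)."""
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


# Toy data (hypothetical): 1 = defective module, 0 = clean module.
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "high_complexity": [1, 1, 0, 0, 1, 0],
    "large_file": [1, 1, 1, 0, 1, 0],
    "recent_change": [1, 0, 0, 0, 0, 0],
    "has_comments": [1, 0, 1, 0, 1, 0],
    "constant": [0, 0, 0, 0, 0, 0],
}

# Keep the top 40% of features, the threshold the abstract reports as best.
selected = top_fraction(columns, labels, 0.4)

# Experiment grid: 4 kernels x 10 thresholds x 38 datasets x 1 measure.
# Only "linear" and "rbf" are named in the abstract; the other two are placeholders.
kernels = ["linear", "rbf", "nonlinear_kernel_3", "nonlinear_kernel_4"]
thresholds = [i / 10 for i in range(1, 11)]  # assumed: 10%, 20%, ..., 100%
datasets = range(38)
measures = ["evaluation_measure"]  # placeholder name
experiment_grid = list(product(kernels, thresholds, datasets, measures))
```

With five toy features, the 40% threshold keeps two of them, and the grid contains 4 x 10 x 38 x 1 = 1520 configurations, matching the count stated in the abstract. In a full pipeline each configuration would train and evaluate one SVM; that step is omitted here.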

Software defect prediction model based on improved twin support vector machines

Jianming Liu, Jie Lei, Zhouyu Liao, Jiali He

Summary: This study proposes a novel software defect prediction model based on a twin support vector machine to address the issue of imbalanced data classification, achieving higher accuracy and robustness in classifying imbalanced data.

SOFT COMPUTING (2023)