Article

On the assessment of software defect prediction models via ROC curves

Journal

EMPIRICAL SOFTWARE ENGINEERING
Volume 25, Issue 5, Pages 3977-4019

Publisher

SPRINGER
DOI: 10.1007/s10664-020-09861-4

Keywords

Software defect prediction model; Software defect proneness; ROC; Thresholds; AUC; Gini

Funding

  1. Università degli Studi dell'Insubria within the CRUI-CARE Agreement

Abstract

Software defect prediction models are classifiers often built by setting a threshold t on a defect proneness model, i.e., a scoring function. For instance, they classify a software module as non-faulty if its defect proneness is below t and as faulty otherwise. Different values of t may lead to different defect prediction models, possibly with very different performance levels. Receiver Operating Characteristic (ROC) curves provide an overall assessment of a defect proneness model, by taking into account all possible values of t and thus all the defect prediction models that can be built based on it. However, using a defect proneness model with a given value of t is sensible only if the resulting defect prediction model performs at least as well as some minimal performance level that depends on practitioners' and researchers' goals and needs. We introduce a new approach and a new performance metric (the Ratio of Relevant Areas) for assessing a defect proneness model by taking into account only the parts of a ROC curve corresponding to values of t for which the resulting defect prediction models perform better than some reference value. We provide the practical motivations and theoretical underpinnings for our approach by: 1) showing how it addresses the shortcomings of existing performance metrics such as the Area Under the Curve and Gini's coefficient; 2) deriving reference values based on random defect prediction policies, in addition to deterministic ones; 3) showing how the approach works with several performance metrics (e.g., Precision and Recall) and their combinations; 4) studying misclassification costs and providing a general upper bound for the cost related to the use of any defect proneness model; 5) showing the relationships between misclassification costs and performance metrics. We also carried out a comprehensive empirical study on real-life data from the SEACRAFT repository, to show the differences between our metric and existing ones and how much more reliable and less misleading our metric can be.
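
To make the idea of restricting the ROC curve more concrete, the sketch below (in Python, using scikit-learn's roc_curve and auc) computes a ROC-based summary only over the thresholds t whose resulting defect prediction models reach a minimum Precision. It is a simplified illustration under assumed choices (the min_precision reference value and the normalization by the retained false-positive-rate range are assumptions made here), not the paper's exact Ratio of Relevant Areas definition.

# Minimal illustrative sketch, NOT the paper's exact metric: it keeps only the
# thresholds t whose defect prediction model (scores >= t) reaches a reference
# Precision, and summarizes the retained part of the ROC curve.
import numpy as np
from sklearn.metrics import roc_curve, auc

def restricted_roc_summary(y_true, defect_proneness, min_precision=0.5):
    # ROC points and the thresholds t that generate them.
    fpr, tpr, thresholds = roc_curve(y_true, defect_proneness)

    # Keep only the thresholds whose resulting classifier achieves at least
    # the reference Precision (an assumed choice of reference performance).
    keep = []
    for t in thresholds:
        predicted_faulty = defect_proneness >= t
        tp = np.sum(predicted_faulty & (y_true == 1))
        fp = np.sum(predicted_faulty & (y_true == 0))
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        keep.append(precision >= min_precision)
    keep = np.asarray(keep)

    if keep.sum() < 2:
        return 0.0  # not enough admissible thresholds to form an area

    # Area under the retained portion of the ROC curve, normalized by the
    # width of the retained false-positive-rate range (a crude stand-in for
    # relating the "relevant" area to the area it could attain).
    fpr_kept, tpr_kept = fpr[keep], tpr[keep]
    width = fpr_kept.max() - fpr_kept.min()
    return auc(fpr_kept, tpr_kept) / width if width > 0 else 0.0

# Toy usage with synthetic defect data.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=300)                 # 1 = faulty module
scores = 0.4 * y + 0.6 * rng.random(300)         # noisy defect proneness scores
print(restricted_roc_summary(y, scores, min_precision=0.6))

Unlike the plain Area Under the Curve, such a restricted summary ignores the portions of the ROC curve generated by thresholds that no practitioner would accept, which is the motivation the abstract gives for the Ratio of Relevant Areas.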
