An investigation on the feasibility of cross-project defect prediction

AUTOMATED SOFTWARE ENGINEERING (2012)

Journal

AUTOMATED SOFTWARE ENGINEERING

Volume 19, Issue 2, Pages 167-199

Publisher

SPRINGER

DOI: 10.1007/s10515-011-0090-3

Keywords

Defect prediction; Cross-project; Data characteristics; Machine learning; Training data

Funding

National Natural Science Foundation of China [60873072, 61073044, 60903050]
National Science and Technology Major Project
National Basic Research Program [2007CB310802]
CAS

Abstract

Software defect prediction helps optimize the allocation of testing resources by identifying defect-prone modules prior to testing. Most existing models build their prediction capability on a set of historical data, presumably from the same or similar project settings as those under prediction. However, such historical data is not always available in practice. One potential way of predicting defects in projects without historical data is to learn predictors from the data of other projects. This paper investigates defect prediction in the cross-project context, focusing on the selection of training data. We conduct three large-scale experiments on 34 data sets obtained from 10 open source projects. Major conclusions from our experiments include: (1) in the best cases, training data from other projects can provide better prediction results than training data from the same project; (2) the prediction results obtained using training data from other projects meet our criteria for acceptance on average, with defects in 18 out of 34 cases predicted at a Recall greater than 70% and a Precision greater than 50%; (3) the results of cross-project defect prediction are related to the distributional characteristics of the data sets, which are valuable for training data selection. We further propose an approach to automatically select suitable training data for projects without historical data. Prediction results obtained with the training data selected by our approach are comparable with those obtained with training data from the same project.
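The acceptance criteria quoted in the abstract (Recall greater than 70% and Precision greater than 50%) can be illustrated with a minimal Python sketch. The function names, the 0/1 label encoding, and the toy labels below are assumptions made for illustration only; they are not taken from the paper, and the paper's own experimental setup is not reproduced here.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels, where 1 marks a defective module."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # defects correctly flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # clean modules flagged as defective
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # defects that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def is_acceptable(actual, predicted, min_recall=0.70, min_precision=0.50):
    """Check the thresholds stated in the abstract (both comparisons are strict)."""
    precision, recall = precision_recall(actual, predicted)
    return recall > min_recall and precision > min_precision


# Hypothetical labels for ten modules of a target project (illustrative data only).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"acceptable={is_acceptable(actual, predicted)}")
```

In a cross-project setting, `predicted` would come from a model trained on another project's data and `actual` from the target project's known defects; the check itself is the same either way.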
