Article

Semi-Supervised Deep Fuzzy C-Mean Clustering for Software Fault Prediction

IEEE ACCESS (2018)

Journal

IEEE ACCESS

Volume 6, Issue -, Pages 25675-25685

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2018.2835304

Keywords

Semi-supervised learning; fuzzy C-Mean clustering; feature learning; software fault prediction

Funding

National Basic Research Program (973 Program) of China [2013CB329402]
National Natural Science Foundation of China [61573267, 61473215, 61571342, 61572383, 61501353, 61502369, 61271302, 61272282, 61202176]
Fund for Foreign Scholars in University Research and Teaching Programs (111 Project) [B07048]
Major Research Plan of the National Natural Science Foundation of China [91438201, 91438103]

Abstract

Software fault prediction is an important research area in software quality assurance. In this paper, we propose a semi-supervised deep fuzzy C-mean (DFCM) clustering approach for software fault prediction, which combines semi-supervised DFCM clustering with feature compression techniques. The deep component is used to build feature-based multiple clusters from the unlabeled and labeled data sets together with their class labels. To train the model, we process the unlabeled and labeled data simultaneously, exploiting hidden information in the unlabeled data to support the construction of an accurate model from the labeled data. We use DFCM clustering to handle the class imbalance problem; in addition, fuzzy logic is close to human reasoning and is easy to comprehend. We further improve prediction performance by combining feature learning techniques, namely feature extraction and feature selection (using random under-sampling), to generate good features and remove irrelevant and redundant ones, which reduces noise in the data used for classification. The results show that combining deep multiple clusters with these feature techniques works better because of their ability to identify and combine essential information in the data features. The classification model is based on the maximum homogeneity between the features of the labeled and unlabeled data, and it is trained on the denoised data set obtained by this deep combination of multiple clusters and feature techniques. To evaluate the efficacy of our approach, we selected data sets from real-world software projects (NASA and Eclipse), compared our approach with a number of recent and classical baseline methods, and assessed performance using the probability of detection, F-measure, and area under the curve.
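The approach above builds on fuzzy C-means (FCM) clustering and is evaluated with the probability of detection (pd) and F-measure. As a minimal sketch of those two ingredients only, the Python below implements standard, unsupervised FCM (alternating weighted-center and membership updates) and the two metrics, assuming label 1 marks a faulty module. It is not the paper's semi-supervised deep DFCM: the deep component, the use of labeled data, feature learning, and under-sampling are all omitted, and the function names `fuzzy_c_means` and `pd_and_f_measure` are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard (unsupervised) fuzzy C-means; hypothetical helper, not DFCM.

    points: list of equal-length feature vectors.
    c: number of clusters; m > 1 is the fuzzifier controlling softness.
    Returns (centers, u) where u[i][j] is the degree to which point i
    belongs to cluster j; each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([w / total for w in row])

    centers = [[0.0] * dim for _ in range(c)]
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij^m.
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers[j] = [sum(weights[i] * points[i][d] for i in range(n)) / wsum
                          for d in range(dim)]
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        max_change = 0.0
        p = 2.0 / (m - 1.0)
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            zero = [j for j in range(c) if dists[j] == 0.0]
            if zero:
                # Point coincides with one or more centers: split membership among them.
                new_row = [1.0 / len(zero) if j in zero else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                           for j in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        # Stop once memberships have stabilized.
        if max_change < tol:
            break
    return centers, u


def pd_and_f_measure(y_true, y_pred):
    """Probability of detection (recall on the faulty class) and F-measure.

    Assumes label 1 = faulty module and 0 = clean module.
    """
    tp = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 1)
    fn = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 0)
    fp = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 1)
    pd = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * precision * pd / (precision + pd) if precision + pd else 0.0
    return pd, f
```

Hard cluster assignments can be read off as the index of the largest membership in each row of `u`. The fuzzifier `m` controls how soft the memberships are: values close to 1 push the result toward k-means-style hard assignments, while larger values spread membership more evenly across clusters.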
