Article

Comparing the effectiveness of several modeling methods for fault prediction

EMPIRICAL SOFTWARE ENGINEERING (2010)

Journal

EMPIRICAL SOFTWARE ENGINEERING

Volume 15, Issue 3, Pages 277-295

Publisher

SPRINGER

DOI: 10.1007/s10664-009-9111-2

Keywords

Empirical study; Fault prediction; Negative binomial; Recursive partitioning; Random forests; Bayesian trees; Fault-percentile-average

Category

Computer Science, Software Engineering

Abstract

We compare the effectiveness of four modeling methods (negative binomial regression, recursive partitioning, random forests, and Bayesian additive regression trees) for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics: the percent of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average. The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either of the metrics. For each of the three systems, the negative binomial and random forests models identified 20% of the files in each release that contained an average of 76% to 94% of the faults.

Recommended

Article Mathematics, Interdisciplinary Applications

Ordinal Trees and Random Forests: Score-Free Recursive Partitioning and Improved Ensembles

Gerhard Tutz

Summary: The study introduces an improved method for ordinal trees that avoids the artificial assignment of scores and adopts the construction principle of binary models, combining trees and parametric models for prediction. The potential performance issues of random forests are also discussed, with proposals for ensemble models to achieve better predictive performance.

JOURNAL OF CLASSIFICATION (2022)

Article Biochemistry & Molecular Biology

Development of decision trees to discriminate HDAC8 inhibitors and non-inhibitors using recursive partitioning

Sk Abdul Amin, Nilanjan Adhikari, Tarun Jha

Summary: In this study, a diverse set of compounds was analyzed using recursive partitioning (RP) analysis to develop decision trees for discriminating HDAC8 inhibitors from non-inhibitors. Understanding essential structural and physicochemical parameters is crucial for designing potential and selective HDAC8 inhibitors, and the results validate previous findings from Bayesian modeling. This comparative learning will enhance drug discovery efforts related to HDAC8 inhibitors.

JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS (2021)

Article Computer Science, Theory & Methods

An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests

Christophe Dutang, Quentin Guibert

Summary: This paper proposes a split point procedure based on explicit likelihood to speed up the search for the best split point in CART. Through simulation and benchmarking on empirical datasets, GLM trees are shown to have good performance in certain situations. The approach is extended to multiway split trees and log-transformed distributions. A numerical comparison of GLM forests against other random forest-type approaches is also provided.

STATISTICS AND COMPUTING (2022)

Article Statistics & Probability

Hidden Markov Polya Trees for High-Dimensional Distributions

Naoki Awaya, Li Ma

Summary: The Polya tree (PT) process is a versatile Bayesian nonparametric model that has been widely used in inference problems. Recent developments have shown that the performance of PT models can be improved by adapting the partition tree to the underlying distributions and incorporating latent state variables. However, there are still important limitations, including sensitivity to the choice of the partition tree and lack of scalability with respect to dimensionality.

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2022)

Article Ergonomics

Derivation of the Empirical Bayesian method for the Negative Binomial-Lindley generalized linear model with application in traffic safety

Ali Khodadadi, Ioannis Tsapakis, Mohammadali Shirazi, Subasish Das, Dominique Lord

Summary: This study proposed an Empirical Bayesian method based on the Negative Binomial-Lindley model for estimating the expected crash frequency. The results showed that this method can estimate the expected crashes with comparable precision to the Full Bayesian method, but with lower computational cost. It can be applied to other safety-related tasks.

ACCIDENT ANALYSIS AND PREVENTION (2022)

Article Statistics & Probability

Adaptive Design and Analysis Via Partitioning Trees for Emulation of a Complex Computer Code

Sonja Isberg, William J. Welch

Summary: This study discusses multiple methods for addressing the computational complexity of computer models in large designs, and proposes a new method called "adaptive design and analysis via partitioning trees (ADAPT)". The proposed method partitions the input space in regions of high variability to obtain a higher density of points for accurate prediction.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2022)

Article Automation & Control Systems

An efficient fault-tolerant distributed Bayesian filter based on conservative fusion

Litao Zheng, Feng Yang, Lihong Shi

Summary: This paper proposes a fault-tolerant distributed Bayesian filter for multi-sensor state estimation over a peer-to-peer sensor network in which local estimates may be incoherent. The proposed approach uses a Gaussian mixture to represent the fusion result, effectively reducing the negative impact of such estimates. The resulting filter performs Bayesian recursion via Gaussian mixture and utilizes a novel arithmetic average fusion for heterogeneous sensor networks.

ISA TRANSACTIONS (2023)

Article Construction & Building Technology

Recursive partitioning and Gaussian Process Regression for the detection and localization of damages in pultruded Glass Fiber Reinforced Polymer material

Giosue Boscato, Marco Civera, Luca Zanotti Fragonara

Summary: The paper proposes a methodology for detecting and localizing damages in composite pultruded members, particularly focusing on thin-walled pultruded members. The method is applied to numerical and experimental data, analyzing both modal shapes and the influence of damage on the performance of Glass Fiber Reinforced Polymer (GFRP) members. The reliability of the proposed semiparametric statistical method is demonstrated through numerical investigation and comparison with experimental results on cracked beams and frame structures.

STRUCTURAL CONTROL & HEALTH MONITORING (2021)

Article Ecology

Demographic partitioning of dynamic energy subsidies revealed with an Ornstein-Uhlenbeck space use model

Joseph M. Eisaguirre, Travis L. Booms, Christopher P. Barger, Stephen B. Lewis, Greg A. Breed

Summary: This study explores the differential habitat selection and space use between floaters and territorial golden eagles based on satellite telemetry data. The results reveal that floaters have more expansive space use patterns and larger home ranges compared to territorial eagles, and they partition space with territorial individuals through differential habitat and resource selection.

ECOLOGICAL APPLICATIONS (2022)

Article Linguistics

Seeing the wood for the trees: predictive margins for random forests

Lukas Soenning, Jason Grafmiller

Summary: Classification trees and random forests are attractive methods for corpus data analysis. However, their typical reporting style lacks sufficient information on the relationship between predictors and outcomes. This paper introduces predictive margins as an interpretative approach to ensemble techniques like random forests, providing adjusted predictions and allowing for nonlinear associations and interactions. It outlines the general strategy and addresses methodological issues, using English genitive alternation data as an example and providing an R package for implementation.

CORPUS LINGUISTICS AND LINGUISTIC THEORY (2023)

Article Statistics & Probability

Confidence intervals with maximal average power

Christian Bartels, Johanna Mielke, Ekkehard Glimm

Summary: This study proposes a frequentist testing procedure that allows adjusting the decision rules and increasing power by selecting a prior distribution. However, it comes with the risk of losing power if the data-generating distribution or the observed data are incompatible with the prior distribution. The approach is illustrated using a simple binomial experiment, and its potential beyond the example is discussed. It is worth noting that the testing procedure is constructed using a Bayesian posterior probability distribution.

COMMUNICATIONS IN STATISTICS-THEORY AND METHODS (2022)

Article Computer Science, Artificial Intelligence

Bayesian network for integrated circuit testing probe card fault diagnosis and troubleshooting to empower Industry 3.5 smart production and an empirical study

Wenhan Fu, Chen-Fu Chien, Lizhen Tang

Summary: Probe cards are essential test interfaces for integrated circuit testing, but the diagnosis and troubleshooting process can be complex and time-consuming. This study aims to develop a Bayesian network using data-driven solutions and potential rules derived from domain knowledge to enhance data integrity and improve troubleshooting efficiency.

JOURNAL OF INTELLIGENT MANUFACTURING (2022)

Article Multidisciplinary Sciences

Estimation of common percentile of rainfall datasets in Thailand using delta-lognormal distributions

Warisa Thangjai, Sa-Aat Niwitpong, Suparat Niwitpong

Summary: Weighted percentiles are used to investigate the overall trend of rainfall in Thailand, and confidence intervals for common percentiles of delta-lognormal distributions are constructed. Comparisons of coverage probabilities and average lengths show that one Bayesian approach performs better than others.

PEERJ (2022)

Article Computer Science, Artificial Intelligence

MODELING OF DISCRETE QUESTIONNAIRE DATA WITH DIMENSION REDUCTION

S. Jozova, E. Uglickich, I. Nagy, R. Likhonina

Summary: The paper presents an algorithm for modeling discrete questionnaire data with reduced dimension. The algorithm reduces the dimension of the discrete model by constructing local models based on independent binomial mixtures estimated using recursive Bayesian algorithms and the naive Bayes technique. The algorithm allows for modeling high dimensional questionnaire data with a large number of explanatory variables and their possible realizations. The algorithm is applied to the analysis of traffic accident questionnaires, where it is used for classifying accident circumstances and predicting the severity of traffic accidents using current discrete data. The effectiveness of the obtained model is demonstrated through testing on real data and comparison with theoretical counterparts.

NEURAL NETWORK WORLD (2022)

Article Computer Science, Theory & Methods

Efficient stochastic optimisation by unadjusted Langevin Monte Carlo: Application to maximum marginal likelihood and empirical Bayesian estimation

Valentin De Bortoli, Alain Durmus, Marcelo Pereyra, Ana F. Vidal

Summary: This paper proposes a method using unadjusted Langevin algorithms to construct stochastic approximation, addressing the difficulties of using high-dimensional Markov chain Monte Carlo algorithms in large problems. This approach leads to a highly efficient stochastic optimization method with favorable convergence properties that can be quantified explicitly and easily checked.

STATISTICS AND COMPUTING (2021)
