4.2 Article

What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis?

Journal

COMPUTATIONAL STATISTICS
Volume 36, Issue 3, Pages 2009-2031

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s00180-020-00999-9

Keywords

Model validation; Classification error; randomized subsets; sample size

Funding

  1. U.S. Forest Service, Pacific Northwest Research Station
  2. University of Melbourne, Australia

This study examines the impact of different values of k and sample sizes on the validation results of Bayesian network models, finding that classification error decreases with increasing sample size and k value, with k = 10 generally yielding the best results.
Cross-validation using randomized subsets of data, known as k-fold cross-validation, is a powerful means of testing the success rate of models used for classification. However, few if any studies have explored how values of k (number of subsets) affect validation results in models tested with data of known statistical properties. Here, we explore conditions of sample size, model structure, and variable dependence affecting validation outcomes in discrete Bayesian networks (BNs). We created 6 variants of a BN model with known properties of variance and collinearity, along with data sets of n = 50, 500, and 5000 samples, and then tested classification success and evaluated CPU computation time with seven levels of folds (k = 2, 5, 10, 20, n - 5, n - 2, and n - 1). Classification error declined with increasing n, particularly in BN models with high multivariate dependence, and declined with increasing k, generally levelling out at k = 10, although k = 5 sufficed with large samples (n = 5000). Our work supports the common use of k = 10 in the literature, although in some cases k = 5 would suffice with BN models having independent variable structures.
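
The procedure described in the abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes a naive Bayes classifier (one of the simplest discrete Bayesian network structures) as a stand-in for the paper's BN variants, and a hypothetical synthetic data set in which two binary features each agree with the class 80% of the time. The function names (`make_data`, `k_fold_indices`, `fit_naive_bayes`, `predict`, `cv_error`) and the 0.8 value are illustrative assumptions only. The loop at the end runs the seven fold levels named in the abstract for n = 50.

```python
import random
from collections import Counter, defaultdict


def make_data(n, seed=0):
    """Hypothetical synthetic data: two binary features and a binary class."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        # Assumption: each feature copies the class with probability 0.8.
        x1 = y if rng.random() < 0.8 else 1 - y
        x2 = y if rng.random() < 0.8 else 1 - y
        rows.append(((x1, x2), y))
    return rows


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_naive_bayes(train):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y for _, y in train)
    feat_counts = defaultdict(Counter)  # (feature position, class) -> value counts
    for x, y in train:
        for j, v in enumerate(x):
            feat_counts[(j, y)][v] += 1
    return class_counts, feat_counts


def predict(model, x, n_values=2):
    """Return the class with the highest Laplace-smoothed posterior score."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for y, cy in class_counts.items():
        score = cy / total
        for j, v in enumerate(x):
            score *= (feat_counts[(j, y)][v] + 1) / (cy + n_values)
        if score > best_score:
            best, best_score = y, score
    return best


def cv_error(data, k, seed=0):
    """Classification error pooled over all k held-out folds."""
    folds = k_fold_indices(len(data), k, seed)
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        model = fit_naive_bayes(train)
        wrong += sum(predict(model, data[i][0]) != data[i][1] for i in fold)
    return wrong / len(data)


if __name__ == "__main__":
    n = 50
    data = make_data(n)
    # The seven fold levels listed in the abstract.
    for k in (2, 5, 10, 20, n - 5, n - 2, n - 1):
        print(f"k={k:>2}  error={cv_error(data, k):.3f}")
```

Because every sample is held out exactly once, the pooled error is always computed over all n cases, whatever the value of k. What changes with k is the size of each training set (roughly n(k - 1)/k cases) and the number of model fits, which is the source of the CPU-time cost the study evaluates.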

Recommended

Article Engineering, Industrial

Calibrating experts' probabilistic assessments for improved probabilistic predictions

A. M. Hanea, G. F. Nane

SAFETY SCIENCE (2019)

Article Ecology

Weighting and aggregating expert ecological judgments

Victoria Hemming, Anca M. Hanea, Terry Walshe, Mark A. Burgman

ECOLOGICAL APPLICATIONS (2020)

Article Engineering, Multidisciplinary

Improving expert forecasts in reliability: Application and evidence for structured elicitation protocols

Victoria Hemming, Nicholas Armstrong, Mark A. Burgman, Anca M. Hanea

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL (2020)

Article Public, Environmental & Occupational Health

Uncertainty Quantification with Experts: Present Status and Research Needs

Anca M. Hanea, Victoria Hemming, Gabriela F. Nane

Summary: Expert elicitation is used when data is lacking and important decisions need to be made. When designing expert elicitation, practitioners aim to balance best practices with practical constraints. The choices made impact time and effort investment, data quality, expert engagement, result defensibility, and decision acceptability.

RISK ANALYSIS (2022)

Article Public, Environmental & Occupational Health

What is a Good Calibration Question?

Victoria Hemming, Anca M. Hanea, Mark A. Burgman

Summary: The study suggests that weighted aggregation outperforms equal weights on the combined CM score, but not on statistical accuracy. Experts were unable to adapt their knowledge across different domains, and in-sample validation on irrelevant questions did not accurately predict out-of-sample performance.

RISK ANALYSIS (2022)

Article Biodiversity Conservation

Predicting species and community responses to global change using structured expert judgement: An Australian mountain ecosystems case study

James S. Camac, Kate D. L. Umbers, John W. Morgan, Sonya R. Geange, Anca Hanea, Rachel A. Slatyer, Keith L. McDougall, Susanna E. Venn, Peter A. Vesk, Ary A. Hoffmann, Adrienne B. Nicotra

Summary: Conservation managers are facing challenges in making decisions to protect biodiversity in the Australian Alps due to climate change impacts. Expert predictions suggest that by 2050, most alpine vegetation communities will decrease in extent, while woodlands and heathlands are expected to increase. The responses of alpine plants vary greatly, while animal species are predicted to decline or remain stable.

GLOBAL CHANGE BIOLOGY (2021)

Article Multidisciplinary Sciences

Mathematically aggregating experts' predictions of possible futures

A. M. Hanea, D. P. Wilkinson, M. McBride, A. Lyon, D. van Ravenzwaaij, F. Singleton Thorn, C. Gray, D. R. Mandel, A. Willcox, E. Gould, E. T. Smith, F. Mody, M. Bush, F. Fidler, H. Fraser, B. C. Wintle

Summary: Structured protocols provide a transparent and systematic way to aggregate probabilistic predictions from multiple experts. By using mathematical rules for aggregation, the objectivity and quality of predictions can be enhanced and measured through accuracy, calibration, and informativeness. Performance-based weighted aggregation can be effective when experts' performance can be scored beforehand, while other aggregation methods informed by measurable proxies for good performance can also be considered.

PLOS ONE (2021)

Article Public, Environmental & Occupational Health

Balancing the Elicitation Burden and the Richness of Expert Input When Quantifying Discrete Bayesian Networks

Martine J. Barons, Steven Mascaro, Anca M. Hanea

Summary: SEJ is a structured method for obtaining estimates from groups of experts, aiming to minimize cognitive frailties. When the number of quantities required is large, imputation methods can be used for unelicited quantities. InterBeta is effective in interpolating conditional probability tables to reduce expert burden.

RISK ANALYSIS (2022)

Article Public, Environmental & Occupational Health

Co-designing and building an expert-elicited non-parametric Bayesian network model: demonstrating a methodology using a Bonamia Ostreae spread risk case study

Anca M. Hanea, Zoe Hilton, Ben Knight, Andrew P. Robinson

Summary: This article introduces the development and use of probabilistic models, particularly Bayesian networks (BN), for supporting risk-based decision making. It also highlights the promise of codesign and nonparametric Bayesian networks (NPBNs) in achieving a balance between model complexity and ease of development. A case study on the local spread of a marine pathogen is presented to demonstrate the process of codesigning, building, quantifying, and validating an NPBN model using structured expert judgment (SEJ).

RISK ANALYSIS (2022)

Editorial Material Biology

Reimagining peer review as an expert elicitation process

Alexandru Marcoci, Ans Vercammen, Martin Bush, Daniel G. Hamilton, Anca Hanea, Victoria Hemming, Bonnie C. Wintle, Mark Burgman, Fiona Fidler

Summary: Journal peer review plays an important role in regulating the flow of ideas in academic disciplines. However, research shows that editors cannot accurately identify the best experts for peer review. To prevent biases and uneven power distributions, introducing greater transparency and structure into the process is crucial.

BMC RESEARCH NOTES (2022)

Editorial Material Public, Environmental & Occupational Health

Bayesian networks for risk analysis and decision support

Anca M. Hanea, Annemarie Christophersen, Sandra Alday

RISK ANALYSIS (2022)

Article Mathematics, Interdisciplinary Applications

Improving the Computation of Brier Scores for Evaluating Expert-Elicited Judgements

Gayan Dharmarathne, Anca Hanea, Andrew P. Robinson

Summary: Structured expert judgment (SEJ) is a suite of techniques used to elicit expert predictions in situations where data are too expensive or impossible to obtain. The quality of expert predictions can be assessed using Brier scores and calibration questions. Research recommends using mixed-effects models to improve expert Brier scores and related operations.

FRONTIERS IN APPLIED MATHEMATICS AND STATISTICS (2021)

Review Public, Environmental & Occupational Health

Levee System Reliability Modeling: The Length Effect and Bayesian Updating

Kathryn Roscoe, Anca Hanea, Ruben Jongejan, Ton Vrouwenvelder

SAFETY (2020)

Article Geosciences, Multidisciplinary

Bayesian Network Modeling and Expert Elicitation for Probabilistic Eruption Forecasting: Pilot Study for Whakaari/White Island, New Zealand

Annemarie Christophersen, Natalia Deligne, Anca M. Hanea, Lauriane Chardot, Nicolas Fournier, Willy P. Aspinall

FRONTIERS IN EARTH SCIENCE (2018)
