Article
Radiology, Nuclear Medicine & Medical Imaging
Ravi K. Samala, Heang-Ping Chan, Lubomir Hadjiiski, Mark A. Helvie
Summary: This study examines the risk of feature leakage and its dependence on sample size when using a pretrained deep convolutional neural network (DCNN) for breast mass classification. A simulation study and analysis on training and independent test sets reveal that feature leakage can lead to large generalization errors, emphasizing the importance of evaluation on unseen test cases for realistic performance assessment in clinical implementation.
Article
Health Care Sciences & Services
Menelaos Pavlou, Chen Qu, Rumana Z. Omar, Shaun R. Seaman, Ewout W. Steyerberg, Ian R. White, Gareth Ambler
Summary: This paper investigates the sample size requirements for validation studies with binary outcomes to estimate measures of predictive performance, and provides estimators that perform well even when normality assumptions are violated.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2021)
Article
Psychology, Multidisciplinary
Sedat Sen, Allan S. Cohen
Summary: The study investigates the effects of sample size, test length, number of attributes, and base rate of mastery on item parameter recovery and classification accuracy of four diagnostic classification models (DCMs). Results show that larger sample size and longer test length lead to more precise estimates of item parameters, but recovery decreases as the number of attributes increases. The DINA and DINO models demonstrate higher item parameter recovery and classification accuracy.
FRONTIERS IN PSYCHOLOGY
(2021)
Article
Biology
Derek Dinart, Carine Bellera, Virginie Rondeau
Summary: A key issue in clinical trial design is accurately estimating the number of subjects needed, especially in multicenter or biomarker-stratified designs where the treatment effect size may vary. Limited research exists on determining sample size for such trials, highlighting the importance of considering heterogeneity in both baseline hazards and treatment effects to avoid bias in sample size estimates. Many current methods account for only one type of heterogeneity and cannot address both sources of variation simultaneously.
Article
Mathematical & Computational Biology
Haiyan Zheng, Michael J. Grayling, Pavel Mozgunov, Thomas Jaki, James M. S. Wason
Summary: Basket trials are increasingly used for evaluating new treatments in different patient subgroups. This paper proposes a Bayesian approach to determine sample size in basket trials, allowing information borrowing between similar subsets. The proposed approach yields comparable sample sizes for circumstances of no borrowing, and significantly reduces sample size when borrowing is enabled between commensurate subtrials. Examples and simulation studies demonstrate the feasibility and effectiveness of the proposed methodology.
Review
Medicine, General & Internal
Xinlian Zhang, Phillipp Hartmann
Summary: The calculation of required sample size is crucial in designing both animal and human studies. This review defines key terms related to sample size determination, such as mean, standard deviation, statistical hypothesis testing, type I/II error, power, direction of effect, effect size, expected attrition, corrected sample size, and allocation ratio. It also provides practical examples of sample size calculations based on pilot studies, similar larger studies, or estimated effect sizes per Cohen and Sawilowsky if no previous studies are available.
FRONTIERS IN MEDICINE
(2023)
Article
Surgery
David Chadow, N. Bryce Robinson, Gianmarco Cancelli, Giovanni Soletti, Katia Audisio, Mohamed Rahouma, Roberto Perezgrovas, Mario Gaudino
Summary: It is estimated that 25-30% of randomized controlled trials fail to reach their target sample size. Factors such as multicentre design, publication year, and commercial sponsor are inversely associated with failure to reach the target sample size. A substantial proportion of surgical trials fail to reach the target sample size, but there is an improving trend.
BRITISH JOURNAL OF SURGERY
(2022)
Article
Mathematical & Computational Biology
Lucinda Archer, Kym I. E. Snell, Joie Ensor, Mohammed T. Hudda, Gary S. Collins, Richard D. Riley
Summary: Clinical prediction models offer personalized outcome predictions for patient counseling and decision making, with external validation crucial for assessing model performance. Proposed criteria aim to determine minimum sample size needed for external validation of a clinical prediction model, considering factors like proportion of variance explained and agreement between predicted and observed values. The recommendations provide a framework for estimating precision and ensuring adequate sample sizes in future validation studies.
STATISTICS IN MEDICINE
(2021)
Article
Computer Science, Artificial Intelligence
Addisson Salazar, Luis Vergara, Enrique Vidal
Summary: In this paper, a theoretical learning curve is derived for the multi-class Bayes classifier, fitting general multivariate parametric models, and providing an estimate of the reduction in error probability with increased training set size. The curve does not depend on model parameters, only on the training set size and feature vector dimension. This makes it useful for determining appropriate training set sizes in practice.
PATTERN RECOGNITION
(2023)
Article
Health Care Sciences & Services
Kym I. E. Snell, Lucinda Archer, Joie Ensor, Laura J. Bonnett, Thomas P. A. Debray, Bob Phillips, Gary S. Collins, Richard D. Riley
Summary: Rules-of-thumb for sample size in external validation of clinical prediction models may not be precise, because factors such as the distribution of the linear predictor (LP) affect the precision of performance estimates. A tailored simulation-based approach can offer more flexibility and reliability in determining sample size requirements for validation.
JOURNAL OF CLINICAL EPIDEMIOLOGY
(2021)
Review
Geriatrics & Gerontology
Graziella D'Arrigo, Stefanos Roumeliotis, Claudia Torino, Giovanni Tripepi
Summary: A crucial step in planning a randomized clinical trial (RCT) is the calculation of sample size, which determines the optimal number of patients needed to ensure the study has enough power to detect differences in specific endpoints between study arms. This calculation involves inputting variables such as the expected effect size, alpha error (α), beta error (β), and the allocation ratio in order to determine the number of participants allocated to each arm of the RCT.
AGING CLINICAL AND EXPERIMENTAL RESEARCH
(2021)
Article
Medicine, General & Internal
Jacob Levman, Bryan Ewenson, Joe Apaloo, Derek Berger, Pascal N. N. Tyrrell
Summary: Supervised machine learning classification is widely used in industry and research. The article introduces an enhanced technique for hold-out validation, which assesses the consistency of mistakes made by the learning algorithm. This technique can improve the evaluation and design of reliable and predictable AI models.
Article
Computer Science, Artificial Intelligence
Zhiwang Zhang, Jing He, Jie Cao, Shuqing Li
Summary: Compared with easy feature creation or generation in data analysis, manual data labeling requires significant time and effort in most cases. Despite the potential improvement provided by automated data labeling, manual checking and verification is still necessary. Data mining and machine learning often encounter High Dimension and Low Sample Size (HDLSS) data, where traditional classifiers struggle due to data piling and approximate equidistance. This paper proposes a Maximum Decentral Projection Margin Classifier (MDPMC) within the framework of a Support Vector Classifier (SVC), effectively addressing issues related to data piling and approximate equidistance, as demonstrated by experimental results on real HDLSS datasets.
Article
Mathematical & Computational Biology
Richard D. Riley, Thomas P. A. Debray, Gary S. Collins, Lucinda Archer, Joie Ensor, Maarten van Smeden, Kym I. E. Snell
Summary: External validation is crucial in examining the performance of prediction models, but current studies often face issues with small sample sizes. To address this, the paper proposes calculating the minimum sample size needed for a new external validation study so that calibration, discrimination, and clinical utility measures are estimated precisely.
STATISTICS IN MEDICINE
(2021)
Article
Mathematical & Computational Biology
Xin Li, Wei Ma, Feifang Hu
Summary: Combining covariate-adaptive randomization (CAR) with sample size re-estimation (SSR) in clinical trials has become increasingly popular due to its advantages in statistical efficiency and cost reduction. However, adjustments are necessary to protect the accuracy of the combined design. This article provides a framework for applying SSR in CAR trials and studies the underlying theoretical properties. Numerical studies show that the advantages of CAR and SSR can be further improved in terms of power and sample size.
STATISTICS IN MEDICINE
(2021)
Article
Engineering, Industrial
A. M. Hanea, G. F. Nane
Article
Ecology
Victoria Hemming, Anca M. Hanea, Terry Walshe, Mark A. Burgman
ECOLOGICAL APPLICATIONS
(2020)
Article
Engineering, Multidisciplinary
Victoria Hemming, Nicholas Armstrong, Mark A. Burgman, Anca M. Hanea
QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL
(2020)
Article
Public, Environmental & Occupational Health
Anca M. Hanea, Victoria Hemming, Gabriela F. Nane
Summary: Expert elicitation is used when data is lacking and important decisions need to be made. When designing expert elicitation, practitioners aim to balance best practices with practical constraints. The choices made impact time and effort investment, data quality, expert engagement, result defensibility, and decision acceptability.
Article
Public, Environmental & Occupational Health
Victoria Hemming, Anca M. Hanea, Mark A. Burgman
Summary: The study suggests that weighted aggregation outperforms equal weights on the combined Classical Model (CM) score, but not on statistical accuracy. Experts were unable to adapt their knowledge across different domains, and in-sample validation on irrelevant questions did not accurately predict out-of-sample performance.
Article
Biodiversity Conservation
James S. Camac, Kate D. L. Umbers, John W. Morgan, Sonya R. Geange, Anca Hanea, Rachel A. Slatyer, Keith L. McDougall, Susanna E. Venn, Peter A. Vesk, Ary A. Hoffmann, Adrienne B. Nicotra
Summary: Conservation managers are facing challenges in making decisions to protect biodiversity in the Australian Alps due to climate change impacts. Expert predictions suggest that by 2050, most alpine vegetation communities will decrease in extent, while woodlands and heathlands are expected to increase. The responses of alpine plants vary greatly, while animal species are predicted to decline or remain stable.
GLOBAL CHANGE BIOLOGY
(2021)
Article
Multidisciplinary Sciences
A. M. Hanea, D. P. Wilkinson, M. McBride, A. Lyon, D. van Ravenzwaaij, F. Singleton Thorn, C. Gray, D. R. Mandel, A. Willcox, E. Gould, E. T. Smith, F. Mody, M. Bush, F. Fidler, H. Fraser, B. C. Wintle
Summary: Structured protocols provide a transparent and systematic way to aggregate probabilistic predictions from multiple experts. By using mathematical rules for aggregation, the objectivity and quality of predictions can be enhanced and measured through accuracy, calibration, and informativeness. Performance-based weighted aggregation can be effective when experts' performance can be scored beforehand, while other aggregation methods informed by measurable proxies for good performance can also be considered.
Article
Public, Environmental & Occupational Health
Martine J. Barons, Steven Mascaro, Anca M. Hanea
Summary: Structured expert judgment (SEJ) is a method for obtaining estimates from groups of experts, aiming to minimize cognitive frailties. When the number of quantities required is large, imputation methods can be used for unelicited quantities. InterBeta is effective in interpolating conditional probability tables to reduce expert burden.
Article
Public, Environmental & Occupational Health
Anca M. Hanea, Zoe Hilton, Ben Knight, Andrew P. Robinson
Summary: This article introduces the development and use of probabilistic models, particularly Bayesian networks (BN), for supporting risk-based decision making. It also highlights the promise of codesign and nonparametric Bayesian networks (NPBNs) in achieving a balance between model complexity and ease of development. A case study on the local spread of a marine pathogen is presented to demonstrate the process of codesigning, building, quantifying, and validating an NPBN model using structured expert judgment (SEJ).
Editorial Material
Biology
Alexandru Marcoci, Ans Vercammen, Martin Bush, Daniel G. Hamilton, Anca Hanea, Victoria Hemming, Bonnie C. Wintle, Mark Burgman, Fiona Fidler
Summary: Journal peer review plays an important role in regulating the flow of ideas in academic disciplines. However, research shows that editors cannot accurately identify the best experts for peer review. To prevent biases and uneven power distributions, introducing greater transparency and structure into the process is crucial.
BMC RESEARCH NOTES
(2022)
Editorial Material
Public, Environmental & Occupational Health
Anca M. Hanea, Annemarie Christophersen, Sandra Alday
Article
Mathematics, Interdisciplinary Applications
Gayan Dharmarathne, Anca Hanea, Andrew P. Robinson
Summary: Structured expert judgment (SEJ) is a suite of techniques used to elicit expert predictions in situations where data are too expensive or impossible to obtain. The quality of expert predictions can be assessed using Brier scores and calibration questions. Research recommends using mixed-effects models to improve expert Brier scores and related operations.
FRONTIERS IN APPLIED MATHEMATICS AND STATISTICS
(2021)
Review
Public, Environmental & Occupational Health
Kathryn Roscoe, Anca Hanea, Ruben Jongejan, Ton Vrouwenvelder
Article
Geosciences, Multidisciplinary
Annemarie Christophersen, Natalia Deligne, Anca M. Hanea, Lauriane Chardot, Nicolas Fournier, Willy P. Aspinall
FRONTIERS IN EARTH SCIENCE
(2018)