Article

A simulation study of sample size demonstrated the importance of the number of events per variable to develop prediction models in clustered data

Journal

JOURNAL OF CLINICAL EPIDEMIOLOGY
Volume 68, Issue 12, Pages 1406-1414

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jclinepi.2015.02.002

Keywords

Clustered data; Multicenter study; Events per variable; Logistic model; Prediction model; Simulation study

Funding

  1. Flanders' Agency for Innovation by Science and Technology (IWT Vlaanderen)
  2. Research Foundation-Flanders (FWO) [G049312 N]
  3. Netherlands Organization for Scientific Research [917.11.383, 9120.8004, 918.10.615]
  4. ZonMw [17088.25029]
  5. Research Council KUL [GOA/10/09, CoE PFV/10/002]
  6. Flemish Government [iMinds Medical Information Technologies SBO]
  7. Belgian Federal Science Policy Office [IUAP P7/19]

Objectives: This study aims to investigate the influence of the amount of clustering [intraclass correlation (ICC) = 0%, 5%, or 20%], the number of events per variable (EPV), that is, per candidate predictor (EPV = 5, 10, 20, or 50), and backward variable selection on the performance of prediction models.

Study Design and Setting: Researchers frequently combine data from several centers to develop clinical prediction models. In our simulation study, we developed models from clustered training data using multilevel logistic regression and validated them in external data.

Results: The amount of clustering was not meaningfully associated with the models' predictive performance. The median calibration slope of models built in samples with EPV = 5 and strong clustering (ICC = 20%) was 0.71; with EPV = 5 and ICC = 0%, it was 0.72. A higher EPV was associated with better performance: the calibration slope was 0.85 at EPV = 10 and ICC = 20%, and 0.96 at EPV = 50 and ICC = 20%. Variable selection sometimes led to a substantial relative bias in the estimated predictor effects (up to 118% at EPV = 5), but this had little influence on the models' performance in our simulations.

Conclusion: We recommend at least 10 EPV when fitting prediction models to clustered data with logistic regression. Up to 50 EPV may be needed when variable selection is performed.

(C) 2015 Elsevier Inc. All rights reserved.
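The EPV and ICC settings in the abstract translate into concrete design quantities. The Python sketch that follows is a hypothetical illustration, not the authors' simulation code. It assumes that events are counted in the rarer outcome category and per candidate predictor (not per retained predictor), and that the ICC follows the common latent-variable definition for a random-intercept logistic model, in which the level-1 variance is fixed at pi^2/3. The function names and the example study (8 candidate predictors, 25% event rate) are invented for illustration.

```python
import math

# Assumption: events are counted in the less frequent outcome category,
# and EPV is computed per *candidate* predictor (before any selection).


def events_needed(epv: float, n_candidate_predictors: int) -> int:
    """Minimum number of events needed to reach a target EPV."""
    return math.ceil(epv * n_candidate_predictors)


def sample_size(epv: float, n_candidate_predictors: int, event_rate: float) -> int:
    """Total patients needed if the rarer outcome occurs at `event_rate`."""
    if not 0 < event_rate <= 0.5:
        raise ValueError("event_rate should be the proportion of the rarer outcome")
    return math.ceil(events_needed(epv, n_candidate_predictors) / event_rate)


# Assumption: latent-variable ICC for a random-intercept logistic model,
# ICC = sigma_u^2 / (sigma_u^2 + pi^2 / 3).
LATENT_VAR = math.pi ** 2 / 3


def random_intercept_variance(icc: float) -> float:
    """Between-center variance sigma_u^2 that yields the requested ICC."""
    if not 0 <= icc < 1:
        raise ValueError("icc must be in [0, 1)")
    return icc * LATENT_VAR / (1 - icc)


def icc_from_variance(sigma_u2: float) -> float:
    """Inverse mapping: ICC implied by a given random-intercept variance."""
    return sigma_u2 / (sigma_u2 + LATENT_VAR)


if __name__ == "__main__":
    # Hypothetical study: 8 candidate predictors, 25% event rate.
    for epv in (5, 10, 20, 50):
        print(f"EPV={epv}: events={events_needed(epv, 8)}, "
              f"patients={sample_size(epv, 8, 0.25)}")
    # Random-intercept variances matching the ICC levels in the abstract.
    for icc in (0.0, 0.05, 0.20):
        print(f"ICC={icc}: sigma_u^2={random_intercept_variance(icc):.3f}")
```

Under these assumptions, moving from EPV = 10 to EPV = 50 multiplies the required number of events, and hence patients at a fixed event rate, by five. The ICC helper only shows how a target amount of clustering maps onto a between-center variance; it says nothing about the study's actual data-generating settings.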
