4.6 Article

Can We Train Machine Learning Methods to Outperform the High-dimensional Propensity Score Algorithm?

Journal

EPIDEMIOLOGY
Volume 29, Issue 2, Pages 191-198

Publisher

LIPPINCOTT WILLIAMS & WILKINS
DOI: 10.1097/EDE.0000000000000787

Keywords

-

Funding

  1. Canadian Network for Observational Drug Effect Studies (CNODES)
  2. Canadian Institutes of Health Research (CIHR)
  3. Fonds de Recherche du Quebec - Sante (FQR-S)

Retrospective health care claims datasets are frequently criticized for lacking complete information on potential confounders. The high-dimensional propensity score algorithm reduces bias by using information related to patients' health status in claims datasets as surrogates or proxies for mismeasured and unobserved confounders. Using a previously published cohort study of post-myocardial infarction statin use (1998-2012), we compare the performance of the algorithm with that of several popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator (LASSO), and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also show that a hybrid of machine learning and high-dimensional propensity score algorithms generally performs slightly better than either alone in terms of mean squared error when a bias-based analysis is used.
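
As a rough illustration of the bias-based covariate ranking mentioned above: the high-dimensional propensity score algorithm is commonly described as prioritizing binary candidate covariates by the Bross bias multiplier, which combines the covariate's prevalence among exposed and unexposed patients with its association with the outcome. The sketch below is a simplified toy version under stated assumptions, not the authors' implementation. The function names `bias_multiplier` and `rank_covariates` and the toy data are hypothetical, and the code assumes binary inputs, non-empty strata, and a non-zero outcome risk in every stratum.

```python
import math


def bias_multiplier(covariate, exposure, outcome):
    """Bross bias multiplier for one binary covariate (simplified sketch)."""
    exposed = [c for c, e in zip(covariate, exposure) if e == 1]
    unexposed = [c for c, e in zip(covariate, exposure) if e == 0]
    p_c1 = sum(exposed) / len(exposed)      # covariate prevalence among exposed
    p_c0 = sum(unexposed) / len(unexposed)  # covariate prevalence among unexposed

    with_c = [y for c, y in zip(covariate, outcome) if c == 1]
    without_c = [y for c, y in zip(covariate, outcome) if c == 0]
    risk_with = sum(with_c) / len(with_c)
    risk_without = sum(without_c) / len(without_c)
    rr_cd = risk_with / risk_without        # covariate-outcome risk ratio

    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def rank_covariates(covariates, exposure, outcome, top_k):
    """Return the top_k covariate names ordered by |log(bias multiplier)|."""
    scores = {
        name: abs(math.log(bias_multiplier(values, exposure, outcome)))
        for name, values in covariates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: 8 patients, binary exposure, outcome, and covariates.
exposure = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 0, 0, 1, 0, 0, 1]
covariates = {
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],       # balanced across exposure groups
    "confounder": [1, 1, 1, 0, 1, 0, 0, 0],  # imbalanced and outcome-related
}

selected = rank_covariates(covariates, exposure, outcome, top_k=1)
```

In a fuller analysis, the selected covariates would then enter a propensity score model. A machine learning alternative of the kind the abstract compares would replace this ranking step with, for example, a penalized regression that selects covariates by shrinking coefficients to zero.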
