4.6 Article

3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/jamia/ocx133

Keywords

machine learning; imputation; missing data; electronic health record; EHR; multiple imputation with chained equations; Gaussian process; computational pathology; data mining

Funding

  1. Massachusetts Institute of Technology-Massachusetts General Hospital strategic partnership grant under the Grand Challenge I: Diagnostics

Objective: A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to fill in missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data.

Methods: We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points.

Results: 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone.

Conclusions: 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.
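The masking-based evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`mask_values`, `impute_linear`, `nrmse`, `evaluate`) are hypothetical, the imputer is a plain linear interpolation standing in for MICE, GP, or 3D-MICE, and normalizing the error by the standard deviation of the observed values is an assumption, since the abstract does not state how the error was normalized.

```python
import math
import random
import statistics


def mask_values(series, fraction, rng):
    """Hide a random subset of the observed values (None marks a missing value)."""
    observed = [i for i, v in enumerate(series) if v is not None]
    k = max(1, round(fraction * len(observed)))
    hidden = sorted(rng.sample(observed, k))
    masked = list(series)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def impute_linear(series):
    """Stand-in imputer (NOT 3D-MICE): interpolate linearly between the nearest
    observed neighbours, using the list index as the time axis; at the edges,
    copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def nrmse(truth, predicted, scale):
    """Root-mean-square error divided by a normalizing scale."""
    mse = sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)
    return math.sqrt(mse) / scale


def evaluate(series, fraction=0.2, seed=0):
    """Mask some measured values, impute them, and score only the masked points."""
    rng = random.Random(seed)
    masked, hidden = mask_values(series, fraction, rng)
    filled = impute_linear(masked)
    observed = [v for v in series if v is not None]
    scale = statistics.pstdev(observed)  # assumed normalizer: SD of observed values
    truth = [series[i] for i in hidden]
    predicted = [filled[i] for i in hidden]
    return nrmse(truth, predicted, scale)


if __name__ == "__main__":
    # Synthetic single-analyte series; None marks results missing in the original data.
    analyte = [4.1, 4.3, None, 4.8, 5.0, 4.9, None, 4.6, 4.4, 4.5, 4.2, 4.0]
    print(f"nRMSE on masked points: {evaluate(analyte):.3f}")
```

Only the deliberately masked points are scored, because those are the only positions where a measured value exists to compare against; values missing from the original data are imputed but cannot be evaluated.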

Recommended

Article Biochemical Research Methods

Evaluating the state of the art in missing data imputation for clinical data

Yuan Luo

Summary: Clinical data often have missing entries, posing a challenge to deriving optimal knowledge from the data. The Data Analytics Challenge on Missing data Imputation (DACMI) provides a benchmark dataset for evaluating and advancing imputation techniques for clinical time series. Competitive machine learning and statistical models coupled with carefully engineered features show strong performance in imputation.

BRIEFINGS IN BIOINFORMATICS (2022)

Review Public, Environmental & Occupational Health

Machine Learning in Causal Inference: Application in Pharmacovigilance

Yiqing Zhao, Yue Yu, Hanyin Wang, Yikuan Li, Yu Deng, Guoqian Jiang, Yuan Luo

Summary: This paper discusses the use of machine learning methods and causal inference models in pharmacovigilance. It points out that pharmacovigilance is lagging behind in integrating machine learning and causal inference models, and highlights current research directions and gaps.

DRUG SAFETY (2022)

Review Endocrinology & Metabolism

Ductal Carcinoma In Situ of Breast: From Molecular Etiology to Therapeutic Management

Shelby Lynn Hophan, Olena Odnokoz, Huiping Liu, Yuan Luo, Seema Khan, William Gradishar, Zhuan Zhou, Sunil Badve, Mylin A. Torres, Yong Wan

Summary: Ductal carcinoma in situ (DCIS) is a noninvasive breast cancer, but the molecular mechanisms behind its progression to invasive ductal carcinoma (IDC) and the complexity of each lesion are still unclear. Understanding the molecular features that lead to DCIS progression and finding new strategies to identify molecular mechanisms are crucial for more targeted therapy.

ENDOCRINOLOGY (2022)

Article Emergency Medicine

COVID-19 SEROPREVALENCE IN ED HEALTH CARE PROFESSIONALS STUDY: A CROSS-SECTIONAL STUDY

Brian J. Yun, Joshua J. Baugh, Sayon Dutta, David F. M. Brown, Elizabeth S. Temin, Sarah E. Turbett, Erica S. Shenoy, Paul D. Biddinger, Anand S. Dighe, Kyle Kays, Blair Alden Parry, Brenna McKaig, Caroline Beakes, Justin Margolin, Nicole Russell, Carl Lodenstein, Dustin S. McEvoy, Michael R. Filbin

Summary: This study assessed the seroprevalence of SARS-CoV-2 antibodies among ED health care professionals without confirmed history of COVID-19 infection at a quaternary academic medical center. The results showed a low seroprevalence of SARS-CoV-2 antibodies among the participants.

JOURNAL OF EMERGENCY NURSING (2022)

Article Medical Informatics

Comparison between machine learning methods for mortality prediction for sepsis patients with different social determinants

Hanyin Wang, Yikuan Li, Andrew Naidech, Yuan Luo

Summary: This study analyzed a cohort of critical care patients from the MIMIC-III database and revealed disparities in social determinants among patients identified by different sepsis criteria. The study also found that the performance of mortality prediction for sepsis patients can be compromised when using a universally trained model for each subpopulation.

BMC MEDICAL INFORMATICS AND DECISION MAKING (2022)

Correction Public, Environmental & Occupational Health

Machine Learning in Causal Inference: Application in Pharmacovigilance (vol 45, pg 459, 2022)

Yiqing Zhao, Yue Yu, Hanyin Wang, Yikuan Li, Yu Deng, Guoqian Jiang, Yuan Luo

DRUG SAFETY (2022)

Article Computer Science, Interdisciplinary Applications

SurvMaximin: Robust federated approach to transporting survival risk prediction models

Xuan Wang, Harrison G. Zhang, Xin Xiong, Chuan Hong, Griffin M. Weber, Gabriel A. Brat, Clara-Lea Bonzel, Yuan Luo, Rui Duan, Nathan P. Palmer, Meghan R. Hutch, Alba Gutierrez-Sacristan, Riccardo Bellazzi, Luca Chiovato, Kelly Cho, Arianna Dagliati, Hossein Estiri, Noelia Garcia-Barrio, Romain Griffier, David A. Hanauer, Yuk-Lam Ho, John H. Holmes, Mark S. Keller, Jeffrey G. Klann MEng, Sehi L'Yi, Sara Lozano-Zahonero, Sarah E. Maidlow, Adeline Makoudjou, Alberto Malovini, Bertrand Moal, Jason H. Moore, Michele Morris, Danielle L. Mowery, Shawn N. Murphy, Antoine Neuraz, Kee Yuan Ngiam, Gilbert S. Omenn, Lav P. Patel, Miguel Pedrera-Jimenez, Andrea Prunotto, Malarkodi Jebathilagam Samayamuthu, Fernando J. Sanz Vidorreta, Emily R. Schriver, Petra Schubert, Pablo Serrano-Balazote, Andrew M. South, Amelia L. M. Tan, Byorn W. L. Tan, Valentina Tibollo, Patric Tippmann, Shyam Visweswaran, Zongqi Xia, William Yuan, Daniela Zoller, Isaac S. Kohane, Paul Avillach, Zijian Guo, Tianxi Cai

Summary: This study proposes a method called SurvMaximin to estimate Cox model feature coefficients for a target population by borrowing summary information from other healthcare centers. The method achieves comparable or higher accuracy compared to existing methods and is robust to variations in sample sizes and estimated feature coefficients between centers.

JOURNAL OF BIOMEDICAL INFORMATICS (2022)

Article Biotechnology & Applied Microbiology

A comprehensive SARS-CoV-2-human protein-protein interactome reveals COVID-19 pathobiology and potential host therapeutic targets

Yadi Zhou, Yuan Liu, Shagun Gupta, Mauricio Paramo, Yuan Hou, Chengsheng Mao, Yuan Luo, Julius Judd, Shayne Wierbowski, Marta Bertolotti, Mriganka Nerkar, Lara Jehi, Nir Drayman, Vlad Nicolaescu, Haley Gula, Savas Tay, Glenn Randall, Peihui Wang, John T. Lis, Cedric Feschotte, Serpil C. Erzurum, Feixiong Cheng, Haiyuan Yu

Summary: Studying the interaction between viral and host proteins can help discover therapies for viral infections. In this study, a comprehensive network of interactions between SARS-CoV-2 and human proteins was generated using high-throughput techniques, validating known host factors and identifying new ones. The network showed the highest overlap with differentially expressed genes in COVID-19 patients and revealed an interaction between a viral protein and a human transcription factor. Additionally, network-based screening of FDA-approved or investigational drugs identified several candidates with significant proximity to SARS-CoV-2 host factors, including a drug called carvedilol which showed clinical benefits and antiviral properties.

NATURE BIOTECHNOLOGY (2022)

Article Pathology

Use of Clinical Decision Support to Improve the Laboratory Evaluation of Monoclonal Gammopathies

Daniel S. Pearson, Dustin S. McEvoy, Mandakolathur R. Murali, Anand S. Dighe

Summary: The use of a clinical decision support (CDS) alert can increase compliance with guidelines and improve the diagnostic evaluation of patients with suspected monoclonal gammopathies (MGs).

AMERICAN JOURNAL OF CLINICAL PATHOLOGY (2023)

Article Computer Science, Information Systems

Characterizing variability of electronic health record-driven phenotype definitions

Pascal S. Brandt, Abel Kho, Yuan Luo, Jennifer A. Pacheco, Theresa L. Walunas, Hakon Hakonarson, George Hripcsak, Cong Liu, Ning Shang, Chunhua Weng, Nephi Walton, David S. Carrell, Paul K. Crane, Eric B. Larson, Christopher G. Chute, Iftikhar J. Kullo, Robert Carroll, Josh Denny, Andrea Ramirez, Wei-Qi Wei, Jyoti Pathak, Laura K. Wiley, Rachel Richesson, Justin B. Starren, Luke Rasmussen

Summary: This study analyzed a publicly available sample of rule-based phenotype definitions and found significant variability in logical constructs and used terminologies. Despite the range of conditions, all phenotype definitions consisted of logical criteria and tabular data. This study highlights the importance of standardizing the representation of phenotype definitions.

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION (2023)

Editorial Material Public, Environmental & Occupational Health

Future of ChatGPT in Pharmacovigilance

Hanyin Wang, Yanyi Jenny Ding, Yuan Luo

DRUG SAFETY (2023)

Article Health Care Sciences & Services

Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers

Catherine A. Gao, Frederick M. Howard, Nikolay S. Markov, Emma C. Dyer, Siddhi Ramesh, Yuan Luo, Alexander T. Pearson

Summary: Researchers used ChatGPT to generate medical research abstracts, which were mostly detected as fake by an AI output detector. The generated abstracts scored lower in plagiarism detection and human review, but it was difficult for reviewers to differentiate between the generated and original ones. AI output detectors can be used as editorial tools to maintain scientific standards.

NPJ DIGITAL MEDICINE (2023)

Article Computer Science, Information Systems

Transportability of bacterial infection prediction models for critically ill patients

Garrett Eickelberg, Lazaro Nelson Sanchez-Pinto, Adrienne Sarah Kline, Yuan Luo

Summary: This study assessed the transportability of a bacterial infection risk model in three different types of intensive care units and explored the impact of multisite learning techniques on model transportability. The model performed well in internal validations but showed variation in transportability in external validations. The findings highlight the importance of external model validation on diverse clinical populations prior to implementation.

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION (2023)

Article Computer Science, Information Systems

Longitudinal clustering of Life's Essential 8 health metrics: application of a novel unsupervised learning method in the CARDIA study

Peter Graffy, Lindsay Zimmerman, Yuan Luo, Jingzhi Yu, Yuni Choi, Rachel Zmora, Donald Lloyd-Jones, Norrina Bai Allen

Summary: Longitudinal clustering analysis using subgraph augmented non-negative matrix factorization (SANMF) can identify three different patterns of cardiovascular health behavior and assess the risk of future adverse cardiovascular events.

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION (2023)

Review Biotechnology & Applied Microbiology

Statistical and machine learning methods for spatially resolved transcriptomics data analysis

Zexian Zeng, Yawei Li, Yiming Li, Yuan Luo

Summary: The recent advancement in spatial transcriptomics technology enables multiplexed profiling of cellular transcriptomes and spatial locations. With the improvement in experimental technologies, there is a need to develop better analytical approaches and re-evaluate current assumptions. This article reviews the recent development of statistical and machine learning methods in spatial transcriptomics and summarizes the challenges and opportunities ahead.

GENOME BIOLOGY (2022)
