Article
Health Care Sciences & Services
Jiang Li, Xiaowei S. Yan, Durgesh Chaudhary, Venkatesh Avula, Satish Mudiganti, Hannah Husby, Shima Shahjouei, Ardavan Afshar, Walter F. Stewart, Mohammed Yeasin, Ramin Zand, Vida Abedi
Summary: Laboratory data from EHRs can be used in prediction models, and imputation methods can mitigate estimation bias and improve model performance when values are missing. The study found that missingness in EHR laboratory variables was associated with patients' comorbidity data, and that the multi-level imputation algorithm produced smaller imputation error than the cross-sectional method.
NPJ DIGITAL MEDICINE
(2021)
Article
Multidisciplinary Sciences
Hannah Voss, Simon Schlumbohm, Philip Barwikowski, Marcus Wurlitzer, Matthias Dottermusch, Philipp Neumann, Hartmut Schlueter, Julia E. Neumann, Christoph Krisp
Summary: HarmonizR is an efficient tool for missing-data-tolerant reduction of experimental variance; it does not require data imputation and can be easily adjusted to individual dataset properties and user preferences. It successfully harmonized data across different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches, and outperformed data imputation methods in detecting significant proteins.
NATURE COMMUNICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Dunlu Peng, Mengping Zou, Cong Liu, Jing Lu
Summary: This paper introduces a novel tuple-based imputation model RESI, which defines the mean integrity rate to measure the missing degree of a dataset, and utilizes the entropy weight method to select features and assign weights to attributes for improved imputation accuracy and generalization capability.
EXPERT SYSTEMS WITH APPLICATIONS
(2021)
Article
Computer Science, Artificial Intelligence
Ming-Chang Wang, Chih-Fong Tsai, Wei-Chao Lin
Summary: The demand for electricity is increasing, prompting researchers to explore data mining techniques for more effective energy management systems. Machine learning methods, specifically K-NN and SVR, have been found to outperform statistical methods in imputing missing data, especially during summer seasons and peak times.
EXPERT SYSTEMS WITH APPLICATIONS
(2021)
Article
Health Care Sciences & Services
Martijn W. Heymans, Jos W. R. Twisk
Summary: Proper handling of missing data is crucial, and the missing-data mechanism should be taken into account. Multiple imputation is highly recommended for estimating missing values. It is more important to prevent missing data than to treat them.
JOURNAL OF CLINICAL EPIDEMIOLOGY
(2022)
Article
Multidisciplinary Sciences
Ryu Kyung Kim, Young Min Kim, Won Jin Lee, Jongho Im, Juhee Lee, Ye Jin Bang, Eun Shil Cha
Summary: Data integration involves merging datasets from different sources to obtain more information. By using the MICE algorithm, we integrated data from the National Dose Registry (NDR) and a survey, and found differences in various variables based on sex and job type.
Article
Computer Science, Artificial Intelligence
C. G. Marcelino, G. M. C. Leite, P. Celes, C. E. Pedreira
Summary: This paper investigates the effects and possible solutions to incomplete databases in regression and provides a systematic view of how missing data may affect regression results by analyzing actual publicly available databases. The results indicate that the impact of missing data can be significant, and the K-Nearest Neighbors method performs better in regression with missing data.
APPLIED ARTIFICIAL INTELLIGENCE
(2022)
Article
Computer Science, Artificial Intelligence
Siddharth Ramchandran, Gleb Tikhonov, Otto Lonnroth, Pekka Tiikkainen, Harri Lahdesmaki
Summary: Conditional variational autoencoders (CVAEs) are versatile deep latent variable models that extend the standard VAE framework by conditioning the generative model with auxiliary covariates. This paper proposes a method to learn conditional VAEs from datasets with missing values in auxiliary covariates, and demonstrates superior performance compared to previous methods in various experimental settings.
PATTERN RECOGNITION
(2024)
Article
Computer Science, Information Systems
Mohammad H. Nadimi-Shahraki, Saeed Mohammadi, Hoda Zamani, Mostafa Gandomi, Amir H. Gandomi
Summary: A four-layer model and hybrid imputation method (HIMP) were proposed to impute multi-pattern missing data in medical datasets, and experimental results showed that HIMP performed better than other comparative methods.
Review
Biochemical Research Methods
Weijia Kong, Harvard Wai Hann Hui, Hui Peng, Wilson Wen Bin Goh
Summary: Proteomics data often have missing values, which can affect subsequent statistical analyses. Different missing value imputation methods have been developed, and their performance varies when dealing with the same dataset. Choosing the right method is important for satisfactory results, and other factors such as confounders should also be considered.
Article
Ecology
Thomas F. Johnson, Nick J. B. Isaac, Agustin Paviolo, Manuela Gonzalez-Suarez
Summary: The study evaluated the performance of approaches for handling missing values in biased datasets and found that imputation can effectively handle missing data in some conditions but is not always the best solution. None of the tested methods could effectively deal with severe biases, highlighting the importance of rigorous data checking and proposing variables to assist researchers in detecting and minimizing errors in incomplete datasets.
GLOBAL ECOLOGY AND BIOGEOGRAPHY
(2021)
Article
Computer Science, Artificial Intelligence
Eunseo Oh, Hyunsoo Lee
Summary: As data-based predictive maintenance frameworks grow in importance, missing values in industrial data are becoming a pressing issue. This study proposes a method that estimates missing values with Gaussian process regression and corrects the estimates using a quantum mechanics-based stochastic differential equation and Ito's lemma. The method enables more accurate data analysis.
EXPERT SYSTEMS WITH APPLICATIONS
(2024)
Article
Multidisciplinary Sciences
Anny K. G. Rodrigues, Raydonal Ospina, Marcelo R. P. Ferreira
Summary: This study proposes and evaluates a Kernel Fuzzy C-means clustering algorithm with local adaptive distances in dealing with missing data, showing better performance under the Partial Distance Strategy (PDS) and Optimal Completion Strategy (OCS) for clustering.
Article
Automation & Control Systems
Hutashan Vishal Bhagat, Manminder Singh
Summary: This article introduces a novel technique for estimating missing values, which splits the dataset into complete and incomplete subsets and sets an upper limit for each class with missing data to estimate missing values more accurately. Experimental results demonstrate the efficient estimation capability of this technique in datasets with different dimensions and missing rates.
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Wei Wang, Yimeng Chai, Yue Li
Summary: This paper proposes a novel generative adversarial guider imputation network (GAGIN), based on the generative adversarial network (GAN), for missing data imputation. Comprehensive experiments show that the proposed method outperforms state-of-the-art approaches and traditional methods in terms of RMSE on both numeric and image datasets.
NEURAL COMPUTING & APPLICATIONS
(2022)
Article
Biochemical Research Methods
Yuan Luo
Summary: Clinical data often have missing entries, posing a challenge to deriving optimal knowledge from the data. The Data Analytics Challenge on Missing data Imputation (DACMI) provides a benchmark dataset for evaluating and advancing imputation techniques for clinical time series. Competitive machine learning and statistical models coupled with carefully engineered features show strong performance in imputation.
BRIEFINGS IN BIOINFORMATICS
(2022)
Review
Public, Environmental & Occupational Health
Yiqing Zhao, Yue Yu, Hanyin Wang, Yikuan Li, Yu Deng, Guoqian Jiang, Yuan Luo
Summary: This paper discusses the use of machine learning methods and causal inference models in pharmacovigilance. It points out that pharmacovigilance is lagging behind in integrating machine learning and causal inference models, and highlights current research directions and gaps.
Review
Endocrinology & Metabolism
Shelby Lynn Hophan, Olena Odnokoz, Huiping Liu, Yuan Luo, Seema Khan, William Gradishar, Zhuan Zhou, Sunil Badve, Mylin A. Torres, Yong Wan
Summary: Ductal carcinoma in situ (DCIS) is a noninvasive breast cancer, but the molecular mechanisms behind its progression to invasive ductal carcinoma (IDC) and the complexity of each lesion are still unclear. Understanding the molecular features that lead to DCIS progression and finding new strategies to identify molecular mechanisms are crucial for more targeted therapy.
Article
Emergency Medicine
Brian J. Yun, Joshua J. Baugh, Sayon Dutta, David F. M. Brown, Elizabeth S. Temin, Sarah E. Turbett, Erica S. Shenoy, Paul D. Biddinger, Anand S. Dighe, Kyle Kays, Blair Alden Parry, Brenna McKaig, Caroline Beakes, Justin Margolin, Nicole Russell, Carl Lodenstein, Dustin S. McEvoy, Michael R. Filbin
Summary: This study assessed the seroprevalence of SARS-CoV-2 antibodies among ED health care professionals without a confirmed history of COVID-19 infection at a quaternary academic medical center. The results showed a low seroprevalence of SARS-CoV-2 antibodies among the participants.
JOURNAL OF EMERGENCY NURSING
(2022)
Article
Medical Informatics
Hanyin Wang, Yikuan Li, Andrew Naidech, Yuan Luo
Summary: This study analyzed a cohort of critical care patients from the MIMIC-III database and revealed disparities in social determinants among patients identified by different sepsis criteria. The study also found that the performance of mortality prediction for sepsis patients can be compromised when using a universally trained model for each subpopulation.
BMC MEDICAL INFORMATICS AND DECISION MAKING
(2022)
Correction
Public, Environmental & Occupational Health
Yiqing Zhao, Yue Yu, Hanyin Wang, Yikuan Li, Yu Deng, Guoqian Jiang, Yuan Luo
Article
Computer Science, Interdisciplinary Applications
Xuan Wang, Harrison G. Zhang, Xin Xiong, Chuan Hong, Griffin M. Weber, Gabriel A. Brat, Clara-Lea Bonzel, Yuan Luo, Rui Duan, Nathan P. Palmer, Meghan R. Hutch, Alba Gutierrez-Sacristan, Riccardo Bellazzi, Luca Chiovato, Kelly Cho, Arianna Dagliati, Hossein Estiri, Noelia Garcia-Barrio, Romain Griffier, David A. Hanauer, Yuk-Lam Ho, John H. Holmes, Mark S. Keller, Jeffrey G. Klann, Sehi L'Yi, Sara Lozano-Zahonero, Sarah E. Maidlow, Adeline Makoudjou, Alberto Malovini, Bertrand Moal, Jason H. Moore, Michele Morris, Danielle L. Mowery, Shawn N. Murphy, Antoine Neuraz, Kee Yuan Ngiam, Gilbert S. Omenn, Lav P. Patel, Miguel Pedrera-Jimenez, Andrea Prunotto, Malarkodi Jebathilagam Samayamuthu, Fernando J. Sanz Vidorreta, Emily R. Schriver, Petra Schubert, Pablo Serrano-Balazote, Andrew M. South, Amelia L. M. Tan, Byorn W. L. Tan, Valentina Tibollo, Patric Tippmann, Shyam Visweswaran, Zongqi Xia, William Yuan, Daniela Zoller, Isaac S. Kohane, Paul Avillach, Zijian Guo, Tianxi Cai
Summary: This study proposes a method called SurvMaximin to estimate Cox model feature coefficients for a target population by borrowing summary information from other healthcare centers. The method achieves comparable or higher accuracy compared to existing methods and is robust to variations in sample sizes and estimated feature coefficients between centers.
JOURNAL OF BIOMEDICAL INFORMATICS
(2022)
Article
Biotechnology & Applied Microbiology
Yadi Zhou, Yuan Liu, Shagun Gupta, Mauricio Paramo, Yuan Hou, Chengsheng Mao, Yuan Luo, Julius Judd, Shayne Wierbowski, Marta Bertolotti, Mriganka Nerkar, Lara Jehi, Nir Drayman, Vlad Nicolaescu, Haley Gula, Savas Tay, Glenn Randall, Peihui Wang, John T. Lis, Cedric Feschotte, Serpil C. Erzurum, Feixiong Cheng, Haiyuan Yu
Summary: Studying the interaction between viral and host proteins can help discover therapies for viral infections. In this study, a comprehensive network of interactions between SARS-CoV-2 and human proteins was generated using high-throughput techniques, validating known host factors and identifying new ones. The network showed the highest overlap with differentially expressed genes in COVID-19 patients and revealed an interaction between a viral protein and a human transcription factor. Additionally, network-based screening of FDA-approved or investigational drugs identified several candidates with significant proximity to SARS-CoV-2 host factors, including carvedilol, which showed clinical benefits and antiviral properties.
NATURE BIOTECHNOLOGY
(2022)
Article
Pathology
Daniel S. Pearson, Dustin S. McEvoy, Mandakolathur R. Murali, Anand S. Dighe
Summary: The use of a clinical decision support (CDS) alert can increase compliance with guidelines and improve the diagnostic evaluation of patients with suspected monoclonal gammopathies (MGs).
AMERICAN JOURNAL OF CLINICAL PATHOLOGY
(2023)
Article
Computer Science, Information Systems
Pascal S. Brandt, Abel Kho, Yuan Luo, Jennifer A. Pacheco, Theresa L. Walunas, Hakon Hakonarson, George Hripcsak, Cong Liu, Ning Shang, Chunhua Weng, Nephi Walton, David S. Carrell, Paul K. Crane, Eric B. Larson, Christopher G. Chute, Iftikhar J. Kullo, Robert Carroll, Josh Denny, Andrea Ramirez, Wei-Qi Wei, Jyoti Pathak, Laura K. Wiley, Rachel Richesson, Justin B. Starren, Luke Rasmussen
Summary: This study analyzed a publicly available sample of rule-based phenotype definitions and found significant variability in logical constructs and used terminologies. Despite the range of conditions, all phenotype definitions consisted of logical criteria and tabular data. This study highlights the importance of standardizing the representation of phenotype definitions.
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
(2023)
Editorial Material
Public, Environmental & Occupational Health
Hanyin Wang, Yanyi Jenny Ding, Yuan Luo
Article
Health Care Sciences & Services
Catherine A. Gao, Frederick M. Howard, Nikolay S. Markov, Emma C. Dyer, Siddhi Ramesh, Yuan Luo, Alexander T. Pearson
Summary: Researchers used ChatGPT to generate medical research abstracts, which were mostly detected as fake by an AI output detector. The generated abstracts scored lower in plagiarism detection and human review, but it was difficult for reviewers to differentiate between the generated and original ones. AI output detectors can be used as editorial tools to maintain scientific standards.
NPJ DIGITAL MEDICINE
(2023)
Article
Computer Science, Information Systems
Garrett Eickelberg, Lazaro Nelson Sanchez-Pinto, Adrienne Sarah Kline, Yuan Luo
Summary: This study assessed the transportability of a bacterial infection risk model in three different types of intensive care units and explored the impact of multisite learning techniques on model transportability. The model performed well in internal validations but showed variation in transportability in external validations. The findings highlight the importance of external model validation on diverse clinical populations prior to implementation.
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
(2023)
Article
Computer Science, Information Systems
Peter Graffy, Lindsay Zimmerman, Yuan Luo, Jingzhi Yu, Yuni Choi, Rachel Zmora, Donald Lloyd-Jones, Norrina Bai Allen
Summary: Longitudinal clustering analysis using subgraph augmented non-negative matrix factorization (SANMF) can identify three different patterns of cardiovascular health behavior and assess the risk of future adverse cardiovascular events.
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
(2023)
Review
Biotechnology & Applied Microbiology
Zexian Zeng, Yawei Li, Yiming Li, Yuan Luo
Summary: The recent advancement in spatial transcriptomics technology enables multiplexed profiling of cellular transcriptomes and spatial locations. With the improvement in experimental technologies, there is a need to develop better analytical approaches and re-evaluate current assumptions. This article reviews the recent development of statistical and machine learning methods in spatial transcriptomics and summarizes the challenges and opportunities ahead.