Article

Multiple imputation in the presence of high-dimensional data

STATISTICAL METHODS IN MEDICAL RESEARCH (2016)

Journal

STATISTICAL METHODS IN MEDICAL RESEARCH

Volume 25, Issue 5, Pages 2021-2035

Publisher

SAGE PUBLICATIONS LTD

DOI: 10.1177/0962280213511027

Keywords

Bayesian lasso regression; high-dimensional data; missing data; multiple imputation; regularized regression

Categories

Health Care Sciences & Services; Mathematical & Computational Biology; Medical Informatics; Statistics & Probability

Funding

PCORI [ME-1303-5840]

Abstract

Missing data are frequently encountered in biomedical, epidemiologic and social research. It is well known that a naive analysis without adequate handling of missing data may lead to bias and/or loss of efficiency. Partly due to its ease of use, multiple imputation has become increasingly popular in practice for handling missing data. However, it is unclear what the best strategy is for conducting multiple imputation in the presence of high-dimensional data. To answer this question, we investigate several approaches that use regularized regression and Bayesian lasso regression to impute missing values in the presence of high-dimensional data. We compare the performance of these methods through numerical studies, in which we also evaluate the impact of the dimension of the data, the size of the true active set for imputation, and the strength of correlation. Our numerical studies show that, in the presence of high-dimensional data, the standard multiple imputation approach performs poorly, and that the imputation approach using Bayesian lasso regression achieves, in most cases, better performance than the other imputation methods, including the standard imputation approach using the correctly specified imputation model. Our results suggest that Bayesian lasso regression and its extensions are better suited for multiple imputation in the presence of high-dimensional data than the other regression methods.
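
The abstract does not spell out an algorithm, so the following is only an illustrative sketch of the general idea of imputing with a regularized regression when there are more predictors than observations. It is not the paper's procedure and it does not implement Bayesian lasso regression: it assumes a plain lasso fitted by coordinate descent, approximates parameter uncertainty with a bootstrap of the observed rows, and pools the per-imputation estimates of a mean with Rubin's rules. All function names, the penalty value `lam`, and the simulated data are hypothetical.

```python
import random
import statistics


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_lasso(X, y, lam, n_sweeps=50):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    resid = [yi - y_mean for yi in y]                           # centered y
    scale = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(intercept, beta, row):
    return intercept + sum(b * v for b, v in zip(beta, row))


def impute_once(X, y, lam, rng):
    """One completed copy of y (None marks a missing value).

    Assumption: a bootstrap of the observed rows stands in for a posterior
    draw of the regression parameters.
    """
    obs = [i for i, v in enumerate(y) if v is not None]
    boot = [rng.choice(obs) for _ in obs]
    b0, beta = fit_lasso([X[i] for i in boot], [y[i] for i in boot], lam)
    sigma = statistics.pstdev([y[i] - predict(b0, beta, X[i]) for i in boot])
    return [v if v is not None else predict(b0, beta, X[i]) + rng.gauss(0.0, sigma)
            for i, v in enumerate(y)]


def pool_rubin(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputations."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = statistics.variance(estimates)
    return q_bar, within + (1 + 1 / m) * between


def demo(seed=0, n=40, p=50, m=5, lam=0.1):
    """Simulated example with p > n; only the first two predictors matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1.0 + 2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) for r in X]
    y_miss = [None if rng.random() < 0.3 else v for v in y]
    completed = [impute_once(X, y_miss, lam, rng) for _ in range(m)]
    est = [statistics.mean(c) for c in completed]
    var = [statistics.variance(c) / n for c in completed]
    return completed, pool_rubin(est, var)


if __name__ == "__main__":
    _, (pooled_mean, total_var) = demo()
    print(pooled_mean, total_var)
```

The penalty is what keeps the imputation model estimable when the number of predictors exceeds the number of observed rows; an unpenalized least-squares fit would not have a unique solution in that setting. In practice `lam` would be chosen by a data-driven rule such as cross-validation rather than fixed as it is here.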

Recommended

Article Multidisciplinary Sciences

Multiple imputation with compatibility for high-dimensional data

Faisal Maqbool Zahid, Shahla Faisal, Christian Heumann

Summary: Multiple imputation (MI) is challenging in high-dimensional settings. A semi-compatible imputation model is proposed that relaxes the lasso penalty and uses a ridge penalty to address instability and convergence issues. The proposed approach shows superior performance to existing MI techniques in simulation studies and real-life datasets while addressing compatibility problems.

PLOS ONE (2021)

Article Health Care Sciences & Services

Multiple imputation with missing data indicators

Lauren J. Beesley, Irina Bondarenko, Michael R. Elliot, Allison W. Kurian, Steven J. Katz, Jeremy M. G. Taylor

Summary: This paper describes how to generalize the sequential regression multiple imputation procedure to handle non-random missingness when missingness may depend on other variables. The method reduces bias in the final analysis compared to standard techniques, using approximation strategies involving inclusion of an offset in the imputation model.

STATISTICAL METHODS IN MEDICAL RESEARCH (2021)

Article Mathematics

Variational Bayesian Inference for Quantile Regression Models with Nonignorable Missing Data

Xiaoning Li, Mulati Tuerde, Xijian Hu

Summary: This paper investigates the application of quantile regression models from a Bayesian perspective, proposing a hierarchical model framework and using Bayesian methods to handle missing data. The research findings demonstrate the significant advantages of the proposed methodology in both simulation and real data analysis.

MATHEMATICS (2023)

Article Mathematics

A Noise-Aware Multiple Imputation Algorithm for Missing Data

Fangfang Li, Hui Sun, Yu Gu, Ge Yu

Summary: This paper proposes NPMI, a noise-aware multiple imputation algorithm for missing values in static data. Different multiple imputation models are proposed according to the missing-data mechanism. A method for determining the imputation order when multiple variables have missing values is given. Experiments on real and synthetic datasets verify the accuracy and efficiency of the proposed algorithm.

MATHEMATICS (2023)

Article Statistics & Probability

Sample-Wise Combined Missing Effect Model with Penalization

Jialu Li, Guan Yu, Qizhai Li, Yufeng Liu

Summary: Modern high-dimensional statistical inference often faces the problem of missing data. In this article, we propose a new method called SCOM to deal with missing data occurring in predictors. SCOM makes full use of all available data and is robust with respect to various missing mechanisms.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2023)

Article Psychology, Multidisciplinary

How to Apply Variable Selection Machine Learning Algorithms With Multiply Imputed Data: A Missing Discussion

Heather J. Gunn, Panteha Hayati Rezvan, M. Isabel Fernandez, W. Scott Comulada

Summary: Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest. Regularization methods like the LASSO can mitigate overfitting, increase interpretability, and improve prediction. However, handling missing data when using regularization-based variable selection methods is complicated. This tutorial describes three approaches for fitting a LASSO when using multiple imputation to handle missing data and highlights the need for additional research on best practices.

PSYCHOLOGICAL METHODS (2023)

Article Mathematical & Computational Biology

Multiple imputation for longitudinal data using Bayesian lasso imputation model

Yusuke Yamaguchi, Satoshi Yoshida, Toshihiro Misumi, Kazushi Maruo

Summary: Multiple imputation is a promising approach for handling missing data in longitudinal clinical studies, particularly when incorporating informative auxiliary variables. The Bayesian lasso imputation model demonstrated superior performance in simulation studies, providing unbiased treatment effect estimates and higher statistical power compared to conventional methods. Ignoring informative auxiliary variables can lead to serious bias and inflated type I error rates.

STATISTICS IN MEDICINE (2022)

Article Computer Science, Interdisciplinary Applications

Weighted multiple blockwise imputation method for high-dimensional regression with blockwise missing data

Jingmao Li, Qingzhao Zhang, Song Chen, Kuangnan Fang

Summary: In this article, a novel weighted multiple blockwise imputation method is proposed to address the problem of high-dimensional regression with blockwise missing data. The method demonstrates superior performance in variable selection, parameter estimation, and prediction ability.

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION (2023)

Article Management

A Regularized High-Dimensional Positive Definite Covariance Estimator with High-Frequency Data

Liyuan Cui, Yongmiao Hong, Yingxing Li, Junhui Wang

Summary: This paper proposes a novel large-dimensional positive definite covariance estimator for high-frequency data and achieves good performance in the presence of microstructure noises and asynchronous trading.

MANAGEMENT SCIENCE (2023)

Article Biochemical Research Methods

Multi-omics regulatory network inference in the presence of missing data

Juan D. Henao, Michael Lauber, Manuel Azevedo, Anastasiia Grekova, Fabian Theis, Markus List, Christoph Ogris, Benjamin Schubert

Summary: This study integrated regression-based methods that can handle missingness into KiMONo, and benchmarked their performance on commonly encountered missing data scenarios in single- and multi-omics studies. The results showed that two-step approaches that explicitly handle missingness performed best for imbalanced omics-layers dimensions, while methods implicitly handling missingness performed best for balanced omics-layers dimensions. The study demonstrated the feasibility of robust multi-omics network inference in the presence of missing data with KiMONo.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemical Research Methods

Fast and interpretable genomic data analysis using multiple approximate kernel learning

Ayyuce Begum Bektas, Cigdem Ak, Mehmet Gonen

Summary: With the increasing sizes of computational biology datasets, previous kernel-based machine learning algorithms have failed to provide satisfactory interpretability. To address this issue, we propose a fast and efficient multiple kernel learning algorithm that can extract significant information from genomic data. Our experiments demonstrate that the algorithm outperforms baseline methods while using only a small fraction of input features, and it has the potential to discover new biomarkers and therapeutic guidelines.

BIOINFORMATICS (2022)

Article Statistics & Probability

Missing Data Imputation with High-Dimensional Data

Alberto Brini, Edwin R. van den Heuvel

Summary: This article explores the imputation of missing data in high-dimensional datasets and compares different approaches using a linear mixed modeling framework. The recursive partitioning and predictive mean matching algorithms show superiority in terms of bias, mean squared error, and coverage of parameter estimates.

AMERICAN STATISTICIAN (2023)

Article Mathematics

Group Feature Screening Based on Information Gain Ratio for Ultrahigh-Dimensional Data

Zhongzheng Wang, Guangming Deng, Jianqi Yu

Summary: The proposed group screening procedure based on the information gain ratio for a classification model is shown to have better screening performance and classification accuracy.

JOURNAL OF MATHEMATICS (2022)

Article Biology

Infinite hidden Markov models for multiple multivariate time series with missing data

Lauren Hoskovec, Matthew D. Koslovsky, Kirsten Koehler, Nicholas Good, Jennifer L. Peel, John Volckens, Ander Wilson

Summary: This paper presents an infinite hidden Markov model for multiple asynchronous multivariate time series with missing data. The model excels in estimating hidden states and imputing missing data through beam sampling and Bayesian multiple imputation algorithm. The model performs well in simulation studies and real-case validation, showing improvements in estimation and imputation compared to existing approaches.

BIOMETRICS (2023)

Article Urology & Nephrology

A practical guide to multiple imputation of missing data in nephrology

Katrina Blazek, Anita van Zwieten, Valeria Saglimbene, Armando Teixeira-Pinto

Summary: Health data often have missing values, and utilizing multiple imputation techniques can help reduce bias and maintain sample size. Correct specification of the imputation model is crucial for the validity of analyses. Considerations such as missing mechanism, imputation method, and result reporting are important when conducting research with multiply imputed data.

KIDNEY INTERNATIONAL (2021)

Article Mathematical & Computational Biology

Assessing predictive accuracy of survival regressions subject to nonindependent censoring

Ming Wang, Qi Long, Chixiang Chen, Lijun Zhang

STATISTICS IN MEDICINE (2020)

Article Biochemical Research Methods

Sparse multiple co-Inertia analysis with application to integrative analysis of multi-Omics data

Eun Jeong Min, Qi Long

BMC BIOINFORMATICS (2020)

Article Otorhinolaryngology

Mental health among otolaryngology resident and attending physicians during the COVID-19 pandemic: National study

Alyssa M. Civantos, Yasmeen Byrnes, Changgee Chang, Aman Prasad, Kevin Chorath, Seerat K. Poonia, Carolyn M. Jenks, Andres M. Bur, Punam Thakkar, Evan M. Graboyes, Rahul Seth, Samuel Trosman, Anni Wong, Benjamin M. Laitman, Brianna N. Harris, Janki Shah, Vanessa Stubbs, Garret Choby, Qi Long, Christopher H. Rassekh, Erica Thaler, Karthik Rajasekaran

HEAD AND NECK-JOURNAL FOR THE SCIENCES AND SPECIALTIES OF THE HEAD AND NECK (2020)

Article Hematology

Recombinant human thrombopoietin promotes platelet engraftment after umbilical cord blood transplantation

Baolin Tang, Lulu Huang, Huilan Liu, Siqi Cheng, Kaidi Song, Xuhan Zhang, Wen Yao, Lijuan Ning, Xiang Wan, Guangyu Sun, Yun Wu, Jiehui Cheng, Qi Long, Zimin Sun, Xiaoyu Zhu

BLOOD ADVANCES (2020)

Article Radiology, Nuclear Medicine & Medical Imaging

Preoperative breast MR imaging in newly diagnosed breast cancer: Comparison of outcomes based on mammographic modality, breast density and breast parenchymal enhancement

Azadeh Elmi, Emily F. Conant, Andrew Kozlov, Anthony J. Young, Qi Long, Robert K. Doot, Elizabeth S. McDonald

Summary: The study findings suggest that, among breast cancer patients who undergo digital breast tomosynthesis (DBT) imaging at diagnosis, women with dense breasts benefit more from preoperative MR. On the other hand, women imaged only with digital mammography (DM) show additional malignancy detection by MR regardless of breast density.

CLINICAL IMAGING (2021)

Meeting Abstract Oncology

Racial disparities in efficacy of first-line abiraterone in metastatic castrate-resistant prostate cancer (mCRPC).

Mallika Marar, Long Qi, Ronac Mamtani, Vivek Narayan, Neha Vapiwala, Ravi Bharat Parikh

JOURNAL OF CLINICAL ONCOLOGY (2021)

Article Multidisciplinary Sciences

Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training

Cong Fang, Hangfeng He, Qi Long, Weijie J. Su

Summary: The Layer-Peeled Model is introduced in this paper as a nonconvex optimization program to better understand deep neural networks. It is shown to inherit characteristics of well-trained neural networks and can help explain and predict common empirical patterns of deep-learning training. The model reveals phenomena such as neural collapse on balanced datasets and Minority Collapse on imbalanced datasets, providing insights into how to mitigate the latter.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2021)

Editorial Material Oncology

Addressing Common Misuses and Pitfalls of P values in Biomedical Research

Ming Wang, Qi Long

Summary: In recent years, there has been a growing recognition that P values are often misused or misinterpreted in biomedical research, especially with the emergence of big health data. To address this problem, sound study design and appropriate statistical analysis strategies are needed.

CANCER RESEARCH (2022)

Article Biology

CEDAR: communication efficient distributed analysis for regressions

Changgee Chang, Zhiqi Bu, Qi Long

Summary: Electronic health records (EHRs) provide opportunities for precision medicine, but sharing data is a challenge. We propose a method that aggregates data from external sites by treating it as missing data. We also suggest incorporating posterior samples from remote sites to improve parameter estimates.

BIOMETRICS (2023)

Article Statistics & Probability

Testing Biased Randomization Assumptions and Quantifying Imperfect Matching and Residual Confounding in Matched Observational Studies

Kan Chen, Siyu Heng, Qi Long, Bo Zhang

Summary: One central goal of observational study design is to incorporate non-experimental data into an approximate randomized controlled trial using statistical matching. However, residual imbalance due to imperfect matching of observed covariates often persists. This article presents two generic classes of exact statistical tests for a biased randomization assumption and introduces a quantity called residual sensitivity value (RSV) as a means to quantify the level of residual confounding due to imperfect matching of observed covariates in a matched sample. The proposed methodology is demonstrated through a re-examination of a famous observational study.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2023)

Letter Oncology

High SOX2 expression is associated with poor survival in patients with newly diagnosed multiple myeloma

Xinhe Shan, Qi Long, Alfred L. Garfall, Sandra P. Susanibar-Adaniya

BLOOD CANCER JOURNAL (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Federated f-Differential Privacy

Qinqing Zheng, Shuxiao Chen, Qi Long, Weijie Su

Summary: Federated learning is a training paradigm in which clients collaboratively learn models while protecting the privacy of their local sensitive data. This paper introduces federated f-differential privacy and proposes PriFedSync, a private federated learning framework that achieves a privacy guarantee under this notion.

24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS) (2021)

Proceedings Paper Engineering, Biomedical

Deep Multiview Learning to Identify Population Structure with Multimodal Imaging

Yixue Feng, Mansu Kim, Xiaohui Yao, Kefei Liu, Qi Long, Li Shen

2020 IEEE 20TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE 2020) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Joint Bayesian Variable Selection and Graph Estimation for Non-linear SVM with Application to Genomics Data

Wenli Sun, Changgee Chang, Qi Long

2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

GRIA: Graphical Regularization for Integrative Analysis

Changgee Chang, Jihwan Oh, Qi Long

PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM) (2020)
