Article

Missing Data Imputation with High-Dimensional Data

AMERICAN STATISTICIAN (2023)

Journal

AMERICAN STATISTICIAN

Volume -, Issue -, Pages -

Publisher

TAYLOR & FRANCIS INC

DOI: 10.1080/00031305.2023.2259962

Keywords

High-dimensional data; Linear mixed models; Longitudinal data; Missing data; Multiple imputation; Penalized regression; Principal component analysis; Recursive partitioning

Category

Statistics & Probability

Summary

This article explores the imputation of missing data in high-dimensional datasets and compares different approaches within a linear mixed modeling framework. The recursive partitioning and predictive mean matching algorithm shows superiority in terms of bias, mean squared error, and coverage of parameter estimates.

Abstract

Imputation of missing data in high-dimensional datasets with more variables P than samples N (P >> N) is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill-conditioned and cannot be properly estimated. For fully conditional imputation, the regression models for imputation cannot include all the variables. Thus, high-dimensional data require special imputation approaches. In this article, we provide an overview and realistic comparisons of imputation approaches for high-dimensional data when applied to a linear mixed modeling (LMM) framework. We examine approaches from three different classes using simulation studies: multiple imputation with penalized regression; multiple imputation with recursive partitioning and predictive mean matching; and multiple imputation with principal component analysis (PCA). We illustrate the methods on a real case study in which a multivariate outcome (i.e., an extracted set of correlated biomarkers from human urine samples) was collected and monitored over time, and we discuss the proposed methods alongside more standard imputation techniques that could be applied by ignoring either the multivariate or the longitudinal dimension. Our simulations demonstrate the superiority of the recursive partitioning and predictive mean matching algorithm over the other methods in terms of bias, mean squared error, and coverage of the LMM parameter estimates when compared to those obtained from a data analysis without missingness, although it comes at the expense of high computational costs. Given that cost, much faster methodologies, such as the one relying on PCA, are worth reconsidering.
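
To make the predictive mean matching (PMM) idea mentioned in the abstract more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a single fully observed predictor and an ordinary least-squares fit in place of recursive partitioning, and it omits the random draw of model parameters that a proper multiple imputation procedure would add. The function names (`fit_line`, `pmm_impute`, `multiply_impute`) and the toy data are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (single predictor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(x, y, k=3, rng=None):
    """Return a copy of y with None entries filled by predictive mean matching."""
    rng = rng or random.Random()
    # Indices of cases where the outcome is observed (potential donors).
    obs = [i for i, yi in enumerate(y) if yi is not None]
    a, b = fit_line([x[i] for i in obs], [y[i] for i in obs])
    pred = [a + b * xi for xi in x]
    out = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            # Donors: the k observed cases whose predictions are closest
            # to the prediction for the missing case.
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            # Impute by borrowing the observed value of a random donor.
            out[i] = y[rng.choice(donors)]
    return out


def multiply_impute(x, y, m=5, k=3, seed=0):
    """Produce m completed copies of y, one per imputation."""
    rng = random.Random(seed)
    return [pmm_impute(x, y, k=k, rng=rng) for _ in range(m)]


# Toy data (hypothetical): two outcome values are missing.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, None, 2.9, 4.2, None, 6.1]
completed = multiply_impute(x, y, m=5)
```

Because each imputed value is borrowed from an observed donor, every completed dataset contains only values that actually occur in the data. In a full analysis, the model of interest (here, an LMM) would be fitted to each completed dataset and the estimates pooled.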
