Article

SuperMICE: An Ensemble Machine Learning Approach to Multiple Imputation by Chained Equations

Journal

AMERICAN JOURNAL OF EPIDEMIOLOGY
Volume 191, Issue 3, Pages 516-525

Publisher

OXFORD UNIV PRESS INC
DOI: 10.1093/aje/kwab271

Keywords

machine learning; missing data; missingness at random; multiple imputation by chained equations; simulation

Funding

  1. University of California Firearm Violence Research Center


This paper proposes a data-adaptive approach to model selection for handling missing data. It adapts MICE to predict each missing value's conditional mean with Super Learner and to estimate its variance with a local kernel-based method. In simulations, this yields final parameter estimates with lower bias and better coverage than other commonly used imputation methods.
Researchers often face the problem of how to address missing data. Multiple imputation is a popular approach, with multiple imputation by chained equations (MICE) being among the most common and flexible methods for execution. MICE iteratively fits a predictive model for each variable with missing values, conditional on other variables in the data. In theory, any imputation model can be used to predict the missing values. However, if the predictive models are incorrectly specified, they may produce biased estimates of the imputed data, yielding inconsistent parameter estimates and invalid inference. Given the set of modeling choices that must be made in conducting multiple imputation, in this paper we propose a data-adaptive approach to model selection. Specifically, we adapt MICE to incorporate an ensemble algorithm, Super Learner, to predict the conditional mean for each missing value, and we also incorporate a local kernel-based estimate of variance. We present a set of simulations indicating that this approach produces final parameter estimates with lower bias and better coverage than other commonly used imputation methods. These results suggest that using a flexible machine learning imputation approach can be useful in settings where data are missing at random, especially when the relationships among the variables are complex.
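
The iterative procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ordinary least squares stands in for Super Learner as the conditional-mean model, the variance is a Gaussian-kernel-weighted average of squared residuals localized around each predicted mean, and all function names (`fit_predict_ols`, `local_variance`, `chained_imputation`) and the bandwidth choice are hypothetical. It also omits the propagation of model-parameter uncertainty that a full multiple-imputation procedure would normally include.

```python
import numpy as np

def fit_predict_ols(X_train, y_train, X_new):
    """Stand-in learner (NOT Super Learner): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_new)), X_new])
    return B @ beta

def local_variance(pred_obs, resid_obs, pred_new, bandwidth):
    """Kernel-weighted mean of squared residuals around each new prediction."""
    d = (pred_new[:, None] - pred_obs[None, :]) / bandwidth
    # The small constant falls back to the global mean if all weights underflow.
    w = np.exp(-0.5 * d**2) + 1e-12
    return (w * resid_obs[None, :] ** 2).sum(axis=1) / w.sum(axis=1)

def chained_imputation(data, n_iter=10, seed=0):
    """One imputed copy of `data` (2-D array with NaN marking missing values)."""
    rng = np.random.default_rng(seed)
    X = np.array(data, dtype=float)
    miss = np.isnan(X)
    # Initialize missing entries with the observed column means.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.nonzero(miss)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)  # condition on all other variables
            pred_obs = fit_predict_ols(others[obs], X[obs, j], others[obs])
            pred_mis = fit_predict_ols(others[obs], X[obs, j], others[~obs])
            resid = X[obs, j] - pred_obs
            bw = pred_obs.std() or 1.0  # assumed bandwidth rule, for illustration
            var = local_variance(pred_obs, resid, pred_mis, bw)
            # Draw each imputation around its predicted mean.
            X[~obs, j] = pred_mis + rng.normal(size=pred_mis.shape) * np.sqrt(var)
    return X

# Toy data: x2 depends on x1, and about 30% of x2 is set to missing.
demo_rng = np.random.default_rng(1)
n = 200
x1 = demo_rng.normal(size=n)
x2 = 2 * x1 + demo_rng.normal(scale=0.5, size=n)
data = np.column_stack([x1, x2])
data[demo_rng.random(n) < 0.3, 1] = np.nan

# Multiple imputation: repeat with different seeds to get several completed data sets.
imputed = [chained_imputation(data, seed=m) for m in range(5)]
```

Each completed data set would then be analyzed separately and the resulting estimates pooled, as in standard multiple imputation. Replacing `fit_predict_ols` with an ensemble learner is the point at which a Super Learner-style approach would plug in.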

