Article
Health Care Sciences & Services
Shen-Ming Lee, Phuoc-Loc Tran, Chin-Shang Li
Summary: This paper addresses the issue of model checking for logistic regression with covariates missing at random. Two goodness-of-fit tests, Pearson chi-squared and unweighted residual sum-of-squares tests, are proposed, and their test statistics are centered using inverse probability weighting (IPW) and nonparametric multiple imputation (MI) methods to solve the missing value problem. The paper establishes the asymptotic properties of these test statistics and introduces the IPW method and bootstrap re-sampling approaches to estimate the variances of the proposed test statistics. Simulation studies and real data examples are used to evaluate the performance of the proposed tests.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2022)
Article
Computer Science, Artificial Intelligence
Siddharth Ramchandran, Gleb Tikhonov, Otto Lonnroth, Pekka Tiikkainen, Harri Lahdesmaki
Summary: Conditional variational autoencoders (CVAEs) are versatile deep latent variable models that extend the standard VAE framework by conditioning the generative model with auxiliary covariates. This paper proposes a method to learn conditional VAEs from datasets with missing values in auxiliary covariates, and demonstrates superior performance compared to previous methods in various experimental settings.
PATTERN RECOGNITION
(2024)
Article
Economics
Wei Lan, Xuerong Chen, Tao Zou, Chih-Ling Tsai
Summary: This study introduces a method, inspired by semi-supervised learning, for imputing missing data in covariates with high missing rates. The method imposes no model assumptions, provides closed-form imputation for continuous covariates, and is also applicable to discrete covariates.
JOURNAL OF BUSINESS & ECONOMIC STATISTICS
(2022)
Article
Statistics & Probability
Ana Perez-Gonzalez, Tomas R. Cotos-Yanez, Wenceslao Gonzalez-Manteiga, Rosa M. Crujeiras-Casais
Summary: This paper introduces and analyzes goodness-of-fit tests for quantile regression models in the presence of missing observations in the response variable, based on construction of empirical processes and considering three different approaches. The performance of different test statistics is extensively studied through simulation, with an application to real data included.
STATISTICAL PAPERS
(2021)
Article
Physics, Multidisciplinary
Burim Ramosaj, Justus Tulowietzki, Markus Pauly
Summary: This study analyzes the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when using machine learning methods. The results show that even a slight decrease in imputation accuracy can seriously affect prediction accuracy.
Article
Computer Science, Artificial Intelligence
Manar D. Samad, Sakib Abrar, Norou Diawara
Summary: This paper proposes methods to improve the imputation accuracy of the MICE algorithm by using ensemble learning and deep neural networks. The results of extensive analyses on multiple datasets show that the proposed methods outperform other state-of-the-art imputation algorithms, leading to better imputation accuracy and classification accuracy.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Health Care Sciences & Services
Lauren J. Beesley, Irina Bondarenko, Michael R. Elliot, Allison W. Kurian, Steven J. Katz, Jeremy M. G. Taylor
Summary: This paper describes how to generalize the sequential regression multiple imputation procedure to handle non-random missingness when missingness may depend on other variables. The method reduces bias in the final analysis compared to standard techniques, using approximation strategies involving inclusion of an offset in the imputation model.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2021)
Article
Statistics & Probability
Tiantian Liu, Yair Goldberg
Summary: We propose a family of doubly robust kernel machines for classification in the presence of missing covariates. The model features double robustness against misspecification and computational feasibility through the use of a novel convex augmented loss function and various techniques. The theoretical results and empirical analysis demonstrate the performance of the proposed kernel machine.
ELECTRONIC JOURNAL OF STATISTICS
(2023)
Article
Mathematics, Interdisciplinary Applications
Lei Wang, Siying Sun, Zheng Xia
Summary: This paper introduces empirical likelihood-based inference for parameters defined by general estimating equations, showing consistency and asymptotic normality of the resulting estimator. The authors propose a two-stage estimation procedure using dimension-reduced kernel estimators and AIPW-MI methods, demonstrating the finite-sample performance through simulation and application to HIV-CD4 data.
JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY
(2021)
Article
Urology & Nephrology
Katrina Blazek, Anita van Zwieten, Valeria Saglimbene, Armando Teixeira-Pinto
Summary: Health data often have missing values, and utilizing multiple imputation techniques can help reduce bias and maintain sample size. Correct specification of the imputation model is crucial for the validity of analyses. Considerations such as missing mechanism, imputation method, and result reporting are important when conducting research with multiply imputed data.
KIDNEY INTERNATIONAL
(2021)
Article
Health Care Sciences & Services
Liangyuan Hu, Jung-Yi Joyce Lin, Jiayi Ji
Summary: This research investigates a general variable selection approach that can handle missing covariates and outcomes, utilizing machine learning models and bootstrap imputation. Results suggest that extreme gradient boosting and Bayesian additive regression trees have the best overall variable selection performance.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2021)
Article
Computer Science, Information Systems
Benjamin Agbo, Hussain Al-Aqrabi, Tariq Alsboui, Muhammad Hussain, Richard Hill
Summary: The growth of intelligent devices has generated vast amounts of data, but there are often missing values in these datasets. Traditional approaches for handling missing values are not practical, so there is a need to develop strategies for imputation. This study proposes an imputation strategy called $med.BFMVI$ which shows the best performance in replacing missing values and improving downstream learning.
Article
Computer Science, Artificial Intelligence
Feng Zhao, Yan Lu, Xinning Li, Lina Wang, Yingjie Song, Deming Fan, Caiming Zhang, Xiaobo Chen
Summary: Credit risk assessment is crucial for banks in loan approval and risk management. However, missing credit risk data can significantly reduce the effectiveness of the assessment model. In this paper, a novel method named MGAIN is proposed to accurately predict missing data through subset selection and a multiple imputation strategy, improving the accuracy of the imputation model.
APPLIED SOFT COMPUTING
(2022)
Article
Computer Science, Information Systems
Alexander Shapiro, Yao Xie, Rui Zhang
Summary: The research develops a general theory for goodness-of-fit testing of non-linear models, where the residual of the model fit follows a chi-squared distribution related to the model order and problem dimension. A sequential method for selecting model orders is presented, demonstrating broad applications in machine learning and signal processing.
IEEE TRANSACTIONS ON INFORMATION THEORY
(2021)
Article
Mathematics
A. Pekgor
Summary: Recently, new goodness-of-fit tests based on Kullback-Leibler divergence and likelihood ratio have been introduced for the Cauchy distribution, claiming to be more powerful than traditional tests. This study proposes a novel test for the Cauchy distribution and derives its asymptotic null distribution. Critical values are determined through Monte Carlo simulation for various sample sizes, and power analysis reveals the superiority of the proposed test under certain conditions.
JOURNAL OF MATHEMATICS
(2023)