Article
Mathematical & Computational Biology
Chia-Rui Chang, Yue Song, Fan Li, Rui Wang
Summary: Covariate adjustment is important in analyzing data from randomized clinical trials, but missing data can be a barrier. This study reviews covariate adjustment methods for trials with incomplete covariate data. The researchers propose a weighting approach that combines inverse probability weighting and overlap weighting to adjust for missing outcomes and covariates, and conduct comprehensive simulation studies to evaluate the performance of the methods.
STATISTICS IN MEDICINE
(2023)
Article
Health Care Sciences & Services
Ping-Tee Tan, Suzie Cro, Eleanor Van Vogt, Matyas Szigeti, Victoria R. Cornelius
Summary: Missing data are common in RCTs. Multiple imputation (MI) is widely used in the analysis, whereas controlled MI is used less often, mainly in sensitivity analyses. The current use and reporting of MI methods in RCTs need improvement.
BMC MEDICAL RESEARCH METHODOLOGY
(2021)
Article
Sport Sciences
David N. Borg, Robert Nguyen, Nicholas J. Tierney
Summary: The study found that only 11.0% of articles published in 2019 mentioned missing data. It recommended that researchers describe the amount of missing data and the circumstances in which it arose, conduct exploratory analyses, and provide visualizations of missingness. Missing values should be imputed, and researchers should explore imputation methods to ensure representativeness.
SCIENCE AND MEDICINE IN FOOTBALL
(2022)
Article
Biochemical Research Methods
Juan D. Henao, Michael Lauber, Manuel Azevedo, Anastasiia Grekova, Fabian Theis, Markus List, Christoph Ogris, Benjamin Schubert
Summary: This study integrated regression-based methods that can handle missingness into KiMONo and benchmarked their performance on missing data scenarios commonly encountered in single- and multi-omics studies. Two-step approaches that handle missingness explicitly performed best when the omics layers had imbalanced dimensions, while methods that handle missingness implicitly performed best when the dimensions were balanced. The study demonstrated the feasibility of robust multi-omics network inference with KiMONo in the presence of missing data.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Computer Science, Interdisciplinary Applications
Mutamba T. Kayembe, Frans E. S. Tan, Gerard J. P. van Breukelen, Shahab Jolani
Summary: This article compares missing data methods in randomized controlled trials with joint missingness. No single method universally outperforms the others, but linear mixed models (LMM) and multiple imputation (MI) perform best across most missingness scenarios.
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
(2023)
Article
Ecology
Thomas J. Hossie, Jenilee Gobin, Dennis L. Murray
Summary: The COVID-19 pandemic has significantly impacted research in ecology and evolution, leading to the suspension of many research programs and creating gaps in ecological datasets. Monitoring efforts were also curtailed, affecting how missing data are handled and requiring researchers to use more robust methods to ensure accurate inference.
FRONTIERS IN ECOLOGY AND EVOLUTION
(2021)
Article
Automation & Control Systems
Hutashan Vishal Bhagat, Manminder Singh
Summary: The Internet of Things has made data easier to collect and access, but the large volume and dimensionality of the generated data bring challenges of missingness and labeling. Data clustering, an unsupervised pattern classification technique, is commonly used to identify the structure of a dataset and group similar data items together. In chemometrics, clustering techniques play a significant role in identifying relationships between compound structures and their properties or activities. This paper proposes a Data Partitioning-based Clustering Framework (DPCF) that uses the NMVI technique to impute missing values and a novel Z-Clust algorithm to cluster unlabeled data samples efficiently. In the experiments, Z-Clust outperforms existing clustering techniques in cluster formation, indicating that DPCF is well suited to analyzing unlabeled datasets with missing values.
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
(2022)
Article
Medicine, Research & Experimental
Mia S. Tackney, Derek G. Cook, Daniel Stahl, Khalida Ismail, Elizabeth Williamson, James Carpenter
Summary: This study discusses the use of wearable devices in clinical trials to evaluate the impact of interventions on physical activity. The proposed analysis framework defines missing data based on wear time and suggests a multiple imputation approach for handling partially observed daily step counts.
Review
Computer Science, Information Systems
Tressy Thomas, Enayat Rajabi
Summary: The study found that clustering- and instance-based algorithms are the most frequently proposed methods for data imputation. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most commonly used evaluation metrics. Most studies draw their experimental data sets from publicly available repositories, but computational expense and experimentation with large data sets appear to be challenging.
DATA TECHNOLOGIES AND APPLICATIONS
(2021)
Article
Automation & Control Systems
Sara Rejeb, Catherine Duveau, Tabea Rebafka
Summary: This paper extends self-organizing maps to incomplete data and estimates the missing values. It introduces the missSOM algorithm, an adaptation of the Kohonen algorithm, for computing these self-organizing maps and imputing missing values. Numerical experiments demonstrate the efficiency of missSOM and its performance relative to the state of the art.
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
(2022)
Editorial Material
Public, Environmental & Occupational Health
Stephen R. Cole, Paul N. Zivich, Jessie K. Edwards, Rachael K. Ross, Bonnie E. Shook-Sa, Joan T. Price, Jeffrey S. A. Stringer
Summary: Missing data are a common and significant problem in epidemiology that can reduce precision and introduce notable bias. There are currently too few simple examples illustrating the types of missing data and their impact on results, and ignoring missing data remains standard practice in epidemiology.
AMERICAN JOURNAL OF EPIDEMIOLOGY
(2023)
Article
Computer Science, Interdisciplinary Applications
Pablo Ferri, Nekane Romero-Garcia, Rafael Badenes, David Lora-Pablos, Teresa Garcia Morales, Agustin Gomez de la Camara, Juan M. Garcia-Gomez, Carlos Saez
Summary: This study aims to characterize effective data imputation techniques and machine learning models for numerical data with a high proportion of missing values in electronic health records. The results suggest that combining translation and encoding imputation with tree ensemble classifiers can maximize performance when data are extremely incomplete.
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE
(2023)
Article
Statistics & Probability
Anqi Zhao, Peng Ding
Summary: Randomized experiments allow consistent estimation of the average treatment effect without strong modeling assumptions. Adjusting for covariates, even when some covariate values are missing, can improve estimation efficiency. The missingness-indicator method is recommended because of its advantages over other strategies.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2022)
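As a rough illustration of the missingness-indicator idea recommended in the Zhao and Ding entry, the Python sketch below fills missing covariate values with a constant (zero), adds a binary indicator of missingness, and reads the treatment effect off an ordinary least-squares fit. The function name `missingness_indicator_ate`, the additive (non-interacted) regression, and the simulated data are assumptions for illustration, not the estimator exactly as specified in the paper. Because treatment is randomized, it is independent of the baseline covariate and of its missingness pattern, which is why the treatment coefficient should stay close to the true effect even when missingness depends on the covariate itself.

```python
import numpy as np


def missingness_indicator_ate(y, z, x):
    """Hypothetical helper: OLS of outcome y on treatment z, covariate x with
    missing values (NaN) filled by 0, and an indicator that x was missing.
    Returns the coefficient on the treatment indicator."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    m = np.isnan(x).astype(float)            # 1 if the covariate is missing
    x_filled = np.where(m == 1, 0.0, x)      # fill missing entries with a constant
    design = np.column_stack([np.ones_like(y), z, x_filled, m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]                           # coefficient on treatment


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(size=n)                            # baseline covariate
    z = rng.binomial(1, 0.5, size=n)                  # randomized treatment
    y = 1.0 + 2.0 * z + 1.5 * x + rng.normal(size=n)  # true effect is 2.0
    # Missingness depends on the covariate itself: more missing when x > 0
    p_missing = np.where(x > 0, 0.5, 0.1)
    x_obs = np.where(rng.random(n) < p_missing, np.nan, x)
    print(f"estimated effect: {missingness_indicator_ate(y, z, x_obs):.3f}")
```

The constant used for filling does not affect the treatment coefficient, because the indicator column absorbs the shift in the missing group.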
Article
Mathematical & Computational Biology
Lisa N. Yelland, Jennie Louise, Brennan C. Kahan, Tim P. Morris, Katherine J. Lee, Thomas R. Sullivan
Summary: Many trials use stratified randomisation to allocate participants, but it is unclear how to adjust for stratification variables that are subject to misclassification. A simulation study comparing adjustment methods for continuous outcomes found that adjusting for the true strata was optimal, while whether adjusting for the randomisation strata or the updated strata performed better depended on the setting. To address stratification errors in practice, the authors recommend adjusting for the updated strata, along with subgroup analyses.
STATISTICS IN MEDICINE
(2023)
Article
Health Care Sciences & Services
Shannon L. Gutenkunst, Melanie L. Bell
Summary: This study investigated different methods for managing missing items in the Fagerstrom Test for Nicotine Dependence (FTND) and found that proration performed the best in terms of accuracy and precision. However, a sensitivity analysis with a different method is recommended when more than 10% of data are missing.
BMC MEDICAL RESEARCH METHODOLOGY
(2022)
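Proration, the best-performing method in the Gutenkunst and Bell entry, rescales a partially answered questionnaire to the full score range. The summary does not specify which variant was used, so the sketch below assumes one common form: the sum of the answered items is scaled by the ratio of the maximum possible total to the maximum possible score on the answered items. The function name `prorated_score` is hypothetical; the item maxima are those of the six FTND items, whose total ranges from 0 to 10.

```python
from typing import Optional, Sequence

# Maximum score of each of the six FTND items (total score range 0-10).
FTND_ITEM_MAX = (3, 1, 1, 3, 1, 1)


def prorated_score(responses: Sequence[Optional[int]],
                   item_max: Sequence[int] = FTND_ITEM_MAX) -> Optional[float]:
    """Prorate a total score when some items are missing (None).

    Assumed variant: the observed sum is scaled by
    (maximum possible total) / (maximum possible score on answered items).
    Returns None when no item was answered.
    """
    if len(responses) != len(item_max):
        raise ValueError("one response per item is required")
    answered = [(r, mx) for r, mx in zip(responses, item_max) if r is not None]
    if not answered:
        return None
    observed_sum = sum(r for r, _ in answered)
    observed_max = sum(mx for _, mx in answered)
    return observed_sum * sum(item_max) / observed_max


if __name__ == "__main__":
    print(prorated_score([2, 1, 1, 3, 0, 1]))     # complete responses
    print(prorated_score([2, 1, None, 3, 0, 1]))  # third item missing
```

A simpler variant multiplies the mean of the answered items by the number of items; it ignores the differing item ranges, which matters for the FTND because two items range from 0 to 3 and the rest from 0 to 1.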
Article
Health Care Sciences & Services
Schadrac C. Agbla, Bianca De Stavola, Karla DiazOrdaz
STATISTICAL METHODS IN MEDICAL RESEARCH
(2020)
Article
Health Care Sciences & Services
Jonathan W. Bartlett, Rachael A. Hughes
STATISTICAL METHODS IN MEDICAL RESEARCH
(2020)
Letter
Mathematical & Computational Biology
Jonathan W. Bartlett, Tim P. Morris, Mats J. Stensrud, Rhian M. Daniel, Stijn K. Vansteelandt, Carl-Fredrik Burman
STATISTICS IN BIOPHARMACEUTICAL RESEARCH
(2020)
Article
Mathematics, Interdisciplinary Applications
Oliver Hines, Stijn Vansteelandt, Karla Diaz-Ordaz
Summary: This study proposes G-estimators for direct and indirect effects under partially linear mean models and shows that the estimators of indirect effects are consistent and asymptotically normal when the models are correctly specified. A new score testing framework is constructed from generalized method of moments (GMM) results; in a partially linear setting it shows better power and small-sample performance than traditional tests.
Article
Statistics & Probability
Paul T. von Hippel, Jonathan W. Bartlett
Summary: Multiple imputation is a method for repairing and analyzing data with missing values by replacing them with random values drawn from an imputation model. The most popular form is posterior draw multiple imputation (PDMI), which draws the imputation-model parameters from a Bayesian posterior; an alternative is maximum likelihood multiple imputation (MLMI), which estimates them by maximum likelihood. MLMI is faster and slightly more efficient. A past barrier to using MLMI was estimating the standard errors of its point estimates; this has been addressed by implementing three consistent standard error formulas.
STATISTICAL SCIENCE
(2021)
Article
Mathematical & Computational Biology
Jonathan W. Bartlett
Summary: Reference-based multiple imputation methods are widely used for handling missing data in randomized clinical trials. This article reviews the debate on whether Rubin's variance estimator or alternative (smaller) variance estimators targeting the repeated sampling variance are more appropriate. It argues that the repeated sampling variance is the more appropriate target and recommends a recent proposal combining bootstrapping with multiple imputation as a widely applicable general solution.
STATISTICS IN BIOPHARMACEUTICAL RESEARCH
(2023)
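For reference, the Rubin's variance estimator discussed in the Bartlett entry pools M completed-data analyses as T = W + (1 + 1/M) B, where W is the mean within-imputation variance and B is the between-imputation variance of the point estimates. The Python sketch below shows this standard form of Rubin's rules; the function name `rubins_rules` is hypothetical, and the sketch does not implement the bootstrap-based alternative the article recommends.

```python
from statistics import fmean, variance
from typing import Sequence, Tuple


def rubins_rules(estimates: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Pool M completed-data estimates and their variances with Rubin's rules.

    Returns (pooled point estimate, Rubin's total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations, one variance each")
    q_bar = fmean(estimates)            # pooled point estimate
    w_bar = fmean(variances)            # mean within-imputation variance
    b = variance(estimates)             # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b     # Rubin's variance estimator
    return q_bar, total


if __name__ == "__main__":
    print(rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

According to the summary, in reference-based imputation this total can exceed the repeated sampling variance, which is the core of the debate the article reviews.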
Article
Public, Environmental & Occupational Health
Irene Petersen, Alexander Crozier, Iain Buchan, Michael J. Mina, Jonathan W. Bartlett
Summary: International SARS-CoV-2 testing has largely focused on diagnosing COVID-19 with PCR tests in symptomatic individuals, but public health programs have recently shifted towards using lateral flow tests (LFTs) to test asymptomatic individuals. Recalibrating the relative performance statistics yields considerably higher estimates of LFT sensitivity for detecting individuals shedding SARS-CoV-2 antigen.
CLINICAL EPIDEMIOLOGY
(2021)
Article
Pharmacology & Pharmacy
Marcel Wolbers, Alessandro Noci, Paul Delmar, Craig Gower-Page, Sean Yiu, Jonathan W. Bartlett
Summary: This article introduces an alternative imputation method for longitudinal outcomes in clinical trials. The proposed method, based on deterministic conditional mean imputation and jackknife inference, can be used for imputation under a missing-at-random assumption; because the imputation is deterministic, its results are not subject to random sampling (Monte Carlo) variability.
PHARMACEUTICAL STATISTICS
(2022)
Article
Mathematical & Computational Biology
Kelly Van Lancker, Sergey Tarima, Jonathan Bartlett, Madeline Bauer, Bharani Bharani-Dharan, Frank Bretz, Nancy Flournoy, Hege Michiels, Camila Olarte Parra, James L. Rosenberger, Suzie Cro
Summary: This article discusses how to address trial disruptions caused by the COVID-19 pandemic and proposes strategies and methods for dealing with them. Using the concepts of estimands and sensitivity analyses, the impact of pandemic-related disruptions on trial results can be better understood, and considerations for future trial designs can be provided.
STATISTICS IN BIOPHARMACEUTICAL RESEARCH
(2022)
Article
Mathematical & Computational Biology
Camila Olarte Parra, Rhian M. Daniel, Jonathan W. Bartlett
Summary: This article focuses on the hypothetical strategy proposed in the ICH E9(R1) addendum for handling intercurrent events. It discusses estimating the treatment effect under the hypothetical scenario in which intercurrent events are prevented, using causal inference and missing data methods. The article establishes links between certain causal inference estimators and missing data estimators, which can help researchers who are familiar with one set of methods but not the other.
STATISTICS IN BIOPHARMACEUTICAL RESEARCH
(2023)
Editorial Material
Biology
Karla DiazOrdaz
Summary: This article discusses the assumptions necessary for identifying average treatment effects and local average treatment effects in instrumented difference-in-differences (IDID). It also explores the potential trade-offs between the assumptions of standard instrumental variable (IV) methods and those needed for the proposed IDID method in both one- and two-sample settings. Furthermore, the interpretation of estimands identified under the assumption of monotonicity is discussed.
Article
Health Care Sciences & Services
Rosaleen Peggy Cornish, Jonathan William Bartlett, John Macleod, Kate Tilling
Summary: This study investigated bias in the exposure odds ratio (OR) estimated by complete-case logistic regression when the binary outcome is derived from an incompletely observed continuous outcome. It also examined whether including a misclassified version of the incomplete outcome as an auxiliary variable in multiple imputation reduces this bias. The exposure OR was biased, especially when the association between the continuous outcome and missingness was strong. Including the auxiliary variable reduced bias, particularly when its sensitivity and specificity were high.
JOURNAL OF CLINICAL EPIDEMIOLOGY
(2023)
Article
Social Sciences, Mathematical Methods
Anna-Carolina Haensch, Jonathan Bartlett, Bernd Weiss
Summary: Discrete-time survival analysis (DTSA) models are popular in social sciences for modeling events. However, missing data in covariates poses challenges in the analysis of DTSA. Multiple imputation (MI) is a popular approach to address these challenges, but there is little guidance on incorporating observed outcome information into the imputation models in DTSA. This study explores and compares different existing approaches, and proposes an extended method.
SOCIOLOGICAL METHODS & RESEARCH
(2022)
Letter
Mathematical & Computational Biology
Tim P. Morris, Ian R. White, Suzie Cro, Jonathan W. Bartlett, James R. Carpenter, Tra My Pham
Summary: For simulation studies evaluating methods of handling missing data, generating partially observed data by fixing complete data and simulating missingness indicators repeatedly is rarely appropriate.
BIOMETRICAL JOURNAL
(2023)
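The design point in the Morris et al. entry can be illustrated with a small simulation. The Python sketch below estimates a mean by complete-case analysis under completely-at-random missingness and compares two designs: one that holds a single complete dataset fixed and only resimulates the missingness indicators, and one that draws a fresh complete dataset in every repetition. The function names, sample sizes, and missingness rate are assumptions chosen for illustration. Holding the complete data fixed removes the sampling variability of the complete data, so the empirical standard error from that design is expected to be noticeably smaller, and its estimates centre on the mean of that one dataset rather than on the true mean of 0.

```python
import numpy as np


def complete_case_mean(y: np.ndarray, missing: np.ndarray) -> float:
    """Mean of the observed (non-missing) values."""
    return float(y[~missing].mean())


def empirical_se(n_reps: int = 1000, n: int = 200, p_missing: float = 0.3,
                 fix_complete_data: bool = False, seed: int = 0) -> float:
    """Empirical SE of the complete-case mean across simulation repetitions.

    fix_complete_data=True reuses one complete dataset and only resimulates
    the missingness indicators; False draws new complete data each time.
    """
    rng = np.random.default_rng(seed)
    y_fixed = rng.normal(size=n)  # used only when fix_complete_data is True
    estimates = np.empty(n_reps)
    for r in range(n_reps):
        y = y_fixed if fix_complete_data else rng.normal(size=n)
        missing = rng.random(n) < p_missing  # MCAR missingness indicators
        estimates[r] = complete_case_mean(y, missing)
    return float(estimates.std(ddof=1))


if __name__ == "__main__":
    print("fixed complete data:", round(empirical_se(fix_complete_data=True), 4))
    print("fresh complete data:", round(empirical_se(fix_complete_data=False), 4))
```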
Article
Public, Environmental & Occupational Health
Phil Edwards, Sajjan Yadav, Jonathan Bartlett, John Porter
Summary: This study examines the epidemiology of construction site injuries in Delhi, India, using police records. The findings reveal that construction workers are at higher risk of injuries, particularly in developing countries. Female construction workers are also at significant risk, and children who accompany their parents to work are in danger as well. Building collapses and electric shocks are the main hazards faced by construction workers. Introducing and enforcing laws on occupational safety, health, and working conditions is necessary to control this injury burden.
INJURY EPIDEMIOLOGY
(2022)