Article
Multidisciplinary Sciences
Sam Nguyen, Ryan Chan, Jose Cadena, Braden Soper, Paul Kiszka, Lucas Womack, Mark Work, Joan M. Duggan, Steven T. Haller, Jennifer A. Hanrahan, David J. Kennedy, Deepa Mukundan, Priyadip Ray
Summary: The study developed an ML-based tool using EHR data to predict adverse outcomes in COVID-19 patients, optimizing clinical utility under a given cost structure. Results showed that, under various budget constraints, cost can be reduced substantially with only a small loss in predictive performance.
SCIENTIFIC REPORTS
(2021)
Article
Computer Science, Artificial Intelligence
Agnes Baran, Sebastian Lerch, Mehrez El Ayari, Sandor Baran
Summary: Accurate forecasting of total cloud cover is important for various sectors, and statistical calibration using machine learning methods can significantly improve forecast skill. Adding precipitation forecast data can further enhance predictive performance.
NEURAL COMPUTING & APPLICATIONS
(2021)
Article
Computer Science, Theory & Methods
Yalin Wu, Qianjian Zhang, Yaqin Hu, Ko Sun-Woo, Xiangyan Zhang, Hongmin Zhu, Liu Jie, ShiYong Li
Summary: The incidence of diabetes, especially type II diabetes, is rising rapidly and brings with it a range of complex complications. This study proposes a novel binary logistic regression model with feature transformation for accurately predicting the specific type of diabetes, and the model is adaptable to multiple datasets. The results show that the proposed model achieved a high identification rate in diabetes prediction.
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
(2022)
Article
Health Care Sciences & Services
Wei Yang, Jiakun Jiang, Erin M. Schnellinger, Stephen E. Kimmel, Wensheng Guo
Summary: The Brier score is a popular measure for evaluating predictions of binary outcomes, but its interpretation is not straightforward because it depends on outcome prevalence. To address this issue, we propose a modification to the Brier score that removes the influence of outcome variance. We demonstrate through simulation that this new measure is more sensitive when comparing different prediction models. We also introduce a standardized performance improvement measure based on this criterion.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2022)
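For reference, a minimal sketch of the ordinary Brier score and the widely used scaled Brier score, which divides by the reference score ȳ(1 − ȳ) obtained by always predicting the observed prevalence ȳ. This is NOT the modification proposed by Yang et al. (the summary does not give its formula); it only shows how the raw score is tied to prevalence. The function names are illustrative.

```python
def brier_score(y, p):
    """Mean squared difference between predicted probabilities p and 0/1 outcomes y."""
    if not y or len(y) != len(p):
        raise ValueError("y and p must be non-empty and of equal length")
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier_score(y, p):
    """Classic scaled Brier score: 1 - BS / (ybar * (1 - ybar)).

    ybar * (1 - ybar) is the Brier score of a model that always predicts the
    observed prevalence ybar, so 0 means "no better than prevalence" and 1 is
    perfect. Shown for comparison only; not Yang et al.'s proposed measure.
    """
    ybar = sum(y) / len(y)
    reference = ybar * (1 - ybar)
    if reference == 0:
        raise ValueError("undefined when all outcomes are identical")
    return 1 - brier_score(y, p) / reference


# Usage: four patients, one event (prevalence 0.25).
y = [1, 0, 0, 0]
p = [0.8, 0.1, 0.2, 0.1]
print(round(brier_score(y, p), 4))         # 0.025
print(round(scaled_brier_score(y, p), 4))  # 0.8667
```

Because the reference term ȳ(1 − ȳ) shrinks as the outcome becomes rarer, the same raw Brier score can correspond to very different scaled values in populations with different prevalences.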
Article
Computer Science, Information Systems
Ruben van den Goorbergh, Maarten van Smeden, Dirk Timmerman, Ben Van Calster
Summary: This study examined the effect of correcting class imbalance on the performance of logistic regression models and found that methods such as random undersampling, random oversampling, and SMOTE did not improve model performance and resulted in poorly calibrated models. Imbalance correction did not enhance the ability of the models to distinguish between patients with and without the outcome event.
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
(2022)
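The calibration problem reported above can be seen in the simplest possible case, sketched here under stated assumptions: for an intercept-only logistic model, the maximum-likelihood predicted probability equals the sample event rate. Randomly oversampling the minority class until the classes are balanced therefore pushes every prediction to 0.5, regardless of the true prevalence. This toy example is illustrative only and does not reproduce the study's simulations; the helper names and the 10% prevalence are hypothetical.

```python
import math
import random


def intercept_only_logit(labels):
    """MLE intercept of a logistic model with no covariates: logit of the event rate."""
    rate = sum(labels) / len(labels)
    return math.log(rate / (1 - rate))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def random_oversample(rows, seed=0):
    """Duplicate randomly chosen minority rows (label == 1) until classes are balanced.

    rows: list of (features, label) pairs; the positive class is assumed to be
    the minority class.
    """
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return rows + extra


# Hypothetical data: 100 patients, 10 events (prevalence 0.10).
rows = [((i,), 1 if i < 10 else 0) for i in range(100)]
p_original = sigmoid(intercept_only_logit([lab for _, lab in rows]))
balanced = random_oversample(rows)
p_balanced = sigmoid(intercept_only_logit([lab for _, lab in balanced]))
print(round(p_original, 3), round(p_balanced, 3))  # 0.1 0.5
```

With covariates the shift is not exactly to 0.5, but the same mechanism inflates predicted risks for the minority class, which is why resampled models need recalibration before their probabilities can be used.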
Article
Health Care Sciences & Services
Eric Anto, Xiaogang Su
Summary: Moderation analysis is crucial for precision medicine and can be used to evaluate differential treatment effects. In the analysis of binary outcomes, a symmetry property concerning odds ratios suggests that heterogeneous treatment effects can be estimated by exchanging the roles of the outcome and treatment variables. By combining two models into one using a generalized estimating equation approach, we obtain refined inference on moderating effects and improve efficiency in identifying important moderators. Simulation studies and a trial on wart treatment demonstrate the effectiveness of the proposed method.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2023)
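The symmetry property mentioned in the summary can be checked directly on a 2×2 table: with cell counts a (treated, event), b (treated, no event), c (control, event) and d (control, no event), the odds ratio of the outcome given treatment and the odds ratio of treatment given outcome are both ad/bc. The sketch below uses hypothetical counts and shows only this symmetry, not the GEE-based method of Anto and Su.

```python
from collections import Counter


def odds_ratio(x, z):
    """Odds ratio for z == 1, comparing x == 1 with x == 0 (both 0/1 sequences)."""
    n = Counter(zip(x, z))
    odds_x1 = n[(1, 1)] / n[(1, 0)]
    odds_x0 = n[(0, 1)] / n[(0, 0)]
    return odds_x1 / odds_x0


# Hypothetical trial: 40 treated (30 cured), 20 controls (5 cured).
t = [1] * 40 + [0] * 20
y = [1] * 30 + [0] * 10 + [1] * 5 + [0] * 15

print(odds_ratio(t, y))  # outcome given treatment, approximately 9
print(odds_ratio(y, t))  # treatment given outcome, the same value
```

Because a logistic regression with a single binary covariate has slope equal to the log odds ratio, regressing the outcome on treatment or treatment on the outcome gives the same coefficient here; the paper builds on this property in the presence of moderators.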
Article
Health Care Sciences & Services
Angelika Geroldinger, Rok Blagus, Helen Ogden, Georg Heinze
Summary: In binary logistic regression, separable data refers to the existence of a linear combination of explanatory variables that perfectly predicts the outcome. Firth's logistic regression (FL) is a popular solution for obtaining finite estimates in such cases. When clustered data, which are common in clinical research, are analyzed with generalized estimating equations (GEE), convergence becomes more complicated. This article investigates extensions of FL to GEE and compares their convergence behavior and performance using simulated and real data.
BMC MEDICAL RESEARCH METHODOLOGY
(2022)
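As background for the entry above, here is a minimal sketch of Firth's logistic regression for independent observations: Newton iterations on Firth's modified score X'(y − p + h(1/2 − p)), where h is the diagonal of the hat matrix W^{1/2}X(X'WX)^{-1}X'W^{1/2}, with step halving on the penalized log-likelihood. It does not implement the GEE extensions studied in the article; for real analyses an established implementation (for example the R package logistf) is preferable. The function names and the toy data are illustrative.

```python
import numpy as np


def _penalized_loglik(X, y, beta):
    """Log-likelihood plus Firth's penalty 0.5 * log det of the Fisher information."""
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    _, logdet = np.linalg.slogdet(X.T @ (X * w[:, None]))
    return np.sum(y * eta - np.logaddexp(0.0, eta)) + 0.5 * logdet


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression (sketch, independent observations).

    X: (n, k) design matrix that already includes an intercept column.
    y: (n,) array of 0/1 outcomes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^{-1}
        # Diagonal of the hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Newton step on Firth's modified score X'(y - p + h * (1/2 - p))
        step = info_inv @ (X.T @ (y - p + h * (0.5 - p)))
        # Step halving: do not let the penalized log-likelihood decrease
        current = _penalized_loglik(X, y, beta)
        for _ in range(30):
            if _penalized_loglik(X, y, beta + step) >= current:
                break
            step = step / 2.0
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Completely separated toy data: ordinary ML estimates would diverge.
X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y = np.array([0, 0, 1, 1])
beta = firth_logistic(X, y)
print(beta)  # finite intercept (about 0) and finite positive slope
```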
Article
Public, Environmental & Occupational Health
Rashid M. Ansari, Peter Baker
Summary: This study identified predictors of COVID-19 infection outcomes and developed prediction models, including factors such as total T cells and the number of infected cells in the blood. Results showed that factors such as BMI, comorbidity, and specific cell types were significantly associated with infection severity, and the multivariate logistic regression model showed promise in predicting infection severity.
JOURNAL OF INFECTION AND PUBLIC HEALTH
(2021)
Article
Automation & Control Systems
Daniel R. Kowal
Summary: Subset selection is a valuable tool for interpretability, scientific discovery, and data compression. We propose a Bayesian approach to address the challenges in classical subset selection, and introduce a strategy that focuses on finding near-optimal subsets rather than a single best subset. We apply Bayesian decision analysis to derive the optimal linear coefficients for any subset of variables, and our approach outperforms competing methods in prediction, interval estimation, and variable selection. By analyzing a large education dataset, we gain unique insights into the factors that predict educational outcomes and identify over 200 distinct subsets of variables that offer near-optimal predictive accuracy.
JOURNAL OF MACHINE LEARNING RESEARCH
(2022)
Article
Public, Environmental & Occupational Health
Tsirizani M. Kaombe, Gracious A. Hamuza
Summary: The birth and death rates are crucial statistics for socio-economic policy planning. To accurately predict under-five mortality rate, the survey design effect needs to be considered. This study compares the bias encountered in predicting child mortality rate in Malawi using weighted and unweighted logistic regression methods.
Article
Psychology, Experimental
Robin Gomila
Summary: When estimating treatment effects on binary outcomes, linear regression is generally the best strategy. Linear regression coefficients are directly interpretable in terms of probabilities, and it is safer when interaction terms or fixed effects are included.
JOURNAL OF EXPERIMENTAL PSYCHOLOGY-GENERAL
(2021)
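The interpretability argument above is easiest to see with a single binary treatment indicator: the ordinary least squares slope of a 0/1 outcome on a 0/1 treatment equals the difference in outcome proportions between arms, and the intercept equals the control-group proportion. The sketch below checks this on hypothetical counts; it does not address the paper's points about interaction terms or fixed effects.

```python
def ols_fit(x, y):
    """Intercept and slope of a simple least-squares line y ~ a + b * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical trial: 40 treated (30 events, rate 0.75), 20 controls (5 events, rate 0.25).
t = [1] * 40 + [0] * 20
y = [1] * 30 + [0] * 10 + [1] * 5 + [0] * 15

intercept, slope = ols_fit(t, y)
print(round(intercept, 3), round(slope, 3))  # 0.25 0.5
```

The slope of 0.5 reads directly as "treatment raises the event probability by 50 percentage points", whereas a logistic model would report the same contrast as a log odds ratio.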
Article
Computer Science, Artificial Intelligence
Jia-Yen Huang, Wei-Zhen Lin
Summary: This study examines the impact of the pandemic on the variables used in a stock prediction model, finding that the major indicators affecting stock market changes differ before and after the pandemic. Separate prediction models should be established for analyzing each period. Additionally, using sentiment scores derived from replies as a predictive variable leads to more accurate predictions of stock price changes.
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING
(2023)
Article
Biochemistry & Molecular Biology
Fatemeh Eshari, Fahime Momeni, Amirreza Faraj Nezhadi, Soudabeh Shemehsavar, Mehran Habibi-Rezaei
Summary: A novel machine-learning approach based on logistic regression (LR) is used to predict protein aggregation propensity (PAP) using a dataset of hexapeptides and eight physicochemical features. The LR model, combined with sequence and feature information, achieves high accuracy and outperforms other existing methods in PAP prediction.
INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES
(2023)
Article
Multidisciplinary Sciences
Laura Marin, Fanny Casado
Summary: This study proposes a methodology that uses discretization to predict biochemical recurrence of prostate cancer while optimizing the set of necessary variables. The discretization method can improve the accuracy of biochemical recurrence prediction and identifies a subset of ten genes related to tissue structure. Adding a clinical biomarker, prostate-specific antigen (PSA), further enhances the prediction of biochemical recurrence.
SCIENTIFIC REPORTS
(2023)
Article
Computer Science, Artificial Intelligence
Philip M. Long, Rocco A. Servedio
Summary: The study investigates the accuracy of binary classifiers obtained by minimizing the unhinged loss, finding that even for simple linearly separable data distributions, minimizing the unhinged loss may only yield a binary classifier with accuracy no better than random guessing.
NEURAL COMPUTATION
(2022)
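To make the result above concrete, assume the unhinged loss ℓ(y, v) = 1 − yv with an L2 penalty λ‖w‖² on a linear score v = ⟨w, x⟩ (this regularized form is an assumption of the sketch). Setting the gradient −mean(y·x) + 2λw to zero gives w = mean(y·x)/(2λ), so the learned direction is simply the class-signed mean of the inputs. The hypothetical four-point dataset below is linearly separable, yet that direction misclassifies most of it. This finite-sample toy is not the construction used by Long and Servedio.

```python
def unhinged_direction(X, y):
    """Direction of the minimizer of mean(1 - y * <w, x>) + lam * ||w||^2 (any lam > 0).

    The minimizer is mean(y * x) / (2 * lam); only its direction affects
    classification, so the scale factor is dropped.
    """
    n, d = len(X), len(X[0])
    return [sum(yi * xi[j] for xi, yi in zip(X, y)) / n for j in range(d)]


def accuracy(w, X, y):
    """Fraction of points with y * <w, x> > 0."""
    correct = sum(1 for xi, yi in zip(X, y)
                  if yi * sum(a * b for a, b in zip(w, xi)) > 0)
    return correct / len(X)


# Hypothetical separable data: positives have x1 > 0, the negative has x1 < 0.
X = [(0.1, 1.0)] * 3 + [(-1.0, 5.0)]
y = [1, 1, 1, -1]

w = unhinged_direction(X, y)
print(accuracy(w, X, y))           # 0.25
print(accuracy((1.0, 0.0), X, y))  # 1.0, so a perfect linear separator exists
```

A single far-away negative point dominates mean(y·x) and drags the learned direction away from every positive example, which is the kind of failure the paper analyzes for distributions.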