Article
Mathematical & Computational Biology
Ryan Miller, Patrick Breheny
Summary: Penalized regression methods like the lasso are commonly used for analyzing high-dimensional data. The lasso naturally performs variable selection, but the reliability of these selections is a concern. In this study, inspired by the local false discovery rate methodology, we propose a method for calculating the local false discovery rate for each variable considered by the lasso model, which can be used to assess the reliability of individual features and estimate the model's overall false discovery rate. We demonstrate the validity of our approach and show its practical utility in a case study on gene expression in breast cancer patients.
STATISTICS IN MEDICINE
(2023)
Article
Automation & Control Systems
Daniel R. Kowal
Summary: Subset selection is a valuable tool for interpretability, scientific discovery, and data compression. We propose a Bayesian approach to address the challenges in classical subset selection, and introduce a strategy that focuses on finding near-optimal subsets rather than a single best subset. We apply Bayesian decision analysis to derive the optimal linear coefficients for any subset of variables, and our approach outperforms competing methods in prediction, interval estimation, and variable selection. By analyzing a large education dataset, we gain unique insights into the factors that predict educational outcomes and identify over 200 distinct subsets of variables that offer near-optimal predictive accuracy.
JOURNAL OF MACHINE LEARNING RESEARCH
(2022)
Article
Statistics & Probability
Yi Zuo, Thomas G. Stewart, Jeffrey D. Blume
Summary: ProSGPV is a novel variable selection approach that balances inference and prediction by using second-generation p-values and an ℓ0 penalization scheme to select variables, achieving better performance than traditional methods.
AMERICAN STATISTICIAN
(2022)
Article
Automation & Control Systems
Raj Agrawal, Tamara Broderick
Summary: This research addresses the computational bottleneck in effect estimation and variable selection for scientific problems. The proposed method uses a kernel trick for variable selection and estimation, achieving accurate results with short runtimes.
JOURNAL OF MACHINE LEARNING RESEARCH
(2023)
Article
Mathematics
Haofeng Wang, Xuejun Jiang, Min Zhou, Jiancheng Jiang
Summary: This paper studies variable selection in distributed sparse regression with large sample size and limited memory constraint. By improving the traditional divide-and-conquer method, the proposed method can better control the false discovery rate and reduce the computational burden. Theoretical properties and computational algorithms are established, and the method is evaluated through simulations and a real example.
COMMUNICATIONS IN MATHEMATICS AND STATISTICS
(2023)
Article
Computer Science, Artificial Intelligence
Woraphon Yamaka
Summary: Sparse estimation methods show superior performance in the kink regression model, improving variable selection accuracy and prediction capabilities. However, it is unclear which sparse estimation method is most suitable for estimating the kink regression model.
Article
Statistics & Probability
Yunlu Jiang, Yan Wang, Jiantao Zhang, Baojian Xie, Jibiao Liao, Wenhui Liao
Summary: This paper introduces a new method for outlier detection and robust variable selection in linear regression models, which outperforms existing methods according to Monte Carlo studies.
JOURNAL OF APPLIED STATISTICS
(2021)
Article
Mathematics
Juan C. Laria, M. Carmen Aguilera-Morillo, Enrique Alvarez, Rosa E. Lillo, Sara Lopez-Taruella, Maria del Monte-Millan, Antonio C. Picornell, Miguel Martin, Juan Romo
Summary: This paper introduces a methodology to deal with variable selection and model estimation problems in a high-dimensional set-up, which can be particularly useful in the whole genome context.
Article
Automation & Control Systems
Haoran Li, Jisheng Dai, Jianbo Xiao, Xiaobo Zou, Tao Chen, Melvin Holmose
Summary: In this study, the RAH algorithm is combined with LASSO to fit the entire solution path of the LASSO problem by tracking the KKT conditions and to select the optimal regularization parameter. The results demonstrate that RAH-LASSO + PLS outperforms other methods in terms of wavelength selection and calibration.
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
(2022)
Article
Mathematics, Interdisciplinary Applications
Bing Cai Kok, Ji Sok Choi, Hyelim Oh, Ji Yeh Choi
Summary: Extended Redundancy Analysis is a statistical tool for exploring directional relationships between multiple sets of exogenous variables and a set of endogenous variables. A Sparse Extended Redundancy Analysis via the Exclusive LASSO is proposed to address difficulties in distinguishing between true and false variables. Validation of this approach is demonstrated in a simulation study, with empirical utility shown in examples of youth academic achievement and text analysis of newspaper data.
MULTIVARIATE BEHAVIORAL RESEARCH
(2021)
Article
Biochemical Research Methods
Ayyuce Begum Bektas, Cigdem Ak, Mehmet Gonen
Summary: With the increasing sizes of computational biology datasets, previous kernel-based machine learning algorithms have failed to provide satisfactory interpretability. To address this issue, we propose a fast and efficient multiple kernel learning algorithm that can extract significant information from genomic data. Our experiments demonstrate that the algorithm outperforms baseline methods while using only a small fraction of input features, and it has the potential to discover new biomarkers and therapeutic guidelines.
Article
Mathematics
Zhongzheng Wang, Guangming Deng, Jianqi Yu
Summary: The proposed group screening procedure based on the information gain ratio for a classification model is shown to have better screening performance and classification accuracy.
JOURNAL OF MATHEMATICS
(2022)
Article
Engineering, Industrial
Chengyu Zhou, Xiaolei Fang
Summary: The diagnosis of product quality defects in multistage manufacturing processes often requires identifying crucial stages and process variables related to product anomalies. Existing models have limitations, and this article proposes a novel convex two-dimensional variable selection method to address these challenges by considering both group-wise and element-wise sparsity.
RELIABILITY ENGINEERING & SYSTEM SAFETY
(2023)
Article
Engineering, Industrial
Cheoljoon Jeong, Xiaolei Fang
Summary: This article introduces a novel penalized matrix regression methodology to diagnose the root cause of product quality defects in multistage manufacturing processes. Sparsity is induced by decomposing the unknown regression coefficient matrix into two factor matrices and penalizing their rows and columns simultaneously. A Block Coordinate Proximal Descent (BCPD) optimization algorithm is developed for parameter estimation; it cyclically solves convex sub-optimization problems.
Article
Biochemistry & Molecular Biology
Jiaqi Liang, Chaoye Wang, Di Zhang, Yubin Xie, Yanru Zeng, Tianqin Li, Zhixiang Zuo, Jian Ren, Qi Zhao
Summary: This article introduces a method called VSOLassoBag, which integrates an ensemble learning strategy to select efficient and stable variables from high-dimensional biological data for biomarker determination. The application of VSOLassoBag on simulation and real-world datasets shows its effectiveness in identifying markers for binary classification and prognosis prediction, with comparable performance and fewer features compared to other algorithms.
JOURNAL OF GENETICS AND GENOMICS
(2023)