Article
Mathematics
Zhongzheng Wang, Guangming Deng, Jianqi Yu
Summary: The proposed group screening procedure based on the information gain ratio for a classification model is shown to have better screening performance and classification accuracy.
JOURNAL OF MATHEMATICS
(2022)
Article
Computer Science, Artificial Intelligence
Xin Guan, Yoshikazu Terada
Summary: This paper proposes a novel sparse kernel k-means clustering method for high-dimensional data. By optimizing feature weights, the method identifies informative features and improves clustering performance.
PATTERN RECOGNITION
(2023)
Article
Computer Science, Artificial Intelligence
Lixin Cui, Lu Bai, Yue Wang, Philip S. Yu, Edwin R. Hancock
Summary: This paper proposes a novel feature selection method based on graph-based feature representations and the Fused Lasso framework, addressing shortcomings of existing approaches such as overlooking structural relationships between samples and treating candidate-feature relevancy the same as selected-feature relevancy. The method uses an iterative algorithm to identify the most discriminative features and outperforms competitors on benchmark datasets.
PATTERN RECOGNITION
(2021)
Article
Automation & Control Systems
Chin Gi Soh, Ying Zhu
Summary: This paper proposes a sparse fused group lasso model for predicting the percentage purity of oil blends using Fourier-transform infrared spectroscopic data. The method improves the interpretability and prediction performance of the resultant models, while capturing group structure and coefficient structure smoothness.
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Fatemeh Farokhmanesh, Mohammad Taghi Sadeghi
Summary: Deep learning, a significant subcategory of machine learning, aims to replace hand-crafted features with automatically extracted ones, but faces the challenges of high-dimensional feature spaces and potential overfitting. Sparse-representation-based methods, which represent data with minimal non-zero coefficients, are therefore attractive for regularizing deep neural networks. Experimental results show that combining methods such as CRFS and SGL can yield effective regularization in deep learning tasks.
NEURAL PROCESSING LETTERS
(2021)
Article
Chemistry, Multidisciplinary
Fangyun Bai, Kin Ming Puk, Jin Liu, Hongyu Zhou, Peng Tao, Wenyong Zhou, Shouyi Wang
Summary: In recent years, machine learning methods have been applied to various scientific and technological fields, including computational chemistry. In this study, a sparse group lasso method was used to develop a classification model for an allosteric protein in different functional states. The results show that the model achieved a significant improvement in accuracy while selecting only a small number of features. This study demonstrates the importance and necessity of rigorous feature selection and evaluation methods for complex chemical systems.
JOURNAL OF COMPUTATIONAL CHEMISTRY
(2022)
Article
Biology
Juntao Li, Ke Liang, Xuekun Song
Summary: This paper introduces a cancer diagnosis method LR-ASGL based on gene expression profile data, which addresses challenges in practical applications such as noise, gene grouping, and adaptive gene selection. Through experiments, the proposed method demonstrates significant advantages in prediction and gene selection.
COMPUTERS IN BIOLOGY AND MEDICINE
(2022)
Article
Computer Science, Artificial Intelligence
Xiaokang Wang, Huiwen Wang, Shanshan Wang, Jidong Yuan
Summary: This paper presents a compositional clustering framework based on convex clustering to address the issue of high-dimensional sparse data. It uses an isometric logratio transformation and sparse group lasso penalty to select informative features and promote within-feature sparsity.
Article
Health Care Sciences & Services
Matthew Sutton, Pierre-Emmanuel Sugier, Therese Truong, Benoit Liquet
Summary: This study proposes a statistical method that combines gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits. Results show that the method performs well on simulation studies and real data analysis, detecting more true signals compared to a popular competing method.
BMC MEDICAL RESEARCH METHODOLOGY
(2022)
Article
Computer Science, Artificial Intelligence
Saptarshi Chakraborty, Swagatam Das
Summary: In this paper, a simple and efficient sparse clustering algorithm called LW-k-means is proposed for high-dimensional data. The algorithm incorporates feature weighting to enable feature selection and has a time complexity similar to traditional algorithms. The strong consistency of the LW-k-means procedure is also established. Experimental results on synthetic and real-life datasets demonstrate that LW-k-means performs competitively in terms of clustering accuracy and computational time compared to existing methods for center-based high-dimensional clustering.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
Article
Management
Xiaoping Liu, Xiao-Bai Li, Sumit Sarkar
Summary: When acquiring consumer data for marketing or new business initiatives, it is crucial to determine which attributes or features of potential customers should be obtained. This study addresses a novel feature selection problem in customer data acquisition, where different features carry different acquisition costs, and examines it for both linear regression and logistic regression. The authors formulate the task as nonlinear discrete optimization problems that minimize prediction error under a budget constraint, derive analytical properties of the solutions, and develop a computational procedure to solve them. They also provide an intuitive interpretation of the feature selection criteria and discuss the managerial implications of the solution approach. Experimental results demonstrate the effectiveness of the approach.
MANAGEMENT SCIENCE
(2023)
Article
Mathematics
Juan C. Laria, M. Carmen Aguilera-Morillo, Enrique Alvarez, Rosa E. Lillo, Sara Lopez-Taruella, Maria del Monte-Millan, Antonio C. Picornell, Miguel Martin, Juan Romo
Summary: This paper introduces a methodology for variable selection and model estimation in a high-dimensional setting, which can be particularly useful in the whole-genome context.
Article
Computer Science, Artificial Intelligence
Jian Wang, Huaqing Zhang, Junze Wang, Yifei Pu, Nikhil R. Pal
Summary: This study presents a neural network-based feature selection scheme that controls the level of redundancy in selected features by integrating two penalties. Experimental results demonstrate the effectiveness of the proposed scheme in redundancy control.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2021)
Article
Computer Science, Artificial Intelligence
Wei Gao, Haizhong Yang
Summary: This study proposes a causality-based feature selection method that introduces time-varying Granger causal networks to capture causal relationships in high-dimensional dynamic systems. It overcomes the limitation of sample scarcity by transforming the problem of learning Granger causal structures into a group variable selection problem. Experimental results demonstrate that the method efficiently detects changes and analyzes causal dependency structures in network evolution.
PATTERN RECOGNITION
(2022)