Article
Biochemical Research Methods
Jing Tang, Minjie Mou, Yunxia Wang, Yongchao Luo, Feng Zhu
Summary: Metaproteomics data are high-dimensional and sparse, so data reduction methods are crucial for identifying significant features and reducing redundancy. The performance of feature selection (FS) methods depends on data characteristics, and the online tool MetaFS offers a variety of FS methods for evaluating potential biomarkers in microbiome studies against comprehensive criteria.
BRIEFINGS IN BIOINFORMATICS
(2021)
Article
Statistics & Probability
Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu
Summary: This article presents a new data-splitting method (DS) for controlling the false discovery rate (FDR) while maintaining high power. Additionally, a Multiple Data Splitting (MDS) method is proposed to stabilize selection results and boost power.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2022)
Article
Statistics & Probability
Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu
Summary: The Generalized Linear Model (GLM) is widely used in modeling non-Gaussian data. This article presents a framework for feature selection in GLM that can control the False Discovery Rate (FDR) effectively. The method constructs a mirror statistic based on data perturbation to measure feature importance and achieves FDR control by exploiting the symmetry property of the mirror statistic. The proposed methodology is scale-free and demonstrates superior performance compared to existing methods in controlling FDR.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2023)
Article
Computer Science, Artificial Intelligence
Pei Huang, Zhaoming Kong, Mengying Xie, Xiaowei Yang
Summary: This paper proposes a robust unsupervised feature selection method that can effectively handle the influence of many outliers on model performance. The method learns a robust subspace that preserves local structure, and it addresses the shortcomings of traditional methods through outlier removal and a Euclidean distance threshold. Experiments demonstrate the superiority of the proposed method.
PATTERN RECOGNITION
(2023)
Article
Multidisciplinary Sciences
Iqbal Madakkatel, Ang Zhou, Mark D. McDonnell, Elina Hypponen
Summary: This study introduced a machine learning pipeline for risk factor discovery in biomedical databases, utilizing GBDT and SHAP values for model building and feature selection, followed by Cox models for confounder adjustment and validation. Results showed that a majority of health-related risk factors were accurately identified, while potential bias due to confounding factors was also observed.
SCIENTIFIC REPORTS
(2021)
Article
Biochemical Research Methods
Annette Spooner, Gelareh Mohammadi, Perminder S. Sachdev, Henry Brodaty, Arcot Sowmya
Summary: Feature selection is commonly used to identify important features in a dataset but can be unstable in high-dimensional data. Ensemble feature selection with data-driven thresholds improves stability and produces more reproducible selections of features. This study applies data-driven thresholds to ensemble feature selectors in Alzheimer's disease studies, resulting in more stable results and reflecting current findings in the literature. Data-driven thresholds eliminate the need for a fixed threshold and select a more meaningful set of features, improving interpretability of disease models.
BMC BIOINFORMATICS
(2023)
Article
Computer Science, Information Systems
Bruno Iochins Grisci, Mathias J. Krause, Marcio Dorn
Summary: The study introduces a relevance aggregation algorithm that combines the relevance computed from multiple samples by a neural network to generate scores for each input feature. Two visualization methods for learned patterns were presented to enhance model comprehension. The method accurately identifies the most important features for network predictions.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Artificial Intelligence
Mariana Daniel, Rui Guerra, Antonio Brazio, Daniela Rodrigues, Ana Margarida Cavaco, Maria Dulce Antunes, Jose Valente de Oliveira
Summary: This study explores feature engineering as a preprocessing step in fruit classification, including the division and selection of spectra in the wavelength domain. These methods can improve classification accuracy and reduce over-training. Experimental results show that the proposed method outperforms traditional approaches in accuracy and can identify features with physicochemical significance.
EXPERT SYSTEMS WITH APPLICATIONS
(2021)
Article
Automation & Control Systems
Puneet Mishra, Kristian Hovde Liland
Summary: A new method using iterative re-weighted partial least squares and covariates selection is presented for feature selective modelling in the presence of outliers. The method iteratively down-weights the outlying samples to minimize their influence on the squared covariance estimation for selecting robust features. It is shown that models based on such features outperform those using equal sample weights in terms of prediction accuracy. The method is tested in different scenarios and its performance is demonstrated on a real spectral data set.
JOURNAL OF CHEMOMETRICS
(2023)
Article
Automation & Control Systems
Saroj Aryal, Sarita Nemani
Summary: This work explores variations of the conjecture proposed by Fernandez-Anaya and Martinez-Garcia (2004) regarding the robustness of stable transfer functions and associated polynomials. We identify certain classes of stable polynomials and perturbing functions in which the perturbations of the polynomials remain stable.
SYSTEMS & CONTROL LETTERS
(2022)
Article
Computer Science, Artificial Intelligence
Tino Werner
Summary: Contamination can distort estimators, but robustness can address this issue. However, there is little discussion on the relationship between contamination and distorted variable selection in the literature. Many methods for sparse model selection, such as Stability Selection, have been proposed. We introduce the variable selection breakdown point to measure the number of contaminated cases or cells required to detect no relevant variables. By combining the variable selection breakdown point with resampling, we quantify the robustness of Stability Selection. Our trimmed Stability Selection method aggregates only the models with the best performance, reducing the impact of heavily contaminated resamples.
Article
Chemistry, Analytical
Qingxia Yang, Yaguo Gong, Feng Zhu
Summary: Multiclass metabolomics is widely used in clinical practice for understanding disease progression and identifying diagnostic biomarkers. It is more challenging than the binary problem due to the complexity of determining class decision boundaries. However, there is still a lack of a systematic assessment for selecting appropriate methods in multiclass metabolomics.
ANALYTICAL CHEMISTRY
(2023)
Article
Automation & Control Systems
Victor Hamer, Pierre Dupont
Summary: Current feature selection methods may suffer from instability, especially on high-dimensional data. This work proposes a new stability measure that incorporates the importance of the selected features in predictive models. The measure is shown to correct overly optimistic estimates and to improve decision-making accuracy.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Biochemical Research Methods
Fengcheng Li, Ying Zhou, Ying Zhang, Jiayi Yin, Yunqing Qiu, Jianqing Gao, Feng Zhu
Summary: Mass spectrometry-based proteomics is essential for studying biological processes. However, current statistical frameworks neglect the reproducibility of identified features. Developing a tool to identify reproducible and generalizable proteomic signatures is therefore crucial.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Materials Science, Multidisciplinary
Rishi E. Kumar, Armi Tiihonen, Shijing Sun, David P. Fenning, Zhe Liu, Tonio Buonassisi
Summary: In this article, the practical challenges hindering the commercialization of halide perovskites are reviewed and the potential applications of machine learning in addressing these challenges are discussed. The authors propose that through the adaptation of machine learning tools in various areas, it is possible to stabilize manufacturing processes, narrow the performance gap between devices, and accelerate root-cause analysis.
Article
Computer Science, Artificial Intelligence
Utkarsh Mahadeo Khaire, R. Dhanalakshmi
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING
(2020)
Article
Computer Science, Hardware & Architecture
Utkarsh Mahadeo Khaire, R. Dhanalakshmi
Summary: Microarray datasets cover almost every gene in the genome and help with cancer diagnosis, prognosis, and treatment. The curse of dimensionality in microarray data obscures useful information and leads to computational instability. Feature selection and the random forest algorithm play a crucial role in extracting important features and reducing data dimensionality.
Article
Engineering, Electrical & Electronic
Utkarsh Mahadeo Khaire, R. Dhanalakshmi
Summary: In this study, a new feature selection model based on an improved whale optimization algorithm (iWOA) is proposed to select significant features from a high-dimensional microarray dataset. The stability of the results is evaluated with an existing stability index that satisfies all the required characteristics of a stability measure.
IETE TECHNICAL REVIEW
(2022)
Article
Computer Science, Hardware & Architecture
K. Balakrishnan, R. Dhanalakshmi, Utkarsh Mahadeo Khaire
Summary: The study introduces an enhanced version of the Salp Swarm Algorithm (iSSA), which improves exploratory capabilities by randomizing location updates and using Levy flights to move the search towards global optima. Experimental results show that iSSA outperforms SSA on six high-dimensional datasets, providing higher confidence in feature selection results.
JOURNAL OF SUPERCOMPUTING
(2021)
Article
Computer Science, Software Engineering
Kulanthaivel Balakrishnan, Ramasamy Dhanalakshmi, Utkarsh Mahadeo Khaire
Summary: This research introduces the marine predators algorithm (MPA) and its improved version, ROBL-MPA, for handling high-dimensional datasets. ROBL-MPA outperforms the traditional MPA.
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
(2022)
Article
Computer Science, Artificial Intelligence
Kulanthaivel Balakrishnan, Ramasamy Dhanalakshmi, Utkarsh Khaire
Summary: Improving the marine predator algorithm with opposition-based learning (OBL) yields stable feature selection in high-dimensional datasets and enhanced classification accuracy. The proposed OBL-based marine predator algorithm demonstrates better convergence, classification accuracy, and feature selection stability than conventional feature selection techniques.
Article
Computer Science, Artificial Intelligence
K. Balakrishnan, R. Dhanalakshmi, Utkarsh Mahadeo Khaire
Summary: The massive growth in data size has increased the need for feature selection methods. This research proposes an enhanced Harris Hawks Optimization algorithm for feature selection, which uses Brownian motion and a novel control factor to improve the search process. Experimental results demonstrate the superiority of this algorithm over existing techniques.
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING
(2022)
Article
Computer Science, Artificial Intelligence
Utkarsh Mahadeo Khaire, R. Dhanalakshmi, K. Balakrishnan, M. Akila
Summary: This research proposes a hybrid of Opposition-Based Learning and the Sailfish Optimization strategy to identify salient features in high-dimensional datasets. The method improves exploration capability and convergence rate, achieving better classification accuracy than existing methods.
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING
(2023)
Article
Engineering, Multidisciplinary
R. Dhanalakshmi, Utkarsh M. Khaire
JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH
(2019)