Article

The peaking phenomenon in the presence of feature-selection

PATTERN RECOGNITION LETTERS (2008)

Journal

PATTERN RECOGNITION LETTERS

Volume 29, Issue 11, Pages 1667-1674

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.patrec.2008.04.010

Keywords

classification; feature-selection; peaking phenomenon

Category

Computer Science, Artificial Intelligence

Abstract

For a fixed sample size, a common phenomenon is that the error of a designed classifier decreases and then increases as the number of features grows. This peaking phenomenon has been recognized for forty years and depends on the classification rule and feature-label distribution. Historically, the peaking phenomenon has been treated by assuming a fixed ordering of the features, usually beginning with the strongest individual feature and proceeding with features of decreasing individual classification capability. This does not take into account feature-selection, which is commonplace in high-dimensional and small-sample settings. This paper revisits the peaking phenomenon in the presence of feature-selection. Using massive simulation in a high-performance computing environment, the paper considers various combinations of feature-label models, feature-selection algorithms, and classifier models to produce a large library of error versus feature size curves. Owing to the prevalence of feature-selection in genomic classification, we also consider gene-expression-based classification of breast-cancer patient prognosis. Results vary widely and are strongly dependent on the combination. The error curves tend to fall into three categories: peaking, settling into a plateau, or falling very slowly over a long range of feature set sizes. It can be concluded that one should be wary of applying peaking results found in the absence of feature-selection to settings in which feature-selection is employed. © 2008 Elsevier B.V. All rights reserved.
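
The kind of experiment the abstract describes can be illustrated with a small simulation. The sketch below is not the paper's setup: the two-class Gaussian model with identity covariance, the decaying mean differences in `delta`, the t-statistic filter, the LDA classifier, and all function names and parameter values are assumptions chosen only for illustration. It estimates the test error at several feature-set sizes, either with a fixed ordering of the features (strongest first) or with features ranked by a filter on the training sample.

```python
import numpy as np

def sample(rng, n, delta):
    """Draw n points from two unit-variance Gaussian classes whose means differ by delta."""
    y = np.arange(n) % 2                      # balanced labels 0/1
    X = rng.standard_normal((n, delta.size)) + np.outer(y - 0.5, delta)
    return X, y

def t_scores(X, y):
    """Absolute two-sample t-statistic per feature (filter-style ranking score)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def lda_error(X_tr, y_tr, X_te, y_te):
    """Train LDA with a pooled covariance estimate and return the test error rate."""
    m0 = X_tr[y_tr == 0].mean(axis=0)
    m1 = X_tr[y_tr == 1].mean(axis=0)
    centered = np.vstack([X_tr[y_tr == 0] - m0, X_tr[y_tr == 1] - m1])
    S = centered.T @ centered / (len(y_tr) - 2)
    w = np.linalg.pinv(S) @ (m1 - m0)         # pseudo-inverse guards against singular S
    b = -0.5 * w @ (m0 + m1)
    pred = (X_te @ w + b > 0).astype(int)
    return np.mean(pred != y_te)

def error_curve(sizes, select, n_train=30, n_test=2000,
                n_features=40, n_trials=20, seed=0):
    """Average test error at each feature-set size in `sizes`.

    select=True  ranks features by t-statistic on the training sample.
    select=False uses the fixed ordering 0, 1, 2, ... (strongest feature first).
    """
    rng = np.random.default_rng(seed)
    # Assumed model: mean difference shrinks with the feature index.
    delta = 1.0 / np.sqrt(np.arange(1, n_features + 1))
    errors = np.zeros(len(sizes))
    for _ in range(n_trials):
        X_tr, y_tr = sample(rng, n_train, delta)
        X_te, y_te = sample(rng, n_test, delta)
        if select:
            order = np.argsort(-t_scores(X_tr, y_tr))
        else:
            order = np.arange(n_features)
        for j, d in enumerate(sizes):
            idx = order[:d]
            errors[j] += lda_error(X_tr[:, idx], y_tr, X_te[:, idx], y_te)
    return errors / n_trials

if __name__ == "__main__":
    sizes = [1, 2, 5, 10, 20]
    print("sizes          ", sizes)
    print("fixed ordering ", np.round(error_curve(sizes, select=False), 3))
    print("t-test filter  ", np.round(error_curve(sizes, select=True), 3))
```

Plotting the two returned arrays against `sizes` gives one pair of error versus feature size curves for this particular model, filter, and classifier. As the abstract stresses, the shape of such curves depends strongly on the combination, so a single toy run like this should not be read as representative.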

Recommended

Article Mathematics

Redundancy Is Not Necessarily Detrimental in Classification Problems

Sebastian Alberto Grillo, Jose Luis Vazquez Noguera, Julio Cesar Mello Roman, Miguel Garcia-Torres, Jacques Facon, Diego P. Pinto-Roa, Luis Salgueiro Romero, Francisco Gomez-Vela, Laura Raquel Bareiro Paniagua, Deysi Natalia Leguizamon Correa

Summary: This study analyzes the impact of redundant features on classification model performance and proposes a theoretical framework for analyzing feature construction and selection. The experimental results suggest that a large number of redundant features can reduce the classification error.

MATHEMATICS (2021)

Article Biochemical Research Methods

Multi-scale deep learning for the imbalanced multi-label protein subcellular localization prediction based on immunohistochemistry images

Fengsheng Wang, Leyi Wei

Summary: In this study, we propose a novel multi-scale end-to-end deep learning model, MSTLoc, for identifying protein subcellular locations in the imbalanced multi-label immunohistochemistry (IHC) images dataset. We demonstrate that the proposed MSTLoc outperforms current state-of-the-art models in multi-label subcellular location prediction. Through feature visualization and interpretation analysis, we show that the multi-scale deep features learned from our model exhibit better ability in capturing discriminative patterns underlying protein subcellular locations, and the features from different scales are complementary for the improvement in performance. Case study results indicate that our MSTLoc can successfully identify some biomarkers from proteins that are closely involved in cancer development.

BIOINFORMATICS (2022)

Article Mathematics

Chaos Embed Marine Predator (CMPA) Algorithm for Feature Selection

Adel Fahad Alrasheedi, Khalid Abdulaziz Alnowibet, Akash Saxena, Karam M. Sallam, Ali Wagdy Mohamed

Summary: In this study, a chaos embed marine predator algorithm (CMPA) is proposed for feature selection in data mining applications. The comparative analysis and statistical significance tests provide evidence for the effectiveness and applicability of the proposed algorithm.

MATHEMATICS (2022)

Article Computer Science, Artificial Intelligence

A novel relation aware wrapper method for feature selection

Zhaogeng Liu, Jielong Yang, Li Wang, Yi Chang

Summary: In this paper, a novel wrapper feature selection method named ERASE is proposed, which learns and utilizes sample relations and feature relations for feature selection. Experimental results demonstrate that our method outperforms other feature selection methods in most cases.

PATTERN RECOGNITION (2023)

Article Automation & Control Systems

Dynamic feature weighting for data streams with distribution-based log-likelihood divergence

Xiaokang Wang, Huiwen Wang, Dexiang Wu

Summary: This study proposes an online dynamic feature weighting algorithm to monitor feature drift in data streams. The algorithm detects changes in class relevance of features based on the log-likelihood divergence score, and it has been shown to improve the accuracy rates of Nearest Neighbor and Naive Bayes classifiers on both synthetic and real-world datasets.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

A feature selection method via analysis of relevance, redundancy, and interaction

Lianxi Wang, Shengyi Jiang, Siyu Jiang

Summary: The study introduces a novel feature selection algorithm that selects relevant and interactive features using a maximum criterion, leading to improved classification accuracy. Experimental results show that the algorithm efficiently selects features and enhances classifiers to achieve better or comparable classification accuracy compared to ten representative competing feature selection algorithms.

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Article Clinical Neurology

Crowd-Sourced Deep Learning for Intracranial Hemorrhage Identification: Wisdom of Crowds or Laissez-Faire

E. I. S. Hofmeijer, C. O. Tan, F. van der Heijden, R. Gupta

Summary: Researchers tested ensemble learning for selecting the best artificial intelligence models for intracranial hemorrhage detection, but ensemble learning methods did not outperform the single best model.

AMERICAN JOURNAL OF NEURORADIOLOGY (2023)

Article Computer Science, Information Systems

Biomarker to find neurodegenerative diseases using the structural changes in brain using computer vision

G. Wiselin Jiji

Summary: Algorithms in computer vision are crucial for extracting valuable hidden information from datasets. This study focuses on diagnosing neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, and bipolar disorder. It uses potential biomarkers extracted from T1 MRI and brain tissue volumes, specifically the 3D Speeded Up Robust Feature (SURF) and 3D Scale Invariant Feature Transform (SIFT) features. Random Forest and SVM approaches are employed to select key points for diagnosis, achieving a classification accuracy of 98.6%.

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Article Computer Science, Information Systems

Two-phase fuzzy feature-filter based hybrid model for spam classification

Gazal, Kapil Juneja

Summary: This paper investigates a two-level filter-based hybrid model to accurately identify spam messages. The model selects the most important features through filtering and evaluation methods, and uses classifiers to generate probabilistic scores for spam detection. The experimental results show that the model achieves high accuracy on multiple datasets and outperforms traditional methods.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Computer Science, Information Systems

A novel feature selection approach based on constrained eigenvalues optimization

Amina Benkessirat, Nadjia Benblidia

Summary: In real-life classification applications, it can be challenging to select model features that adequately classify samples from a large number of candidates. This article's main contributions include evaluating the relevance and redundancy of features, defining the feature selection problem as an eigenvalue computation problem with a linear constraint, and efficiently selecting the best features. The approach was tested on 20 UCI benchmark datasets and compared with other widely used and state-of-the-art approaches. The experimental results showed that our approach improved the classification task by using only 20% of the conventional features.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

Class-specific feature selection via maximal dynamic correlation change and minimal redundancy

Xi-Ao Ma, Hao Xu, Chunhua Ju

Summary: This paper proposes a class-specific feature selection method based on information theory. A class-specific feature evaluation criterion called CSMDCCMR is developed, and a feature selection algorithm is designed to select a suitable feature subset for each class. Experimental results demonstrate the superiority of the proposed method compared to other methods.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Chemistry, Physical

Exploring high thermal conductivity polymers via interpretable machine learning with physical descriptors

Xiang Huang, Shengluo Ma, C. Y. Zhao, Hong Wang, Shenghong Ju

Summary: This study proposes a high-throughput screening framework for designing polymer chains with high thermal conductivity using interpretable machine learning and physical feature engineering. By optimizing physical descriptors and assisting machine learning models, the framework achieves higher prediction accuracy compared to traditional methods. The study also analyzes the contributions of individual descriptors and derives an explicit prediction equation for thermal conductivity. Polymer chains with high thermal conductivity are predominantly pi-conjugated structures with strong intra-chain interactions, resulting in enhanced thermal transport.

NPJ COMPUTATIONAL MATERIALS (2023)

Article Computer Science, Artificial Intelligence

Classification of seven Iranian wheat varieties using texture features

Mostafa Khojastehnazhand, Mozaffar Roostaei

Summary: This study used a machine vision system and texture feature extraction methods to classify seven varieties of wheat in the East Azerbaijan Province of Iran. By utilizing unsupervised and supervised methods, along with feature extraction, the different wheat varieties were identified with over 95% accuracy.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Article Computer Science, Information Systems

Time series classification with random temporal features

Cun Ji, Mingsen Du, Yanxuan Wei, Yupeng Hu, Shijun Liu, Li Pan, Xiangwei Zheng

Summary: Time series classification is widely used in various domains, including EEG/ECG classification, device anomaly detection, and speaker authentication. Despite the existence of many methods, selecting intuitive temporal features for accurate classification remains a challenge. Therefore, this paper proposes a new method called TSC-RTF, which utilizes random temporal features, and shows that it can compete with state-of-the-art methods.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2023)

Article Engineering, Multidisciplinary

Relevance-diversity algorithm for feature selection and modified Bayes for prediction

M. Shaheen, N. Naheed, A. Ahsan

Summary: Big data analytics uncovers hidden patterns through classification, prediction and reinforcement of big datasets. Relevant, important and informative features are selected using different filtration techniques. A new feature selection technique called Relevance-diversity algorithm and a new supervised classification algorithm based on Naive Bayes classification are proposed. The performance of these techniques is evaluated using various datasets, and the results show improvements in terms of feature selection, accuracy, and time complexity.

ALEXANDRIA ENGINEERING JOURNAL (2023)
