Article
Computer Science, Artificial Intelligence
Xin Guan, Yoshikazu Terada
Summary: This paper proposes a novel sparse kernel k-means clustering method for clustering high-dimensional data. Clustering performance is improved by jointly optimizing the feature indicators with the cluster assignments.
PATTERN RECOGNITION
(2023)
Article
Computer Science, Artificial Intelligence
Barbara Pes, Giuseppina Lai
Summary: This study investigates how to address high dimensionality and class imbalance simultaneously, comparing feature selection and cost-sensitive learning methods on challenging genomic benchmark datasets; the results show that combining the two approaches has a beneficial impact.
PEERJ COMPUTER SCIENCE
(2021)
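The combination of feature selection and cost-sensitive learning described above can be sketched generically. This is a minimal illustration, not the authors' pipeline: it assumes a univariate filter (`SelectKBest`) and class weighting as the cost-sensitive step, on synthetic imbalanced data standing in for a genomic benchmark.

```python
# Minimal sketch: filter-based feature selection combined with
# cost-sensitive learning on imbalanced, high-dimensional data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in: many features, few informative, 90/10 class imbalance.
X, y = make_classification(n_samples=300, n_features=500, n_informative=20,
                           weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("filter", SelectKBest(f_classif, k=50)),            # feature selection
    ("clf", LogisticRegression(class_weight="balanced",  # cost-sensitive step
                               max_iter=1000)),
])

# Balanced accuracy is a sensible metric under class imbalance.
score = cross_val_score(pipe, X, y, cv=5, scoring="balanced_accuracy").mean()
print(round(score, 3))
```

`class_weight="balanced"` reweights errors inversely to class frequency, which is one standard way to make a learner cost-sensitive without resampling.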
Article
Environmental Sciences
Yufeng Jiang, Li Zhang, Min Yan, Jianguo Qi, Tianmeng Fu, Shunxiang Fan, Bowei Chen
Summary: The study compared the performance of different image data in classifying mangrove species, showing that combining data sources yielded higher accuracy than any single source, with vegetation index features from UAV hyperspectral data and texture features from WorldView-2 (WV-2) data playing dominant roles. The random forest (RF) algorithm achieved an overall accuracy of 95.89%.
Article
Computer Science, Artificial Intelligence
Yangyi Du, Xiaojun Zhou, Chunhua Yang, Tingwen Huang
Summary: This paper proposes an interactive feature selection framework based on the state transition algorithm (STA) to address high-dimensional feature selection problems. The framework combines the advantages of filter and wrapper methods to improve classification efficiency, and employs a self-adaptive mechanism and a multi-step STA to avoid local optima.
KNOWLEDGE-BASED SYSTEMS
(2023)
Article
Computer Science, Artificial Intelligence
Jongmo Kim, Jaewoong Kang, Mye Sohn
Summary: Feature selection for high-dimensional imbalanced data has received considerable research attention. A hybrid method that combines a filter approach with ensemble learning is proposed to select the best feature subset.
KNOWLEDGE-BASED SYSTEMS
(2021)
Article
Computer Science, Information Systems
Lixia Bai, Hong Li, Weifeng Gao, Jin Xie, Houqiang Wang
Summary: Feature selection has been extensively studied in data mining and machine learning. Meta-heuristic algorithms are commonly used to solve feature selection problems; however, they suffer from issues such as a large search space and long computation time. This article proposes a joint multiobjective optimization method, called JMO-FSCD, for feature selection and classifier design. The approach uses a neural network as the classifier and introduces a non-iterative algorithm for training it. Experimental results demonstrate the superior performance of JMO-FSCD compared to six state-of-the-art feature selection algorithms.
INFORMATION SCIENCES
(2023)
Article
Computer Science, Information Systems
Mingzhao Wang, Henry Han, Zhao Huang, Juanying Xie
Summary: This paper proposes two unsupervised spectral feature selection algorithms to detect informative features in high-dimensional data with a small number of samples. The algorithms group features using an advanced self-tuning spectral clustering algorithm and detect globally optimal feature clusters through feature-ranking techniques. Extensive experiments demonstrate the effectiveness of the proposed algorithms, especially the one based on the cosine-similarity feature-ranking technique. The detected features have strong discriminative capabilities, making them suitable for building reliable and explainable AI systems, particularly medical diagnostic systems.
FRONTIERS OF COMPUTER SCIENCE
(2023)
Article
Computer Science, Information Systems
Zhiguang Chu, Jingsha He, Xiaolei Zhang, Xing Zhang, Nafei Zhu
Summary: For high-dimensional data, balancing privacy and usability is a core issue in privacy protection. Feature selection is a commonly used technique for dimensionality reduction; however, some methods neglect the information associated with the selected features, leading to low usability of the final results. This paper proposes a hybrid method based on feature selection and cluster analysis to address data utility and privacy problems. The method consists of three stages: feature screening, feature clustering analysis, and adaptive noise. Experimental results on the WDBC database demonstrate that the proposed method preserves sensitive data while retaining the features' contribution to diagnostic results.
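The "adaptive noise" stage can be illustrated with a generic perturbation sketch. This is not the authors' method: it assumes Laplace noise scaled per feature to the feature's observed range, with a hypothetical privacy budget `epsilon`.

```python
# Minimal sketch: per-feature Laplace noise as an adaptive perturbation step.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # stand-in for screened features

epsilon = 1.0                          # assumed privacy budget
sensitivity = X.max(axis=0) - X.min(axis=0)   # per-feature value range

# Noise scale adapts to each feature: wider-ranged features get more noise.
X_noisy = X + rng.laplace(scale=sensitivity / epsilon, size=X.shape)
print(X_noisy.shape)
```

Scaling the noise to each feature's sensitivity is the standard Laplace-mechanism idea; a real pipeline would also account for how features were screened and clustered beforehand.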
Article
Computer Science, Artificial Intelligence
G. Manikandan, S. Abirami
Summary: Feature selection is essential in pattern recognition and bioinformatics, as high-dimensional datasets often contain redundant and irrelevant features. The proposed MIMCFS technique effectively selects important features and eliminates redundancies through two stages. Experimental results show superior performance compared to existing methods.
APPLIED SOFT COMPUTING
(2021)
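A two-stage relevance-then-redundancy filter of the kind the summary describes can be sketched as follows. This is an illustrative reconstruction of the general idea, not the MIMCFS implementation: stage one ranks features by mutual information with the label, stage two greedily discards candidates strongly correlated with an already-kept feature (threshold 0.9 is an assumption).

```python
# Minimal sketch: mutual-information relevance ranking + redundancy pruning.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)

mi = mutual_info_classif(X, y, random_state=0)   # stage 1: relevance scores
order = np.argsort(mi)[::-1]                     # features, best first

kept = []
for j in order[:40]:                             # stage 2: redundancy check
    if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < 0.9 for k in kept):
        kept.append(j)
    if len(kept) == 15:
        break

print(len(kept))
```

The pairwise-correlation check is a cheap proxy for redundancy; mutual information between features would be a closer match to an MI-based method but is costlier to compute.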
Review
Computer Science, Artificial Intelligence
Ezzatul Akmal Kamaru Zaman, Azlinah Mohamed, Azlin Ahmad
Summary: This article provides a state-of-the-art review of feature subset selection for high-dimensional data in online streaming. It discusses traditional feature selection and online feature selection, and categorizes the challenges related to online feature selection. Several data forms are identified and evaluation metrics for online feature selection methods are compared. An online feature selection framework is derived to illustrate the relationship between application area, data form, methods, metrics, and tools. The findings and potential directions for future research are thoroughly discussed.
APPLIED SOFT COMPUTING
(2022)
Article
Multidisciplinary Sciences
Samir Brahim Belhaouari, Mohammed Bilal Shakeel, Aiman Erbad, Zarina Oflaz, Khelil Kassoul
Summary: This paper introduces a strategy based on a Bird's Eye View (BEV) feature selection technique, which combines evolutionary and genetic algorithms, a dynamic Markov chain, and reinforcement learning to improve performance and reduce the number of features in machine learning models.
SCIENTIFIC REPORTS
(2023)
Article
Computer Science, Artificial Intelligence
Yuhong Xu, Zhiwen Yu, C. L. Philip Chen, Zhulin Liu
Summary: In this study, an adaptive subspace optimization ensemble method is proposed for high-dimensional imbalanced data classification. Multiple robust and discriminative subspaces are generated by adaptive subspace generation and rotated subspace optimization, and a resampling scheme is applied to construct class-balanced data. Experimental results demonstrate the superiority of this method over other imbalance learning approaches and classifier ensemble methods.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Automation & Control Systems
Mohammad Ahmadi Ganjei, Reza Boostani
Summary: In this paper, a new hybrid feature selection approach that combines filter and wrapper methods is proposed. By ranking, clustering, and searching the features, this method achieves better performance on high-dimensional datasets.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2022)
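The filter-plus-wrapper hybrid pattern recurring in several of these entries can be sketched generically. This is a minimal illustration, not the authors' algorithm: a cheap filter first prunes the feature space, then a wrapper search evaluates subsets with an actual classifier.

```python
# Minimal sketch: filter stage (F-score ranking) followed by a wrapper
# stage (greedy forward search with a classifier in the loop).
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, f_classif,
                                       SequentialFeatureSelector)
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=150, n_features=80, n_informative=8,
                           random_state=0)

# Filter: keep the 20 features with the highest ANOVA F-scores.
X_filtered = SelectKBest(f_classif, k=20).fit_transform(X, y)

# Wrapper: forward selection of 5 features, scored by cross-validated k-NN.
wrapper = SequentialFeatureSelector(KNeighborsClassifier(),
                                    n_features_to_select=5,
                                    direction="forward", cv=3)
wrapper.fit(X_filtered, y)
print(wrapper.get_support().sum())
```

The filter stage keeps the wrapper's search space small, which is the usual motivation for the hybrid: wrapper evaluations are expensive, so they are spent only on pre-screened features.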
Article
Physics, Multidisciplinary
Michael C. Abbott, Benjamin B. Machta
Summary: Inference from limited data requires a measure on parameter space, which is most explicit in the Bayesian framework as a prior distribution. However, the well-known Jeffreys prior leads to significant bias in high-dimensional models because the effective dimensionality of models in science is usually smaller than the number of microscopic parameters. A principled choice of measure that focuses on relevant parameters can avoid this issue and lead to unbiased posteriors. This optimal prior depends on the quantity of data and approaches Jeffreys prior in the asymptotic limit, but justifying this limit requires an impractically large increase in data quantity for typical models.
Article
Computer Science, Artificial Intelligence
Hasna Chamlal, Tayeb Ouaderhman, Basma El Mourtji
Summary: This study presents an algorithm for heterogeneous variable selection in discrimination problems. The algorithm utilizes both filter and wrapper approaches, and introduces a new feature discrimination power measure. Experimental results demonstrate the superiority of this algorithm over other methods.
KNOWLEDGE-BASED SYSTEMS
(2023)