Article

A Two-Stage Feature Selection Method for Gene Expression Data

OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY (2009)

Journal

OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY

Volume 13, Issue 2, Pages 127-137

Publisher

MARY ANN LIEBERT, INC

DOI: 10.1089/omi.2008.0083

Categories

Biotechnology & Applied Microbiology; Genetics & Heredity

Funding

National Science Council in Taiwan [NSC96-2622-E-151-019-CC3, NSC96-2622-E214-004-CC3, NSC95-2221-E-151-004-MY3, NSC952221-E-214-087, NSC95-2622-E-214-004, NSC94-2622-E-151025-CC3, NSC94-2622-E-151-025-CC3, KMU-EM-97-2.1a.]

Abstract

Microarray data on gene expression profiles provide valuable answers to a variety of problems and contribute to advances in clinical medicine. Gene expression data typically have a high dimension and a small sample size, and generally only a relatively small number of genes are strongly correlated with a given phenotype. Feature (gene) selection is therefore crucial for analyzing gene expression profiles correctly and for classification. It offers several advantages, such as effective extraction of the genes that influence classification accuracy, elimination of irrelevant genes, and improved classification accuracy. In this paper, we propose a two-stage feature selection method that uses information gain to rank genes and then combines an improved particle swarm optimization with K-nearest neighbor and support vector machine classifiers to calculate the classification accuracy. The experimental results show that the proposed method can effectively select relevant gene subsets and achieves higher classification accuracy than previous studies.
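The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how expression values are discretized, what the "improved" particle swarm optimization changes, or which parameters are used. The sketch therefore assumes a median split for information gain, a plain binary PSO with a sigmoid update, and leave-one-out 1-nearest-neighbor accuracy as the fitness. All function names and default parameters (`two_stage_select`, `top_k`, `particles`, `iters`, the inertia and acceleration constants) are hypothetical.

```python
import math
import random

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(values, labels):
    """Information gain of one gene after a median split (assumed split rule)."""
    cut = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < cut]
    high = [y for v, y in zip(values, labels) if v >= cut]
    cond = sum(len(p) / len(labels) * entropy(p) for p in (low, high) if p)
    return entropy(labels) - cond

def loo_1nn_accuracy(X, y, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on `genes`."""
    if not genes:
        return 0.0
    hits = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_d:
                best_j, best_d = j, d
        hits += y[best_j] == y[i]
    return hits / len(X)

def two_stage_select(X, y, top_k=10, particles=8, iters=20, seed=0):
    """Stage 1: rank genes by information gain. Stage 2: binary PSO over the top_k."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    gains = [information_gain([row[g] for row in X], y) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: gains[g], reverse=True)[:top_k]

    def fitness(bits):
        return loo_1nn_accuracy(X, y, [g for g, b in zip(ranked, bits) if b])

    pos = [[rng.randint(0, 1) for _ in ranked] for _ in range(particles)]
    vel = [[0.0] * len(ranked) for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(iters):
        for i in range(particles):
            for d in range(len(ranked)):
                # Standard velocity update; the constants 0.7 and 1.5 are assumptions.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))
                # The sigmoid of the velocity gives the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return [g for g, b in zip(ranked, gbest) if b], gbest_fit
```

Calling `two_stage_select(X, y)` with `X` as a list of per-sample expression rows and `y` as the class labels returns the selected gene indices and their leave-one-out accuracy. Ranking first keeps the swarm's search space small, which matters when the number of genes far exceeds the number of samples.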

Recommended

Article Computer Science, Artificial Intelligence

A two-phase gene selection method using anomaly detection and genetic algorithm for microarray data

Motahare Akhavan, Seyed Mohammad Hossein Hasheminejad

Summary: A new two-phase gene selection method for microarray data is proposed in this study. This method reduces the number of genes significantly and improves the classification accuracy through anomaly detection and guided genetic algorithm.

KNOWLEDGE-BASED SYSTEMS (2023)

Article Environmental Sciences

Feature Selection for SAR Target Discrimination and Efficient Two-Stage Detection Method

Nam-Hoon Jeong, Jae-Ho Choi, Geon Lee, Ji-Hoon Park, Kyung-Tae Kim

Summary: In this study, a two-stage detection framework is proposed for feature-based target detection in synthetic aperture radar (SAR) images. The framework ensures efficient and superior detection performance in TerraSAR-X (TSX) images by using previously studied features. The first stage eliminates misdetections using simple features, and the second stage evaluates the discrimination performance of each feature and selects the suitable features for the image. The proposed method also incorporates the Karhunen-Loeve (KL) transform to reduce redundancy and maximize discrimination performance.

REMOTE SENSING (2022)

Article Mathematical & Computational Biology

Two-stage feature selection for classification of gene expression data based on an improved Salp Swarm Algorithm

Xiwen Qin, Shuang Zhang, Dongmei Yin, Dongxue Chen, Xiaogang Dong

Summary: Microarray technology has produced a large amount of high-dimensional gene expression data. This paper proposes a two-stage feature selection framework that effectively solves the feature selection problem in small sample high-dimensional data, achieving high accuracy on multiple datasets.

MATHEMATICAL BIOSCIENCES AND ENGINEERING (2022)

Article Construction & Building Technology

CorrDQN-FS: A two-stage feature selection method for energy consumption prediction via deep reinforcement learning

Lu Liu, Qiming Fu, You Lu, Yunzhe Wang, Hongjie Wu, Jianping Chen

Summary: This study proposes a novel feature selection method called CorrDQN-FS for optimizing energy consumption prediction. The method ranks features using Pearson correlation coefficient and utilizes deep reinforcement learning techniques to optimize the feature selection process. Experimental results demonstrate that CorrDQN-FS outperforms other methods in energy consumption prediction accuracy.

JOURNAL OF BUILDING ENGINEERING (2023)

Article Genetics & Heredity

A Novel XGBoost Method to Infer the Primary Lesion of 20 Solid Tumor Types From Gene Expression Data

Sijie Chen, Wenjing Zhou, Jinghui Tu, Jian Li, Bo Wang, Xiaofei Mo, Geng Tian, Kebo Lv, Zhijian Huang

Summary: The study aimed to establish a machine learning model to identify primary lesions for primary metastatic tumors in an integrated learning approach, aiming to improve diagnostic efficiency. The results showed that combining tumor data with machine learning methods can predict the location of primary metastatic tumors accurately.

FRONTIERS IN GENETICS (2021)

Article Computer Science, Artificial Intelligence

Two-stage stacking heterogeneous ensemble learning method for gasoline octane number loss prediction

Shaoze Cui, Huaxin Qiu, Sutong Wang, Yanzhang Wang

Summary: The study proposes a method for predicting research octane number (RON) loss in the gasoline refining process, combining feature selection with a stacking heterogeneous ensemble model. Experimental results show that the method is more accurate than other machine learning methods and can promote the development of the gasoline refining industry.

APPLIED SOFT COMPUTING (2021)

Article Biology

Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data

Aiguo Wang, Huancheng Liu, Jing Yang, Guilin Chen

Summary: In this study, an ensemble feature selection framework is proposed to improve the discrimination and stability of features. By using sampling and aggregation strategies, accurate feature selection is achieved in small sample and high dimensionality scenarios, leading to improved diagnostic accuracy and understanding of disease mechanisms.

COMPUTERS IN BIOLOGY AND MEDICINE (2022)

Review Computer Science, Artificial Intelligence

Feature selection methods in microarray gene expression data: a systematic mapping study

Mahnaz Vahmiyan, Mohammadtaghi Kheirabadi, Ebrahim Akbari

Summary: Feature selection is crucial in medicine and genetics research, especially in high-dimensional data. This paper provides a systematic survey of studies on FS techniques in microarrays, with results highlighting the importance of classification and the wide application of evolutionary methods in FS.

NEURAL COMPUTING & APPLICATIONS (2022)

Article Computer Science, Information Systems

CGUFS: A clustering-guided unsupervised feature selection algorithm for gene expression data

Zhaozhao Xu, Fangyuan Yang, Hong Wang, Junding Sun, Hengde Zhu, Shuihua Wang, Yudong Zhang

Summary: This paper proposes a clustering-guided unsupervised feature selection algorithm for gene expression data, which addresses the problems of existing algorithms such as the need for artificially specifying the number of clusters, failure to consider feature redundancy, and inability to filter redundant features. The proposed algorithm introduces adaptive k-value strategy, feature grouping strategy, and adaptive filtering strategy to select significant features related to diseases. Experimental results demonstrate that the algorithm outperforms existing algorithms in terms of accuracy and correlation indexes.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2023)

Article Management

An optimization method for characterizing two groups of data

Amir Salehipour

Summary: This paper proposes an optimization model for feature selection in the presence of two groups of data, and solves the problem using lexicographic method and matheuristic algorithms. Experimental results show that the proposed algorithms can deliver high-quality solutions within a reasonable amount of time.

INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH (2022)

Article Computer Science, Artificial Intelligence

TFSFB: Two-stage feature selection via fusing fuzzy multi-neighborhood rough set with binary whale optimization for imbalanced data

Lin Sun, Shanshan Si, Weiping Ding, Xinya Wang, Jiucheng Xu

Summary: This study proposes a new feature subset selection scheme to deal with imbalanced data by fusing fuzzy multi-neighborhood rough set (FMRS) and binary whale optimization algorithm (BWOA). The method evaluates the distribution of different features using the standard deviation coefficient and constructs a fuzzy multi-neighborhood radius set. It also introduces fuzzy multi-neighborhood granule and fuzzy membership degree to establish FMRS, and develops a feature significance measure to balance the properties and influences of different features. Experimental results demonstrate the effectiveness of the proposed algorithm for classification of imbalanced data.

INFORMATION FUSION (2023)

Review Computer Science, Artificial Intelligence

Gene reduction and machine learning algorithms for cancer classification based on microarray gene expression data: A comprehensive review

Sarah Osama, Hassan Shaban, Abdelmgeid A. Ali

Summary: This review explores the applications of machine learning-based data reduction and classification algorithms in microarray gene expression data. It summarizes various data preprocessing methods, reviews different feature selection algorithms, and discusses feature extraction and hybrid methods. It also examines widely used machine learning algorithms for tumor and nontumor classification. Finally, the challenges and unanswered questions in accurate cancer classification and detection are highlighted.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Multidisciplinary Sciences

Bi-dimensional principal gene feature selection from big gene expression data

Xiaoqian Hou, Jingyu Hou, Guangyan Huang

Summary: This paper proposes a novel method for efficiently extracting critical genes from large gene expression data by applying principal component analysis. Experimental results demonstrate that the method reduces data size, achieves faster processing speed, and maintains better accuracy and effectiveness.

PLOS ONE (2022)

Article Computer Science, Artificial Intelligence

A two-stage hybrid ant colony optimization for high-dimensional feature selection

Wenping Ma, Xiaobo Zhou, Hao Zhu, Longwei Li, Licheng Jiao

Summary: The paper introduces a two-stage hybrid ACO algorithm for high-dimensional feature selection, which is capable of handling large-scale datasets efficiently with shorter running time.

PATTERN RECOGNITION (2021)

Article Biochemical Research Methods

A Construction Method of Dynamic Protein Interaction Networks by Using Relevant Features of Gene Expression Data

Jing Sun, Li Pan, Bin Li, Haoyue Wang, Bo Yang, Wenbin Li

Summary: This study proposes a method for constructing protein-protein interaction networks by selecting relevant features instead of continuous and periodic features to improve the accuracy of identifying essential proteins.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Nutrition & Dietetics

Associations between Circulating Markers of Cholesterol Homeostasis and Macrovascular Events among Patients Undergoing Hemodialysis

Wen-Chin Lee, Wei-Hung Kuo, Sin-Hua Moi, Barry Chiu, Jin-Bor Chen, Cheng-Hong Yang

Summary: This study aimed to investigate the differences in cholesterol synthesis and absorption between hemodialysis patients and healthy controls. Results showed that markers for cholesterol homeostasis were not significantly associated with macrovascular events during a 1-year follow-up, shedding light on potential novel therapeutic targets in managing cholesterol absorption in hemodialysis patients.

NUTRIENTS (2021)

Article Engineering, Civil

Deep Learning for Imputation and Forecasting Tidal Level

Cheng-Hong Yang, Chih-Hsien Wu, Chih-Min Hsieh, Yi-Chuan Wang, I-Fan Tsen, Shih-Huan Tseng

Summary: Tidal observations can be influenced by mechanical failures or typhoon-induced storms, leading to data interruptions or anomalies, reducing data applicability. A deep learning algorithm for missing value imputation and tide level forecasting has been proposed, with experimental results showing better performance compared to traditional methods.

IEEE JOURNAL OF OCEANIC ENGINEERING (2021)

Article Computer Science, Artificial Intelligence

Applications of Deep Learning and Fuzzy Systems to Detect Cancer Mortality in Next-Generation Genomic Data

Cheng-Hong Yang, Sin-Hua Moi, Ming-Feng Hou, Li-Yeh Chuang, Yu-Da Lin

Summary: In this study, a method named FuzzyDeepCoxPH that combines machine learning and traditional survival analysis was proposed to identify high-risk missense mutation variants and candidate genes associated with cancer mortality. The results showed that FuzzyDeepCoxPH can effectively distinguish high-risk variants and candidate genes related to cancer mortality, providing comprehensive risk estimation for cancer medicine.

IEEE TRANSACTIONS ON FUZZY SYSTEMS (2021)

Article Engineering, Multidisciplinary

Flexible Resource Scheduling for Software-Defined Cloud Manufacturing with Edge Computing

Chen Yang, Fangyin Liao, Shulin Lan, Lihui Wang, Weiming Shen, George Q. Huang

Summary: This research focuses on achieving rapid reconfiguration in a cloud manufacturing environment by proposing a new manufacturing model called software-defined cloud manufacturing (SDCM), which transfers control logic from hardware to software. Edge computing is introduced to complement cloud computing with computation and storage capabilities near end devices. The study also addresses the management of network congestion caused by transmitting a large amount of Internet of Things (IoT) data with different quality of service (QoS) values. An approach integrating genetic algorithm, Dijkstra's shortest path algorithm, and queuing algorithm is proposed to solve the optimization problem. Experimental results demonstrate that the proposed method effectively prevents network congestion and reduces communication latency in the SDCM.

ENGINEERING (2023)

Article Food Science & Technology

Analyzing the Performance of Machine Learning Techniques in Disease Prediction

Khongdet Phasinam, Tamal Mondal, Dony Novaliendry, Cheng-Hong Yang, Chiranjit Dutta, Mohammad Shabaz

Summary: The history of data stored can help companies predict potential patterns and make competitive decisions. This study focuses on the diagnosis and estimation of heart disease, and previous research has shown the effectiveness of knowledge exploration methods in predicting heart disease. Currently, there are no real-time methods for analyzing and forecasting heart disease in its early stages.

JOURNAL OF FOOD QUALITY (2022)

Article Mathematics

Identifying the Association of Time-Averaged Serum Albumin Levels with Clinical Factors among Patients on Hemodialysis Using Whale Optimization Algorithm

Cheng-Hong Yang, Yin-Syuan Chen, Sin-Hua Moi, Jin-Bor Chen, Li-Yeh Chuang

Summary: This study employed a whale optimization algorithm-based feature selection model to interpret the complex association between time-averaged serum albumin (TSA) and clinical factors among hemodialysis patients. By conducting a multifactor analysis, an optimal multifactor TSA-associated model was constructed, which exhibited superior performance.

MATHEMATICS (2022)

Article Biochemical Research Methods

DeepBarcoding: Deep Learning for Species Classification Using DNA Barcoding

Cheng-Hong Yang, Kuo-Chuan Wu, Li-Yeh Chuang, Hsueh-Wei Chang

Summary: DNA barcodes are short sequence fragments used for species identification. This study proposes a deep learning framework, called deep barcoding, for species classification using DNA barcodes. By utilizing raw sequence data and deep convolutional neural networks, the deep barcoding model achieves high accuracy in species identification. Although there are challenges, the deep barcoding model has the potential to be an effective tool for species classification.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Mathematics

Deep Learning for Vessel Trajectory Prediction Using Clustered AIS Data

Cheng-Hong Yang, Guan-Cheng Lin, Chih-Hsien Wu, Yen-Hsien Liu, Yi-Chuan Wang, Kuo-Chang Chen

Summary: Accurate vessel track prediction is crucial for maritime traffic control and management to improve navigation efficiency and safety. This study proposed a DLSTM model for vessel prediction, which combines clustering and training techniques. The results demonstrated that the DLSTM model outperformed other models in terms of prediction accuracy.

MATHEMATICS (2022)

Article Biochemical Research Methods

Dimensionality reduction approach for many-objective epistasis analysis

Cheng-Hong Yang, Ming-Feng Hou, Li-Yeh Chuang, Cheng-San Yang, Yu-Da Lin

Summary: This study extended MOMDR to the many-objective version (MaODR) for better identification of SSI between cases and controls. The MaODR-CLN model, with three objective functions - correct classification rate, likelihood ratio, and normalized mutual information, showed higher detection success rates compared to MOMDR and MDR. MaODR-CLN successfully identified significant SSIs associated with coronary artery disease.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biology

Overall mortality risk analysis for rectal cancer using deep learning-based fuzzy systems

Cheng-Hong Yang, Wen-Ching Chen, Jin-Bor Chen, Hsiu-Chen Huang, Li-Yeh Chuang

Summary: This study proposed an advanced analytic approach, called Fuzzy-based RNNCoxPH, for detecting missense variants associated with a high risk of all-cause mortality in rectum adenocarcinoma. The Fuzzy-based RNNCoxPH model exhibits higher efficacy in identifying and classifying the missense variants related to mortality risk in rectum adenocarcinoma compared to other test methods.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Computer Science, Artificial Intelligence

Export- and import-based economic models for predicting global trade using deep learning

Cheng-Hong Yang, Cheng-Feng Lee, Po-Yin Chang

Summary: Forecasting global foreign trade is crucial for governments and multinational corporations, but accurate predictions are challenging due to complex relationships between exports, imports, and economic variables. Traditional models provide less accurate forecasts for trade data. This study proposes an ensemble learning approach that combines trade and deep learning models to improve forecasting performance. The method establishes cointegration relationships between variables and uses them to predict future trade data. Experimental results show that the ensemble learning method outperforms traditional models in terms of forecasting accuracy.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Biochemical Research Methods

Dimensionality reduction approach for many-objective epistasis analysis

Cheng-Hong Yang, Ming-Feng Hou, Li-Yeh Chuang, Cheng-San Yang, Yu-Da Lin

Summary: This study extended the multiobjective approach-based multifactor dimensionality reduction (MOMDR) to the many-objective version (MaODR) to improve the identification of single-nucleotide polymorphism-single-nucleotide polymorphism interactions (SSIs) between cases and controls. An objective function selection approach was introduced to determine the optimal measure combination in MaODR among 10 well-known measures. The results showed that the MaODR-CLN model exhibited higher detection success rates in identifying SSIs with weak marginal effects.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Pharmacology & Pharmacy

Machine Learning approaches for the mortality risk assessment of patients undergoing hemodialysis

Cheng-Hong Yang, Yin-Syuan Chen, Sin-Hua Moi, Jin-Bor Chen, Lin Wang, Li-Yeh Chuang

Summary: This study aimed to assess the all-cause mortality risk in hemodialysis (HD) patients and compared the performance of different Cox proportional hazards (CoxPH) models. The whale optimization algorithm (WOA)-CoxPH model showed the highest concordance index and provided better risk assessment compared to other models. Patients with seven or more risk characteristics of eight selected parameters were found to have a potentially increased risk of all-cause mortality in the HD population.

THERAPEUTIC ADVANCES IN CHRONIC DISEASE (2022)

Article Computer Science, Information Systems

AIS-Based Intelligent Vessel Trajectory Prediction Using Bi-LSTM

Cheng-Hong Yang, Chih-Hsien Wu, Jen-Chung Shao, Yi-Chuan Wang, Chih-Min Hsieh

Summary: Accurate vessel trajectory prediction is crucial for maritime traffic control and management, aiding in route planning, distance reduction, and increased efficiency. This study proposes a method that combines data denoising and deep learning prediction to improve accuracy. Experimental results demonstrate the effectiveness of the proposed method.

IEEE ACCESS (2022)

Article Pharmacology & Pharmacy

Identification of mortality-risk-related missense variant for renal clear cell carcinoma using deep learning

Jin-Bor Chen, Huai-Shuo Yang, Sin-Hua Moi, Li-Yeh Chuang, Cheng-Hong Yang

Summary: The improved DeepSurv model achieved greater balanced accuracy compared with the DeepSurv model and identified 610 high-risk variants associated with overall mortality. The results of gene differential expression analysis indicated nine KIRCC mortality-risk-related pathways, suggesting their associations with cancer cell growth, cancer cell differentiation, and immune response inhibition. The findings support the effectiveness of the improved DeepSurv model in identifying mortality-related high-risk variants and candidate genes in the context of KIRCC overall mortality.

THERAPEUTIC ADVANCES IN CHRONIC DISEASE (2021)
