Article

A Deep Neural Network Model using Random Forest to Extract Feature Representation for Gene Expression Data Classification

Journal

SCIENTIFIC REPORTS
Volume 8

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41598-018-34833-6

Funding

  1. NIH [R01GM124061, R37AI051231]
  2. NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES [R01GM124061] Funding Source: NIH RePORTER

In predictive model development, gene expression data poses a unique challenge: the number of samples (n) is much smaller than the number of features (p). This n << p property has kept gene expression classification from benefiting from deep learning techniques, which have proven powerful under n > p scenarios in other application fields, such as image classification. Further, the sparsity of effective features, with unknown correlation structures, in gene expression profiles makes classification more challenging. To tackle these problems, we propose a new classifier, the forest deep neural network (fDNN), which integrates a deep neural network architecture with a supervised forest feature detector. Using this built-in feature detector, the method learns sparse feature representations and feeds them into a neural network to mitigate overfitting. Simulation experiments and real data analyses using two RNA-seq expression datasets are conducted to evaluate fDNN's capability. The method is shown to be a useful addition to current predictive models, with better classification performance and more meaningful selected features than ordinary random forests and deep neural networks.
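The forest-then-network idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It makes several assumptions the abstract does not state: the forest representation is taken to be one binary prediction per tree, decision stumps stand in for full trees, a single logistic unit stands in for the deep network, and the data are simulated. All function names (`make_data`, `fit_stump`, `fit_forest`, `forest_features`, `fit_logistic`, `predict`) are hypothetical.

```python
import math
import random

def make_data(n, p, informative, rng):
    """Simulate n samples with p features; only the first `informative` carry signal."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(informative):
            row[j] += 2.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def fit_stump(X, y, features):
    """Pick the (feature, threshold) among `features` with the lowest training error."""
    best = None
    for j in features:
        for t in sorted(set(row[j] for row in X)):
            err = sum((row[j] > t) != bool(label) for row, label in zip(X, y))
            if best is None or err < best[0]:
                best = (err, j, t)
    return best[1], best[2]

def fit_forest(X, y, n_trees, rng):
    """Bootstrap samples and subsample features per stump (random-forest style)."""
    n, p = len(X), len(X[0])
    k = max(1, int(math.sqrt(p)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(p), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_features(forest, X):
    """Map each sample to a binary vector with one entry per stump prediction."""
    return [[1.0 if row[j] > t else 0.0 for j, t in forest] for row in X]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def fit_logistic(Z, y, epochs=200, lr=0.1):
    """Train one logistic unit by stochastic gradient descent on the forest features."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            g = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            w = [wi - lr * g * zi for wi, zi in zip(w, z)]
            b -= lr * g
    return w, b

def predict(w, b, Z):
    return [1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0 for z in Z]

# n << p setting: 40 samples, 200 features, 5 of them informative (all assumed values).
rng = random.Random(0)
X_train, y_train = make_data(40, 200, 5, rng)
X_test, y_test = make_data(40, 200, 5, rng)

forest = fit_forest(X_train, y_train, n_trees=50, rng=rng)
Z_train = forest_features(forest, X_train)  # 200 raw features -> 50 binary features
Z_test = forest_features(forest, X_test)

w, b = fit_logistic(Z_train, y_train)
accuracy = sum(p == t for p, t in zip(predict(w, b, Z_test), y_test)) / len(y_test)
print(f"test accuracy on simulated data: {accuracy:.2f}")
```

The point of the sketch is the change of input dimension: the downstream model is fit on 50 forest-derived binary features rather than 200 raw ones, which is one way a supervised feature detector could reduce the number of parameters a network must learn when n << p.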

Recommended

Article Biochemical Research Methods

scBatch: batch-effect correction of RNA-seq data through sample distance matrix adjustment

Teng Fei, Tianwei Yu

BIOINFORMATICS (2020)

Review Biochemical Research Methods

Accurate feature selection improves single-cell RNA-seq cell clustering

Kenong Su, Tianwei Yu, Hao Wu

Summary: Cell clustering is a crucial task in single-cell RNA sequencing (scRNA-seq) data analysis, with feature selection playing a key role in improving clustering accuracy. The study evaluates the impact of feature selection on cell clustering accuracy and introduces a new algorithm named FEAture SelecTion (FEAST) for selecting more representative features. Applying FEAST to 12 public scRNA-seq datasets demonstrates a significant improvement in clustering accuracy when combined with existing clustering tools.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Virology

Elevated levels of inflammatory plasma biomarkers are associated with risk of HIV infection

Samantha McInally, Kristin Wall, Tianwei Yu, Rabindra Tirouvanziam, William Kilembe, Jill Gilmour, Susan A. Allen, Eric Hunter

Summary: The study showed that individuals who later became HIV-1 infected had significantly higher baseline levels of multiple inflammatory cytokines/chemokines compared to individuals who remained HIV-negative. Specific levels of certain biomarkers were identified as significant predictors of later HIV acquisition, indicating a potential link between inflammation and immune activation with increased risk of HIV infection.

RETROVIROLOGY (2021)

Article Immunology

Immunologic mechanisms of seasonal influenza vaccination administered by microneedle patch from a randomized phase I trial

Nadine G. Rouphael, Lilin Lai, Sonia Tandon, Michele Paine McCullough, Yunchuan Kong, Sarah Kabbani, Muktha S. Natrajan, Yongxian Xu, Yerun Zhu, Dongli Wang, Jesse O'Shea, Amy Sherman, Tianwei Yu, Sebastien Henry, Devin McAllister, Daniel Stadlbauer, Surender Khurana, Hana Golding, Florian Krammer, Mark J. Mulligan, Mark R. Prausnitz

Summary: The study found that inactivated influenza virus vaccination through dissolvable microneedle patches (MNPs) produces humoral and cellular immune responses that are similar or greater than traditional intramuscular (IM) vaccination. MNPs induced higher neuraminidase inhibition (NAI) titers for all three influenza virus strains tested and stimulated a larger percentage of circulating T follicular helper cells.

NPJ VACCINES (2021)

Article Environmental Sciences

The exposome in practice: an exploratory panel study of biomarkers of air pollutant exposure in Chinese people aged 60-69 years (China BAPE Study)

Song Tang, Tiantian Li, Jianlong Fang, Renjie Chen, Yu'e Cha, Yanwen Wang, Mu Zhu, Yi Zhang, Yuanyuan Chen, Yanjun Du, Tianwei Yu, David C. Thompson, Krystal J. Godri Pollitt, Vasilis Vasiliou, John S. Ji, Haidong Kan, Junfeng Jim Zhang, Xiaoming Shi

Summary: The exposome is a novel research paradigm that comprehensively considers the complex interactions between exogenous exposures, endogenous exposures, and modifiable factors in humans. By exploring the association between individual airborne exposure and adverse health outcomes, utilizing advanced monitoring techniques and biological sample analysis, the exposome approach can reveal the mechanisms underlying the impact of environmental exposures on human health.

ENVIRONMENT INTERNATIONAL (2021)

Article Engineering, Environmental

The Oxidative Potential of Fine Particulate Matter and Biological Perturbations in Human Plasma and Saliva Metabolome

Ziyin Tang, Jeremy A. Sarnat, Rodney J. Weber, Armistead G. Russell, Xiaoyue Zhang, Zhenjiang Li, Tianwei Yu, Dean P. Jones, Donghai Liang

Summary: The study found that particulate oxidative potential may be a key parameter for particulate matter toxicity. By examining the biological changes and underlying molecular mechanisms associated with fine particulate matter oxidative potential (FPMOP), the study identified leukotriene metabolism and galactose metabolism in plasma, and vitamin E metabolism and leukotriene metabolism in saliva, as the top pathways associated with FPMOP. The study also observed different patterns of perturbed pathways for water-soluble and water-insoluble FPMOP, and identified five metabolites directly associated with FPMOP. These findings suggest that FPMOP may be a more sensitive and health-relevant measure for understanding the causes related to PM2.5 exposures.

ENVIRONMENTAL SCIENCE & TECHNOLOGY (2022)

Article Mathematical & Computational Biology

Feature selection and classification over the network with missing node observations

Zhuxuan Jin, Jian Kang, Tianwei Yu

Summary: Jointly analyzing transcriptomic data and existing biological networks leads to more robust and informative feature selection results and a better understanding of biological mechanisms. A new Bayesian node classification framework is proposed to handle missing values and improve classification accuracy while reducing bias in estimating gene effects. This method outperforms existing approaches in comprehensive simulation studies and analysis of real-world genomic data.

STATISTICS IN MEDICINE (2022)

Article Biochemical Research Methods

AIME: Autoencoder-based integrative multi-omics data embedding that allows for confounder adjustments

Tianwei Yu

Summary: This study introduces a deep learning-based approach, AIME, for extracting data representation for integrative analysis of omics data. The method can adjust for confounding factors, achieve informative data embedding, and identify related feature pairs between two data types.

PLOS COMPUTATIONAL BIOLOGY (2022)

Article Biochemical Research Methods

Metapone: a Bioconductor package for joint pathway testing for untargeted metabolomics data

Leqi Tian, Zhenjiang Li, Guoxuan Ma, Xiaoyue Zhang, Ziyin Tang, Siheng Wang, Jian Kang, Donghai Liang, Tianwei Yu

Summary: This article introduces an innovative R/Bioconductor package for pathway enrichment testing of untargeted metabolomics data. The package addresses the matching uncertainty between data features and metabolites, and allows for the simultaneous analysis of positive and negative ion mode LC/MS data.

BIOINFORMATICS (2022)

Article Engineering, Environmental

Evaluation of the Use of Saliva Metabolome as a Surrogate of Blood Metabolome in Assessing Internal Exposures to Traffic-Related Air Pollution

Zhenjiang Li, Jeremy A. Sarnat, Ken H. Liu, Robert B. Hood, Che-Jung Chang, Xin Hu, ViLinh Tran, Roby Greenwald, Howard H. Chang, Armistead Russell, Tianwei Yu, Dean P. Jones, Donghai Liang

Summary: Saliva may serve as an alternative biospecimen to blood in evaluating the association between traffic-related air pollution and biological responses.

ENVIRONMENTAL SCIENCE & TECHNOLOGY (2022)

Article Biology

MM-GANN-DDI: Multimodal Graph-Agnostic Neural Networks for Predicting Drug-Drug Interaction Events

Junning Feng, Yong Liang, Tianwei Yu

Summary: Personalized treatment of complex diseases relies on combined medication, but unexpected drug-drug interactions (DDIs) can lead to adverse effects. This study proposes a multimodal graph-agnostic neural network model for predicting drug-drug interaction events. The model demonstrates competitive performance on prediction tasks, particularly in predicting DDI types for new drugs, and outperforms existing methods in terms of accuracy, F1 score, precision, and recall.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Neurosciences

Bayesian nonparametric method for genetic dissection of brain activation region

Zhuxuan Jin, Jian Kang, Tianwei Yu

Summary: This study proposes a Bayesian hierarchical model to investigate the shape and intensity of brain activation regions, and develops efficient posterior computation algorithms. The results demonstrate the significant application value of this model in Alzheimer's disease research.

FRONTIERS IN NEUROSCIENCE (2023)

Article Endocrinology & Metabolism

Sphinganine is associated with 24-h MAP in the non-sleepy with OSA

Victoria M. Pak, Katherine Russell, Zhenzhen Shi, Qiang Zhang, John Cox, Karan Uppal, Tianwei Yu, Vicki Hertzberg, Ken Liu, Octavian C. Ioachimescu, Nancy Collop, Donald L. Bliwise, Nancy G. Kutner, Ann Rogers, Sandra B. Dunbar

Summary: There is a difference in 24-hour mean arterial pressure (MAP) between sleepy and non-sleepy participants with newly diagnosed obstructive sleep apnea (OSA), and sphinganine is significantly associated with MAP in non-sleepy patients with OSA.

METABOLOMICS (2022)

Article Multidisciplinary Sciences

A single-cell analysis of the molecular lineage of chordate embryogenesis

Tengjiao Zhang, Yichi Xu, Kaoru Imai, Teng Fei, Guilin Wang, Bo Dong, Tianwei Yu, Yutaka Satou, Weiyang Shi, Zhirong Bao

SCIENCE ADVANCES (2020)

Article Mathematics, Interdisciplinary Applications

Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior

Qingpo Cai, Jian Kang, Tianwei Yu

BAYESIAN ANALYSIS (2020)