Article

Prediction of Protein Domain with mRMR Feature Selection and Analysis

Journal

PLOS ONE
Volume 7, Issue 6

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pone.0039308

Funding

  1. National Basic Research Program of China [2011CB510102, 2011CB510101]
  2. Shanghai Municipal Education Commission [12ZZ087]

Domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting protein domains from sequence information alone, so as to facilitate protein structure prediction and speed up functional annotation. However, although many efforts have been made in this regard, predicting protein domains from sequence information remains a challenging and elusive problem. Here, a new method was developed by combining the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), and by incorporating features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28-40% higher than those achieved by the existing method on the same benchmark dataset. Furthermore, an in-depth analysis revealed that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, which is consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool for annotating protein domains, or may, at the very least, play a complementary role to the existing domain prediction methods, and that the findings about the key features with a high impact on domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be used to study protein signal peptides, B-cell epitopes, and HIV protease cleavage sites, among many other important topics in protein science and biomedicine.
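
The abstract names mRMR ranking followed by incremental feature selection (IFS) but does not spell out either step. The following is a minimal, dependency-free sketch of the general idea, not the authors' implementation. Its assumptions: the function names and the toy data are hypothetical, mutual information is estimated from discrete value counts, and a leave-one-out 1-nearest-neighbour score stands in for the random forest used in the paper so that the example runs without third-party libraries.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Greedy mRMR: pick the feature with the highest relevance (MI with the
    labels) minus mean redundancy (MI with the features already picked)."""
    relevance = {f: mutual_information(col, labels) for f, col in features.items()}
    ranked, remaining = [], set(features)

    def score(name):
        if not ranked:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in ranked
        ) / len(ranked)
        return relevance[name] - redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def loo_1nn_accuracy(features, labels, subset):
    """Stand-in evaluator (NOT a random forest): leave-one-out accuracy of a
    1-nearest-neighbour rule using Hamming distance on the chosen features."""
    rows = list(zip(*(features[name] for name in subset)))
    hits = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum(a != b for a, b in zip(row, rows[j])),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def incremental_feature_selection(ranked, evaluate):
    """IFS: evaluate the top-1, top-2, ... feature prefixes and keep the
    first prefix that reaches the best score."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        current = evaluate(ranked[:k])
        if current > best_score:
            best_k, best_score = k, current
    return ranked[:best_k], best_score


# Invented toy data: the label is fully determined by "a" and "b";
# "a_copy" duplicates "a" and "noise" is unrelated to the label.
features = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 1, 1, 2, 2, 3, 3]

ranked = mrmr_rank(features, labels)
selected, score = incremental_feature_selection(
    ranked, lambda subset: loo_1nn_accuracy(features, labels, subset)
)
print(ranked)           # ['a', 'b', 'a_copy', 'noise']
print(selected, score)  # ['a', 'b'] 1.0
```

On this toy input the redundancy term pushes the duplicate feature below the complementary one, and IFS stops at the two-feature prefix because adding further features does not improve the score. In a setting closer to the paper, the evaluator would be a random forest scored by cross-validation or on an independent test set.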

Recommended

Article Biology

Identification of Cell Markers and Their Expression Patterns in Skin Based on Single-Cell RNA-Sequencing Profiles

Xianchao Zhou, Shijian Ding, Deling Wang, Lei Chen, Kaiyan Feng, Tao Huang, Zhandong Li, Yudong Cai

Summary: A computational pipeline was developed to investigate the pathological mechanisms of skin diseases and identify potential therapeutic and diagnostic targets.

LIFE-BASEL (2022)

Article Biotechnology & Applied Microbiology

Analysis of Lymphoma-Related Genes with Gene Ontology and Kyoto Encyclopedia of Genes and Genomes Enrichment

Qiao Sun, Lin Bai, Shaopin Zhu, Lu Cheng, Yang Xu, Yu-Dong Cai, Hui Chen, Jian Zhang

Summary: This study utilized gene ontology and KEGG pathway analyses to identify lymphoma-associated genes and determine their biological processes. Features were selected and ranked using various methods, and a decision tree model was used to extract classification rules. The predicted features were consistent with recent publications and provide a new perspective for understanding the molecular mechanisms of lymphoma.

BIOMED RESEARCH INTERNATIONAL (2022)

Article Biotechnology & Applied Microbiology

Identification of Human Cell Cycle Phase Markers Based on Single-Cell RNA-Seq Data by Using Machine Learning Methods

FeiMing Huang, Lei Chen, Wei Guo, Tao Huang, Yu-dong Cai

Summary: The study constructs efficient classifiers based on single-cell RNA sequencing data and identifies essential gene biomarkers, while also mining a series of classification rules that can distinguish different cell cycle phases, providing a novel method for determining the cell cycle and identifying new potential cell cycle-related genes.

BIOMED RESEARCH INTERNATIONAL (2022)

Article Chemistry, Multidisciplinary

Ultra-long Near-infrared Repeatable Photochemical Afterglow Mediated by Reversible Storage of Singlet Oxygen for Information Encryption

Lei Chen, Kuangshi Sun, Donghao Hu, Xianlong Su, Linna Guo, Jiamiao Yin, Yuetian Pei, Yiwei Fan, Qian Liu, Ming Xu, Wei Feng, Fuyou Li

Summary: Photochemical afterglow systems have attracted significant attention for their adjustable photophysical properties and potential applications. However, conventional photochemical afterglow lacks repeatability due to the consumption of energy cache units. In this study, we propose a novel strategy to achieve repeatable photochemical afterglow through the reversible storage of singlet oxygen (¹O₂). This strategy enables the generation of near-infrared afterglow with a lifetime over 10 s, and its initial intensity remains stable over 50 excitation cycles. Mechanism study confirms the repeatable photochemical afterglow is realized through singlet oxygen-sensitized fluorescence emission. The generality of this strategy is demonstrated, allowing for tunable afterglow lifetimes and colors through rational design. Furthermore, the repeatable photochemical afterglow is applied for attacker-misleading information encryption, providing repeatable readout.

ANGEWANDTE CHEMIE-INTERNATIONAL EDITION (2023)

Article Biotechnology & Applied Microbiology

Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods

FeiMing Huang, QingLan Ma, JingXin Ren, JiaRui Li, Fen Wang, Tao Huang, Yu-Dong Cai

Summary: Long-term cigarette smoking is associated with various human diseases, and this study used advanced machine learning methods to identify specific isoforms and pathways that play important roles in distinguishing smokers from former smokers. The study evaluated multiple feature selection algorithms and utilized a decision tree approach to establish high-performance classification models. The identified isoforms and classification rules were validated through previous research. The results highlight the relevance of isoforms such as ENST00000464835, ENST00000622663, and ENST00000284311, as well as pathways related to smoking response.

BIOMED RESEARCH INTERNATIONAL (2023)

Article Chemistry, Organic

Synthesis of 2-Cyanomethyl Indane Derivatives via Pd-Catalyzed Alkene Difunctionalization Reactions of Alkyl Nitriles

Alma R. Perez, Evan C. Bornowski, Lei Chen, John P. Wolfe

Summary: The synthesis of indanes containing substituted cyanomethyl groups at C2 has been achieved through Pd-catalyzed coupling reactions. Alkenyl triflates were used to generate partially saturated analogues via similar transformations. The use of a preformed BrettPhosPd(allyl)(Cl) complex as a precatalyst was crucial for the success of these reactions.

ORGANIC LETTERS (2023)

Article Biology

Identification of Genes Associated with the Impairment of Olfactory and Gustatory Functions in COVID-19 via Machine-Learning Methods

Jingxin Ren, Yuhang Zhang, Wei Guo, Kaiyan Feng, Ye Yuan, Tao Huang, Yu-Dong Cai

Summary: COVID-19 can cause impairment of smell and taste, and this study used machine learning to analyze gene expression levels in COVID-19 patient samples to identify important biomarkers associated with this loss of sensory ability. The study suggests potential mechanisms for COVID-19 complications and provides biomarkers for predicting olfactory and gustatory impairment.

LIFE-BASEL (2023)

Article Biology

Using Machine Learning Methods in Identifying Genes Associated with COVID-19 in Cardiomyocytes and Cardiac Vascular Endothelial Cells

Yaochen Xu, Qinglan Ma, Jingxin Ren, Lei Chen, Wei Guo, Kaiyan Feng, Zhenbing Zeng, Tao Huang, Yudong Cai

Summary: COVID-19 not only damages the respiratory system, but also puts strain on the cardiovascular system. This study analyzed the gene expression levels of vascular endothelial cells and cardiomyocytes in COVID-19 patients and healthy controls using a machine learning-based workflow. The findings suggest that COVID-19 affects the gene expression levels in cardiac cells, providing insights into the pathogenesis of COVID-19 and potential therapeutic targets.

LIFE-BASEL (2023)

Article Biology

Identification of Phase-Separation-Protein-Related Function Based on Gene Ontology by Using Machine Learning Methods

Qinglan Ma, FeiMing Huang, Wei Guo, KaiYan Feng, Tao Huang, Yudong Cai

Summary: Phase-separation proteins (PSPs) play a role in liquid-liquid phase separation and have implications for cellular biology and disease development. Identifying PSPs and their functions can provide valuable insights.

LIFE-BASEL (2023)

Article Biology

Machine Learning Classification of Time since BNT162b2 COVID-19 Vaccination Based on Array-Measured Antibody Activity

Qing-Lan Ma, Fei-Ming Huang, Wei Guo, Kai-Yan Feng, Tao Huang, Yu-Dong Cai

Summary: Vaccines elicit an immune response involving B and T cells, with B cells producing antibodies. The immunity to SARS-CoV-2 diminishes over time after vaccination. This study aimed to identify important changes in antigen-reactive antibodies post-vaccination to enhance vaccine efficacy.

LIFE-BASEL (2023)

Article Biology

Identification of Gene Markers Associated with COVID-19 Severity and Recovery in Different Immune Cell Subtypes

Jing-Xin Ren, Qian Gao, Xiao-Chao Zhou, Lei Chen, Wei Guo, Kai-Yan Feng, Lin Lu, Tao Huang, Yu-Dong Cai

Summary: A machine-learning-based method was used to analyze the scRNA-seq data of B cells, T cells, and myeloid cells from patients with COVID-19. Key genes related to SARS-CoV-2 infection were identified. The study revealed the dynamic changes in the immune system of COVID-19 patients at different stages, providing valuable insights into the ongoing effect of COVID-19 development on the immune system.

BIOLOGY-BASEL (2023)

Article Biology

Identification of Colon Immune Cell Marker Genes Using Machine Learning Methods

Yong Yang, Yuhang Zhang, Jingxin Ren, Kaiyan Feng, Zhandong Li, Tao Huang, Yudong Cai

Summary: This study analyzed single-cell RNA sequencing data from a normal colon to identify genetic markers of 25 immune cell types and reveal quantitative differences between them. Machine learning-based methods were used to analyze the importance of gene features and classify the most important genetic markers. The results provide a reference for exploring the cell composition of the colon cancer microenvironment and clinical immunotherapy.

LIFE-BASEL (2023)
