4.6 Article

Predicting enzyme family class in a hybridization space

Journal

PROTEIN SCIENCE
Volume 13, Issue 11, Pages 2857-2863

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1110/ps.04981104

Keywords

ENZYME database; 40% cutoff; Gene Ontology; pseudo-amino-acid composition; quasi-sequence-order effect; ISort predictor; GO-PseAA predictor; bioinformatics; proteomics

Given the sequence of a protein, how can we predict whether it is an enzyme or a non-enzyme? If it is an enzyme, to which enzyme family class does it belong? Because these questions bear directly on the biological function of a protein and the object it acts on, their importance is self-evident. With the explosion of protein sequences entering data banks and the much slower progress of biochemical experiments in determining their functions, an automated method that can answer these questions quickly is highly desirable. By hybridizing the Gene Ontology and pseudo-amino-acid composition, we have introduced a new method, called the GO-PseAA predictor, that operates in a hybridization space. To avoid redundancy and bias, demonstrations were performed on a data set in which no protein in an individual class has ≥40% sequence identity to any other. The overall success rate obtained by the jackknife cross-validation test was 93% in identifying enzymes versus non-enzymes and 94% in identifying the enzyme family among the following six main Enzyme Commission (EC) classes: (1) oxidoreductase, (2) transferase, (3) hydrolase, (4) lyase, (5) isomerase, and (6) ligase. The corresponding rates by the independent data set test were 98% and 97%, respectively.
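
The jackknife (leave-one-out) cross-validation test mentioned in the abstract can be illustrated with a minimal sketch. This is not the GO-PseAA predictor itself: as a stated assumption, it swaps in plain 20-dimensional amino-acid composition for the hybrid GO/PseAA features and a nearest-neighbor rule for the classifier, and the function names, toy sequences, and class labels are all hypothetical.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino-acid composition of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_label(query, samples):
    """Label of the sample whose feature vector is closest to `query`."""
    return min(samples, key=lambda s: dist(query, s[0]))[1]


def jackknife_success_rate(samples):
    """Leave each sample out in turn, predict it from the rest, and
    return the fraction of samples predicted correctly."""
    hits = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # every sample except the i-th
        hits += nearest_label(vec, rest) == label
    return hits / len(samples)


# Hypothetical toy data: made-up sequences and labels, not from the paper.
toy = [
    ("AAAAGAAA", "class_a"),
    ("AAGAAAAA", "class_a"),
    ("KKKKEKKK", "class_b"),
    ("KKEKKKKK", "class_b"),
]
samples = [(composition(seq), label) for seq, label in toy]
rate = jackknife_success_rate(samples)
```

In a jackknife test every protein serves once as the held-out query, so the reported success rate is the fraction of held-out proteins whose class is recovered from the remaining ones.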

Recommended

Article Biotechnology & Applied Microbiology

Predicting RNA 5-Methylcytosine Sites by Using Essential Sequence Features and Distributions

Lei Chen, ZhanDong Li, ShiQi Zhang, Yu-Hang Zhang, Tao Huang, Yu-Dong Cai

Summary: Methylation is a common and important modification in biological systems, and recent studies have found that methylation is widely present in different RNA molecules. Computational prediction methods may serve as an alternative to detect all methylation sites.

BIOMED RESEARCH INTERNATIONAL (2022)

Article Biology

Predicting Heart Cell Types by Using Transcriptome Profiles and a Machine Learning Method

Shijian Ding, Deling Wang, Xianchao Zhou, Lei Chen, Kaiyan Feng, Xianling Xu, Tao Huang, Zhandong Li, Yudong Cai

Summary: This study used multiple machine learning methods to analyze single-cell profiles of the heart and identify the best features and classifiers for different heart cell types. The results showed that the decision tree and random forest classification models achieved the highest weighted F1 scores. The selected features and classification rules played a crucial role in cardiac structure and function; in particular, certain long non-coding RNAs were found to be important for recognizing different cardiac cell types. These findings provide a solid academic foundation for the development of molecular diagnostics and biomarker discovery for cardiac diseases.

LIFE-BASEL (2022)

Article Multidisciplinary Sciences

Identifying luminal and basal mammary cell specific genes and their expression patterns during pregnancy

Zhan Dong Li, Xiangtian Yu, Zi Mei, Tao Zeng, Lei Chen, Xian Ling Xu, Hao Li, Tao Huang, Yu-Dong Cai

Summary: The mammary gland is an essential organ in mammals that produces milk for offspring. This study investigates the mechanisms underlying the differentiation of mammary progenitors into different cell subtypes using single-cell sequencing data. The findings identify specific gene features and rules that can classify epithelial cells into different subtypes and stages.

PLOS ONE (2022)

Article Biotechnology & Applied Microbiology

Identification of Human Cell Cycle Phase Markers Based on Single-Cell RNA-Seq Data by Using Machine Learning Methods

FeiMing Huang, Lei Chen, Wei Guo, Tao Huang, Yu-dong Cai

Summary: The study constructs efficient classifiers based on single-cell RNA sequencing data and identifies essential gene biomarkers, while also mining a series of classification rules that can distinguish different cell cycle phases, providing a novel method for determining the cell cycle and identifying new potential cell cycle-related genes.

BIOMED RESEARCH INTERNATIONAL (2022)

Article Biotechnology & Applied Microbiology

Subcellular Localization Prediction of Human Proteins Using Multifeature Selection Methods

Yu-Hang Zhang, ShiJian Ding, Lei Chen, Tao Huang, Yu-Dong Cai

Summary: This study developed a predictive model for subcellular localization by using protein-protein interaction networks, functional enrichment analysis, and proteins with confirmed localization. Various machine learning algorithms and feature selection methods were utilized to identify key features and understand their biological functions.

BIOMED RESEARCH INTERNATIONAL (2022)

Article Biotechnology & Applied Microbiology

Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods

FeiMing Huang, QingLan Ma, JingXin Ren, JiaRui Li, Fen Wang, Tao Huang, Yu-Dong Cai

Summary: Long-term cigarette smoking is associated with various human diseases, and this study used advanced machine learning methods to identify specific isoforms and pathways that play important roles in distinguishing smokers from former smokers. The study evaluated multiple feature selection algorithms and utilized a decision tree approach to establish high-performance classification models. The identified isoforms and classification rules were validated through previous research. The results highlight the relevance of isoforms such as ENST00000464835, ENST00000622663, and ENST00000284311, as well as pathways related to smoking response.

BIOMED RESEARCH INTERNATIONAL (2023)

Article Biotechnology & Applied Microbiology

Identification of Methylation Signatures and Rules for Sarcoma Subtypes by Machine Learning Methods

Jingxin Ren, XianChao Zhou, Wei Guo, KaiYan Feng, Tao Huang, Yu-Dong Cai

Summary: Sarcoma, a common type of solid tumor in children and adolescents, has multiple subtypes that are often difficult to diagnose early, resulting in severe consequences. This study aimed to find potential biomarkers at the DNA methylation level to distinguish different sarcoma subtypes. Machine learning and feature ranking methods were used to analyze sarcoma samples and construct classification models. The specific expression of genes related to highly correlated methylation sites was proven to be associated with sarcoma, and a decision tree algorithm helped to understand the differences between sarcoma types and classify subtypes.

BIOMED RESEARCH INTERNATIONAL (2022)

Article Biochemistry & Molecular Biology

Identification of Transcriptome Biomarkers for Severe COVID-19 with Machine Learning Methods

Xiaohong Li, Xianchao Zhou, Shijian Ding, Lei Chen, Kaiyan Feng, Hao Li, Tao Huang, Yu-Dong Cai

Summary: In this study, machine learning methods were used to identify biomarkers that can accurately classify COVID-19 in different disease states and severities. The findings provide a new point of reference for understanding the disease's etiology and facilitating precise therapy.

BIOMOLECULES (2022)

Article Biology

Identifying MicroRNA Markers That Predict COVID-19 Severity Using Machine Learning Methods

Jingxin Ren, Wei Guo, Kaiyan Feng, Tao Huang, Yudong Cai

Summary: In this study, the blood expression profiles of miRNA were analyzed to identify potential markers for differentiating the severity of COVID-19. The researchers constructed a high-precision RF model and extracted classification rules to quantify the role of miRNA expression in distinguishing COVID-19 patients with different severities.

LIFE-BASEL (2022)

Article Biology

Identification of Genes Associated with the Impairment of Olfactory and Gustatory Functions in COVID-19 via Machine-Learning Methods

Jingxin Ren, Yuhang Zhang, Wei Guo, Kaiyan Feng, Ye Yuan, Tao Huang, Yu-Dong Cai

Summary: COVID-19 can cause impairment of smell and taste, and this study used machine learning to analyze gene expression levels in COVID-19 patient samples to identify important biomarkers associated with this loss of sensory ability. The study suggests potential mechanisms for COVID-19 complications and provides biomarkers for predicting olfactory and gustatory impairment.

LIFE-BASEL (2023)

Article Biology

Using Machine Learning Methods in Identifying Genes Associated with COVID-19 in Cardiomyocytes and Cardiac Vascular Endothelial Cells

Yaochen Xu, Qinglan Ma, Jingxin Ren, Lei Chen, Wei Guo, Kaiyan Feng, Zhenbing Zeng, Tao Huang, Yudong Cai

Summary: COVID-19 not only damages the respiratory system, but also puts strain on the cardiovascular system. This study analyzed the gene expression levels of vascular endothelial cells and cardiomyocytes in COVID-19 patients and healthy controls using a machine learning-based workflow. The findings suggest that COVID-19 affects the gene expression levels in cardiac cells, providing insights into the pathogenesis of COVID-19 and potential therapeutic targets.

LIFE-BASEL (2023)

Article Biology

Identification of Phase-Separation-Protein-Related Function Based on Gene Ontology by Using Machine Learning Methods

Qinglan Ma, FeiMing Huang, Wei Guo, KaiYan Feng, Tao Huang, Yudong Cai

Summary: Phase-separation proteins (PSPs) play a role in liquid-liquid phase separation and have implications for cellular biology and disease development. Identifying PSPs and their functions can provide valuable insights.

LIFE-BASEL (2023)

Article Biology

Machine Learning Classification of Time since BNT162b2 COVID-19 Vaccination Based on Array-Measured Antibody Activity

Qing-Lan Ma, Fei-Ming Huang, Wei Guo, Kai-Yan Feng, Tao Huang, Yu-Dong Cai

Summary: Vaccines elicit an immune response involving B and T cells, with B cells producing antibodies. The immunity to SARS-CoV-2 diminishes over time after vaccination. This study aimed to identify important changes in antigen-reactive antibodies post-vaccination to enhance vaccine efficacy.

LIFE-BASEL (2023)

Article Biology

Identification of Gene Markers Associated with COVID-19 Severity and Recovery in Different Immune Cell Subtypes

Jing-Xin Ren, Qian Gao, Xiao-Chao Zhou, Lei Chen, Wei Guo, Kai-Yan Feng, Lin Lu, Tao Huang, Yu-Dong Cai

Summary: A machine-learning-based method was used to analyze the scRNA-seq data of B cells, T cells, and myeloid cells from patients with COVID-19. Key genes related to SARS-CoV-2 infection were identified. The study revealed the dynamic changes in the immune system of COVID-19 patients at different stages, providing valuable insights into the ongoing effect of COVID-19 development on the immune system.

BIOLOGY-BASEL (2023)

Article Biology

Identification of Colon Immune Cell Marker Genes Using Machine Learning Methods

Yong Yang, Yuhang Zhang, Jingxin Ren, Kaiyan Feng, Zhandong Li, Tao Huang, Yudong Cai

Summary: This study analyzed single-cell RNA sequencing data from a normal colon to identify genetic markers of 25 immune cell types and reveal quantitative differences between them. Machine learning-based methods were used to analyze the importance of gene features and classify the most important genetic markers. The results provide a reference for exploring the cell composition of the colon cancer microenvironment and clinical immunotherapy.

LIFE-BASEL (2023)
