4.7 Article

Incorporating Distance-Based Top-n-gram and Random Forest To Identify Electron Transport Proteins

Journal

JOURNAL OF PROTEOME RESEARCH
Volume 18, Issue 7, Pages 2931-2939

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jproteome.9b00250

Keywords

electron transport proteins; protein identification; feature extraction; distance-based Top-n-gram method; feature selection; Max-Relevance-Max-Distance; random forest; F-measure; AUC value; ACC

Funding

  1. National Key R&D Program of China [2018YFC0910405]
  2. Natural Science Foundation of China [61771331]

Cellular respiration provides direct energy substances for living organisms. Electron storage and transportation should be completed through electron transport chains during the cellular respiration process. Thus, identifying electron transport proteins is an important research task. In protein identification, selection of the feature extraction method and classification algorithm has a direct bearing on classification. The distance-based Top-n-gram method, which was proposed based on the frequency profile and considered evolutionary information, was used in this study for feature extraction. The Max-Relevance-Max-Distance algorithm was adopted for feature selection. The first 4D features that greatly influenced the classification result were selected to form the feature data set. Finally, the random forest algorithm was used to identify electron transport proteins. Under the 10-fold cross-validation of the model constructed in this study, sensitivity, specificity, and accuracy rates surpassed 85%, 80%, and 82%, respectively. In the testing set, F-measure, AUC value, and accuracy exceeded 74%, 95%, and 86%, respectively. These experimental results indicated that the classification model built in this study is an effective tool in identifying electron transport proteins.
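The abstract reports sensitivity, specificity, accuracy, F-measure, and AUC without defining them. The sketch below computes these quantities from binary labels using their standard textbook definitions. It is not the authors' evaluation code: the function names (`confusion_counts`, `classification_metrics`, `auc`) are hypothetical, and the toy labels and scores are invented for illustration and are not data from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and F-measure (standard definitions)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

In a cross-validation setting such as the 10-fold scheme the abstract mentions, these functions would typically be applied to the held-out predictions of each fold and the results averaged. Whether the authors averaged per fold or pooled predictions is not stated in the abstract.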

Recommended

Review Biochemical Research Methods

Application of learning to rank in bioinformatics tasks

Xiaoqing Ru, Xiucai Ye, Tetsuya Sakurai, Quan Zou

Summary: Learning to rank algorithms have been gradually applied to bioinformatics over the past decades, showing significant advantages in various research tasks. This paper analyzes the characteristics and strengths of LTR algorithms compared to other types of algorithms in bioinformatics, discussing ways to better utilize them and addressing current open problems.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

NerLTR-DTA: drug-target binding affinity prediction based on neighbor relationship and learning to rank

Xiaoqing Ru, Xiucai Ye, Tetsuya Sakurai, Quan Zou

Summary: Motivation: Predicting drug-target binding affinity is crucial in drug discovery and repurposing. However, existing methods face challenges like focusing on one application scenario and not considering the priority order of proteins related to each target drug. Results: The proposed NerLTR-DTA method utilizes neighbor relationship, similarity, and sharing to predict affinity values and priority order, achieving excellent performance in multiple scenarios and outperforming state-of-the-art methods on commonly used datasets. This comprehensive tool can accurately rank drug-protein associations, contributing to new drug discoveries and repurposing efforts.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

Optimization of drug-target affinity prediction methods through feature processing schemes

Xiaoqing Ru, Quan Zou, Chen Lin

Summary: Motivated by the need for high accuracy and interpretability, this study explores various feature selection and dimensionality reduction techniques to optimize drug-target affinity prediction models. Experimental results demonstrate that regression tree-based feature selection is most effective in constructing models with good performance and robustness. Moreover, the study identifies a high-quality feature subset and highlights the breakthrough impact of the top 20D features on prediction. This research emphasizes the importance of feature optimization in constructing high-performance and interpretable models for drug-target affinity prediction.

BIOINFORMATICS (2023)
