Article

IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning

BIOINFORMATICS (2020)

Journal

BIOINFORMATICS

Volume 36, Issue 21, Pages 5177-5186

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btaa667

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

National Natural Science Foundation of China [61822306, 61672184, 61702134, 61861146002, 61732012]
Beijing Natural Science Foundation [JQ19019]
Fok Ying-Tung Education Foundation for Young Teachers in the Higher Education Institutions of China [161063]

Abstract

Motivation: Intrinsically disordered regions (IDRs) are widely distributed in proteins and are related to many important biological functions. Accurate prediction of IDRs is critical for protein structure and function analysis. However, existing computational methods construct their predictive models solely in the sequence space, failing to convert the sequence space into a 'semantic space' that reflects the structural characteristics of proteins. Furthermore, although length-dependent predictors have shown promising results, new fusion strategies should be explored to improve their predictive performance and generalization.

Results: In this study, we applied Sequence to Sequence Learning (Seq2Seq), derived from natural language processing (NLP), to map protein sequences to a 'semantic space' that reflects structural patterns, with the help of predicted residue-residue contacts (CCMs) and other sequence-based features. Furthermore, the Attention mechanism was used to capture the global associations between all residue pairs in the proteins. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction and IDP-Seq2Seq-G for both long and short disordered region prediction. Finally, these three predictors were fused into one predictor, called IDP-Seq2Seq, to improve discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratio of long to short disordered regions and outperforms other competing methods.
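
To illustrate how an attention mechanism can relate every residue to every other residue, here is a minimal sketch of generic scaled dot-product self-attention in plain Python. It is not the architecture used in IDP-Seq2Seq, which the abstract does not specify. The function names, the toy feature vectors, and the choice to use the raw features as queries, keys and values (with no learned projections) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over per-residue feature vectors.

    features: list of L vectors (one per residue), each of dimension d.
    Returns (weights, context), where weights is an L x L matrix of
    attention weights and context is an L x d matrix of weighted sums.
    Assumption: queries, keys and values are the raw features themselves.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    weights = []
    for q in features:
        # Score this residue against every residue in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted average of all residues.
    context = [
        [sum(w * v[j] for w, v in zip(row, features)) for j in range(d)]
        for row in weights
    ]
    return weights, context


# Hypothetical 4-residue sequence with 2-dimensional toy features.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
weights, context = self_attention(toy_features)
```

Each row of `weights` sums to 1 and gives one residue's attention over the whole sequence, so distant residue pairs are scored in the same way as adjacent ones.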

Prediction of protein-protein interaction sites in intrinsically disordered proteins

Ranran Chen, Xinlu Li, Yaqing Yang, Xixi Song, Cheng Wang, Dongdong Qiao

Summary: This paper presents a comprehensive overview of recently published predictors for intrinsically disordered protein (IDP) binding site prediction. The authors collected 30 representative predictors and summarized their databases, features, and algorithms. The predictors were divided into scoring functions, machine learning-based prediction, and consensus approaches, with detailed descriptions of their algorithms and performances. This study not only provides a full picture of the current status of IDP binding prediction, but also serves as a guide for selecting different methods and inspires future development trends and principles.

FRONTIERS IN MOLECULAR BIOSCIENCES (2022)