Article

PSFM-DBT: Identifying DNA-Binding Proteins by Combing Position Specific Frequency Matrix and Distance-Bigram Transformation

Journal

International Journal of Molecular Sciences

Publisher

MDPI
DOI: 10.3390/ijms18091856

Keywords

PSFM-DBT; DNA binding protein; distance bigram transformation; PSFM

Funding

  1. National Natural Science Foundation of China [61672184]
  2. Natural Science Foundation of Guangdong Province [2014A030313695]
  3. Guangdong Natural Science Funds for Distinguished Young Scholars [2016A030306008]
  4. Scientific Research Foundation in Shenzhen [JCYJ20150626110425228, JCYJ20170307152201596]
  5. Guangdong Special Support Program of Technology Young Talents [2016TQ03X618]

DNA-binding proteins play crucial roles in various biological processes, such as DNA replication and repair, transcriptional regulation, and many other biological activities associated with DNA. Experimental techniques for identifying DNA-binding proteins are both time-consuming and expensive, so effective methods that identify these proteins from protein sequences alone are highly desirable. The key to sequence-based methods is an effective representation of protein sequences. Various previous studies have reported that evolutionary information is crucial for DNA-binding protein identification. In this study, we employed four methods to extract evolutionary information from the Position Specific Frequency Matrix (PSFM): Residue Probing Transformation (RPT), Evolutionary Difference Transformation (EDT), Distance-Bigram Transformation (DBT), and Trigram Transformation (TT). These four methods convert each PSFM into a fixed-length feature vector, and combining each of them with Support Vector Machines (SVMs) yields four predictors for identifying DNA-binding proteins: PSFM-RPT, PSFM-EDT, PSFM-DBT, and PSFM-TT. Experimental results on the widely used benchmark dataset PDB1075 and the independent dataset PDB186 showed that these four methods achieved state-of-the-art performance, and that PSFM-DBT outperformed other existing methods in this field. For practical applications, a user-friendly web server for PSFM-DBT has been established at http://bioinformatics.hitsz.edu.cn/PSFM-DBT/.
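The central idea of a distance-bigram transformation is to turn a variable-length PSFM (L positions x 20 amino acids) into a vector whose length does not depend on L. The sketch below is not the authors' implementation; it assumes one common formulation, in which, for each distance d from 1 to a chosen maximum and each amino-acid pair (a, b), the product of the frequency of a at position k and of b at position k + d is averaged over all valid positions k. The function name `distance_bigram_features` and the `max_distance` parameter are hypothetical, and the exact normalization used in PSFM-DBT may differ.

```python
from typing import List

# Assumed input layout: L rows (sequence positions) x 20 columns (amino-acid frequencies).
Matrix = List[List[float]]


def distance_bigram_features(psfm: Matrix, max_distance: int) -> List[float]:
    """Hypothetical DBT sketch: map an L x n PSFM to n * n * max_distance features.

    Assumed formulation: for each distance d in 1..max_distance and each column
    pair (a, b), average psfm[k][a] * psfm[k + d][b] over every position k for
    which k + d still lies inside the sequence.
    """
    length = len(psfm)
    if length == 0:
        raise ValueError("PSFM must have at least one row")
    n_aa = len(psfm[0])  # 20 for a real PSFM
    features: List[float] = []
    for d in range(1, max_distance + 1):
        pairs = length - d  # number of position pairs separated by exactly d
        for a in range(n_aa):
            for b in range(n_aa):
                if pairs <= 0:
                    # Sequence too short for this distance: no pairs, emit 0.0
                    # so the vector length stays fixed.
                    features.append(0.0)
                    continue
                total = sum(psfm[k][a] * psfm[k + d][b] for k in range(pairs))
                features.append(total / pairs)
    return features


if __name__ == "__main__":
    # Toy 4-position PSFM over a 2-letter alphabet, used only to keep the
    # example small; a real PSFM would have 20 columns.
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    vec = distance_bigram_features(toy, max_distance=2)
    print(len(vec))  # 8 = 2 * 2 * 2
```

With a real 20-column PSFM the output has 400 x `max_distance` entries for every protein, whatever its length, which is what allows a standard SVM (for example scikit-learn's `SVC`, not shown here) to consume the vectors directly.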

Recommended

Article Biochemical Research Methods

Fold-LTR-TCP: protein fold recognition based on triadic closure principle

Bin Liu, Yulin Zhu, Ke Yan

BRIEFINGS IN BIOINFORMATICS (2020)

Article Medicine, Research & Experimental

sgRNA-PSM: Predict sgRNAs On-Target Activity Based on Position-Specific Mismatch

Bin Liu, Zhihua Luo, Juan He

MOLECULAR THERAPY-NUCLEIC ACIDS (2020)

Article Biochemical Research Methods

IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning

Yi-Jun Tang, Yi-He Pang, Bin Liu

BIOINFORMATICS (2020)

Article Biochemical Research Methods

FoldRec-C2C: protein fold recognition by combining cluster-to-cluster model and protein similarity network

Jiangyi Shao, Ke Yan, Bin Liu

Summary: The FoldRec-C2C predictor globally incorporates protein interactions for protein fold recognition, treating it as an information retrieval task in natural language processing. Tested on the LINDAHL dataset, FoldRec-C2C outperforms 34 state-of-the-art methods in the field.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

ProtFold-DFG: protein fold recognition by combining Directed Fusion Graph and PageRank algorithm

Jiangyi Shao, Bin Liu

Summary: This study introduces a network-based predictor, ProtFold-DFG, for protein fold recognition, which uses a Directed Fusion Graph (DFG), KL divergence, and the PageRank algorithm to enhance recognition accuracy. Experimental results demonstrate that ProtFold-DFG outperforms 35 other methods on the LINDAHL dataset, making it a promising approach for protein fold recognition.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores

Ke Yan, Jie Wen, Jin-Xing Liu, Yong Xu, Bin Liu

Summary: The study proposed two novel algorithms, TSVM-fold and ESVM-fold, which use sequence similarity scores generated by multiple template-based methods for protein fold recognition. Experimental results showed that these algorithms outperform some state-of-the-art methods on rigorous benchmark datasets.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2021)

Article Biochemical Research Methods

iLncRNAdis-FB: Identify lncRNA-Disease Associations by Fusing Biological Feature Blocks Through Deep Neural Network

Hang Wei, Qing Liao, Bin Liu

Summary: Identifying lncRNA-disease associations is crucial for exploring disease mechanisms and for molecular drug discovery. However, current fusion strategies fail to remove noisy and irrelevant information, leading to low predictive performance. iLncRNAdis-FB is a new computational predictor based on a convolutional neural network (CNN) that integrates feature blocks from different data sources, achieving better prediction accuracy.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2021)

Article Biochemical Research Methods

RFPR-IDP: reduce the false positive rates for intrinsically disordered protein and region prediction by incorporating both fully ordered proteins and disordered proteins

Yumeng Liu, Xiaolong Wang, Bin Liu

Summary: Intrinsically disordered proteins/regions (IDPs/IDRs) are important for biological functions, and accurate prediction is crucial for protein structure and function predictions. However, most existing methods tend to predict fully ordered proteins as disordered, ignoring the fact that most newly sequenced proteins are fully ordered. The proposed RFPR-IDP method, trained on both ordered and disordered proteins, outperforms existing predictors in predicting both ordered and disordered proteins.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

idenPC-MIIP: identify protein complexes from weighted PPI networks using mutual important interacting partner relation

Zhourun Wu, Qing Liao, Bin Liu

Summary: Protein complexes are key units for studying a cell system, and high-throughput approaches have enabled the determination of PPI data. The proposed mutual important interacting partner relation and the new algorithm idenPC-MIIP show improved performance in identifying protein complexes compared to existing methods.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Computer Science, Artificial Intelligence

MLDH-Fold: Protein fold recognition based on multi-view low-rank modeling

Ke Yan, Jie Wen, Yong Xu, Bin Liu

Summary: Protein fold recognition is crucial for understanding protein functions and drug design. New methods (MVLR and MLDH-Fold) were proposed to improve predictive performance by combining different views of protein sequences. Experimental results show that these computational methods outperform other predictors, indicating their usefulness for protein fold recognition.

NEUROCOMPUTING (2021)

Article Biochemical Research Methods

iCircDA-MF: identification of circRNA-disease associations based on matrix factorization

Hang Wei, Bin Liu

BRIEFINGS IN BIOINFORMATICS (2020)

Article Biochemical Research Methods

HITS-PR-HHblits: protein remote homology detection by combining PageRank and Hyperlink-Induced Topic Search

Bin Liu, Shuangyan Jiang, Quan Zou

BRIEFINGS IN BIOINFORMATICS (2020)
