Article

Improving the performance of protein kinase identification via high dimensional protein-protein interactions and substrate structure data

Journal

MOLECULAR BIOSYSTEMS
Volume 10, Issue 3, Pages 694-702

Publisher

ROYAL SOC CHEMISTRY
DOI: 10.1039/c3mb70462a

Funding

  1. National Natural Science Foundation of China [61101061, 31100955]
  2. Fundamental Research Funds for the Central Universities [WK2100230011]

As a crucial post-translational modification, protein phosphorylation regulates almost all basic cellular processes. Large-scale phosphoproteomics studies have recently discovered thousands of phosphorylation sites, but only about 20% of them are annotated with their catalytic kinases, which makes it a major challenge to correctly identify the protein kinases responsible for experimentally verified phosphorylation sites. Most existing identification tools build their predictive models from the local sequence alone and use protein-protein interaction (PPI) information only for further filtering. This limited information is not sufficient to identify the protein kinases responsible for phosphorylated proteins. In this work, a novel computational approach that fully incorporates PPI and substrate structure information is proposed to improve the performance of human protein kinase identification. To handle the high dimensionality of the PPI and structure data, a two-step feature selection algorithm that incorporates a support vector machine (SVM) is designed to detect the information useful for discriminating the kinase corresponding to each phosphorylation site. Benchmark datasets for kinase identification are constructed from human protein phosphorylation data extracted from the latest Phospho.ELM database. With the selected PPI and structure features, kinase identification performance is significantly enhanced compared with that obtained using sequence information alone. To further verify our method, we compared it with the state-of-the-art tools NetworKIN and IGPS at two stringency levels, medium (>90.0% specificity) and high (>99.0% specificity). The results show that our method outperforms these existing tools in identifying protein kinases. Further evaluation demonstrates that our method also performs better at the different hierarchical levels, including kinase, subfamily, family and group.
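The comparison at fixed stringency levels can be illustrated with a small sketch. It is not the authors' evaluation code: the function names, the rule "predict positive when the score exceeds the cutoff", and the toy integer scores are all assumptions made for illustration. The idea is to choose, for each stringency level, the lowest score cutoff whose specificity on the negative set is above the target (0.90 or 0.99), and then report the sensitivity at that cutoff.

```python
def threshold_for_specificity(negative_scores, min_specificity):
    """Return the lowest cutoff whose specificity on the negatives is
    strictly above min_specificity.

    Assumption: a site is predicted positive when score > cutoff, so a
    negative counts as a true negative when its score <= cutoff.
    """
    ranked = sorted(negative_scores)
    total = len(ranked)
    for cutoff in ranked:
        true_negatives = sum(1 for s in ranked if s <= cutoff)
        if true_negatives / total > min_specificity:
            return cutoff
    # Only reached when the target is 1.0 or higher: fall back to the
    # largest negative score, which rejects every negative.
    return ranked[-1]


def sensitivity(positive_scores, cutoff):
    """Fraction of positives scoring strictly above the cutoff."""
    hits = sum(1 for s in positive_scores if s > cutoff)
    return hits / len(positive_scores)


# Toy data (hypothetical scores, not results from the paper).
negatives = list(range(100))          # scores 0..99
positives = [50, 92, 95, 100, 120]

medium_cutoff = threshold_for_specificity(negatives, 0.90)
high_cutoff = threshold_for_specificity(negatives, 0.99)

print("medium:", medium_cutoff, sensitivity(positives, medium_cutoff))
print("high:", high_cutoff, sensitivity(positives, high_cutoff))
```

Fixing specificity first and then comparing sensitivity puts tools with different score scales on the same footing, which is presumably why the abstract reports results at set specificity levels rather than at each tool's default cutoff.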
