Article

Ensemble learning prediction of protein-protein interactions using proteins functional annotations

Journal

MOLECULAR BIOSYSTEMS
Volume 10, Issue 4, Pages 820-830

Publisher

ROYAL SOC CHEMISTRY
DOI: 10.1039/c3mb70486f

Funding

  1. Polish Ministry of Education and Science [N301 159735, NCN 2013/09/B/NZ2/00121]
  2. University with Potential for Excellence (UPE) - Phase II project grant from University Grants Commission (UGC) in India
  3. Swedish Foundation for Strategic Research
  4. Research fellowship within the project "Information technologies: Research and their interdisciplinary applications" [UDA-POKL.04.01.01-00-051/10-00]

Protein-protein interactions (PPIs) are important for the majority of biological processes. A significant number of computational methods have been developed to predict protein-protein interactions using protein sequence, structural and genomic data. Vast experimental data is publicly available on the Internet, but it is scattered across numerous databases. This fact motivated us to create and evaluate new high-throughput datasets of interacting proteins. We extracted interaction data from the DIP, MINT, BioGRID and IntAct databases. Then we constructed descriptive features for machine learning purposes based on data from Gene Ontology and DOMINE. Thereafter, four well-established machine learning methods (Support Vector Machine, Random Forest, Decision Tree and Naive Bayes) were used on these datasets to build an Ensemble Learning method based on majority voting. In a cross-validation experiment, sensitivity exceeded 80% and classification/prediction accuracy reached 90% for the Ensemble Learning method. We extended the experiment to a bigger and more realistic dataset, maintaining sensitivity over 70%. These results confirmed that our datasets are suitable for performing PPI prediction and that the Ensemble Learning method is well suited for this task. Both the processed PPI datasets and the software are available at http://sysbio.icm.edu.pl/indra/EL-PPI/home.html.
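
The majority-voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the toy predictions, and the tie rule are all assumptions made for illustration. In particular, the abstract does not say how a 2-2 split among the four base classifiers is resolved, so the sketch falls back to a hypothetical `tie_label` of 0 (non-interacting). Sensitivity and accuracy are computed with their standard definitions.

```python
from collections import Counter


def majority_vote(votes, tie_label=0):
    """Return the label chosen by most classifiers.

    tie_label is a hypothetical fallback used when the top two labels
    receive the same number of votes (the source does not state a tie rule).
    """
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def ensemble_predict(per_classifier_predictions, tie_label=0):
    """Combine one prediction list per base classifier into ensemble labels."""
    # zip(*...) regroups the predictions so each tuple holds all votes for one protein pair
    return [majority_vote(sample_votes, tie_label)
            for sample_votes in zip(*per_classifier_predictions)]


def sensitivity(y_true, y_pred):
    """True positives divided by all actual positives (interacting pairs)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of protein pairs labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (made up): 1 = interacting pair, 0 = non-interacting pair.
y_true = [1, 1, 0, 0, 1]
svm_pred = [1, 1, 0, 0, 1]  # stands in for a Support Vector Machine
rf_pred = [1, 0, 0, 1, 1]   # stands in for a Random Forest
dt_pred = [1, 1, 0, 0, 0]   # stands in for a Decision Tree
nb_pred = [0, 1, 1, 1, 0]   # stands in for Naive Bayes

ensemble = ensemble_predict([svm_pred, rf_pred, dt_pred, nb_pred])
print("ensemble labels:", ensemble)
print("sensitivity:", sensitivity(y_true, ensemble))
print("accuracy:", accuracy(y_true, ensemble))
```

With an even number of base classifiers the tie rule matters: resolving ties towards the negative class, as assumed here, lowers sensitivity, while resolving them towards the positive class would raise it at the cost of more false positives.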
