Article

Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions

Journal

COMPUTATIONAL BIOLOGY AND CHEMISTRY
Volume 36, Issue -, Pages 36-41

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.compbiolchem.2011.12.003

Keywords

Protein-protein interaction; Ensemble methods; Imbalanced data

Funding

  1. National Natural Science Foundation of China [20905054, 20972103]
  2. Specialized Research Fund for the Doctoral Program of Higher Education [20090181120058]

Among protein pairs, interacting pairs are usually far fewer than non-interacting ones, so protein-protein interaction (PPI) prediction faces an imbalanced data problem. In this article, we introduce two ensemble methods to address it. These methods combine a cluster-based under-sampling technique with fusion classifiers. We evaluate them on a dataset from the Database of Interacting Proteins (DIP) using 10-fold cross-validation. All the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs; we also draw some useful conclusions about the performance of ensemble methods for PPI prediction on imbalanced data. The prediction software and all datasets used in this work are freely available at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html. (C) 2011 Elsevier Ltd. All rights reserved.
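
The general idea of pairing cluster-based under-sampling with a fused ensemble can be sketched as follows. The abstract does not name the clustering algorithm, the base classifiers, or the fusion rule, so everything here is an assumption made for illustration: plain k-means for clustering, a nearest-centroid rule as the base classifier, majority voting for fusion, and hypothetical function names throughout. It is a minimal toy sketch, not the authors' implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed clustering step); returns k clusters."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

def cluster_undersample(majority, n_keep, k, rng):
    """Keep roughly n_keep majority samples, spread over the clusters
    in proportion to cluster size (at least one per non-empty cluster)."""
    picked = []
    for c in kmeans(majority, k, rng):
        if not c:
            continue
        quota = max(1, round(n_keep * len(c) / len(majority)))
        picked.extend(rng.sample(c, min(quota, len(c))))
    return picked

def train_member(minority, majority, k, rng):
    """One ensemble member: a nearest-centroid model fitted on all minority
    samples plus a cluster-based under-sample of the majority class."""
    subset = cluster_undersample(majority, len(minority), k, rng)
    return mean(minority), mean(subset)  # (positive, negative) centroids

def predict(ensemble, x):
    """Fuse the members by majority vote; 1 = interacting, 0 = not."""
    votes = sum(1 for pos, neg in ensemble
                if sq_dist(x, pos) < sq_dist(x, neg))
    return 1 if votes * 2 > len(ensemble) else 0

# Synthetic 2-D toy data (NOT DIP data): 200 negatives versus 20 positives.
rng = random.Random(0)
negatives = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)] +
             [[rng.gauss(-3, 1), rng.gauss(3, 1)] for _ in range(100)])
positives = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(20)]

# Each member sees a different balanced subset of the majority class.
ensemble = [train_member(positives, negatives, k=2, rng=rng) for _ in range(7)]
print(predict(ensemble, [3.0, 3.0]), predict(ensemble, [0.0, 0.0]))
```

Because each member is trained on a different under-sample, the ensemble as a whole sees more of the majority class than any single balanced subset would, which is the usual motivation for combining under-sampling with classifier fusion.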
