Article

The Irredundant Class Method for Remote Homology Detection of Protein Sequences

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 18, Issue 12, Pages 1819-1829

Publisher

MARY ANN LIEBERT, INC
DOI: 10.1089/cmb.2010.0171

Keywords

combinatorics; genome analysis; protein motifs; sequence analysis; strings

Funding

  1. University of Padova
  2. CARIPARO
  3. Fondazione Ing. Aldo Gini

The automatic classification of protein sequences into families is of great help for the functional prediction and annotation of new proteins. In this article, we present a method called Irredundant Class that addresses the remote homology detection problem. The best-performing methods for this problem are string kernels, which compute a similarity function between pairs of proteins based on their subsequence composition. We provide evidence that almost all string kernels are based on patterns that are not independent. The associated similarity scores are therefore obtained from a set of redundant features, which overestimates the similarity between the proteins. To address this issue specifically, we introduce the class of irredundant common patterns. Loosely speaking, the set of irredundant common patterns is the smallest class of independent patterns that can describe all common patterns in a pair of sequences. We present a classification method based on the statistics of these patterns, named Irredundant Class. Results on benchmark data show that Irredundant Class outperforms most previously proposed string kernels. It also achieves results as good as the current state-of-the-art method, Local Alignment, while using the same pairwise information only once.
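The string kernels mentioned in the abstract can be illustrated with the k-spectrum kernel, one of the simplest members of that family. It compares two sequences by the inner product of their k-mer count vectors. The Python below is a minimal sketch for illustration only and is not the authors' Irredundant Class method. The function names (`spectrum`, `spectrum_kernel`, `normalized_kernel`) and the toy sequences are hypothetical. The sketch also makes the redundancy concrete: the two toy sequences share one region, `MKVLA`, and that single region yields three overlapping 3-mer features.

```python
from collections import Counter
import math


def spectrum(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of seq, overlapping occurrences included."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Inner product of the k-mer count vectors of x and y (k-spectrum kernel)."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    return sum(count * sy[kmer] for kmer, count in sx.items())


def normalized_kernel(x: str, y: str, k: int) -> float:
    """Cosine-normalized spectrum kernel, so that a sequence compared with itself scores 1."""
    return spectrum_kernel(x, y, k) / math.sqrt(
        spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k)
    )


if __name__ == "__main__":
    x, y = "MKVLAAGIV", "QQMKVLAPP"  # hypothetical toy sequences sharing "MKVLA"
    # The single shared region "MKVLA" produces three overlapping 3-mer matches.
    print(sorted(spectrum(x, 3) & spectrum(y, 3)))  # ['KVL', 'MKV', 'VLA']
    print(spectrum_kernel(x, y, 3))                 # 3
```

In this toy case, the three matching 3-mers are not independent, because all of them come from the same shared region. A kernel of this kind therefore scores that region three times. According to the abstract, this type of double counting is what the irredundant common patterns are designed to avoid.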
