Article

Optimal subset selection of primary sequence features using the genetic algorithm for thermophilic proteins identification

Journal

BIOTECHNOLOGY LETTERS
Volume 36, Issue 10, Pages 1963-1969

Publisher

SPRINGER
DOI: 10.1007/s10529-014-1577-3

Keywords

Feature selection; Genetic algorithm; g-gap dipeptide; Multiple linear regression; Protein thermostability

Funding

  1. National Natural Science Foundation of China [30871614]
  2. Tianjin Natural Science Foundation [08JCYBJC04100]

A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing thermophilic from non-thermophilic proteins. The method was trained on a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. Using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides, it reached an overall accuracy of 95.4% in a jackknife test. Accuracy as a function of protein size ranged from 85.8% to 96.9%. The overall accuracies of three independent tests were 93, 93.4 and 91.8%. These results suggest that the GA-MLR approach described here should be a powerful method for selecting features that describe thermostable machines and an aid in the design of more stable proteins.
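
The abstract does not define its features, so the following is only a minimal sketch of how g-gap dipeptide frequencies are commonly computed. It assumes a g-gap dipeptide is a pair of residues separated by g intervening positions (g = 0 means adjacent residues) and that each frequency is normalized by the number of counted pairs. The function name `g_gap_dipeptide_composition` and the choice to skip pairs containing non-standard residues are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(seq, g):
    """Return the frequency of each of the 400 g-gap dipeptides in seq.

    Assumption: a g-gap dipeptide is the residue pair (seq[i], seq[i + g + 1]).
    Pairs containing a non-standard residue are skipped (a simplification).
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): all frequencies are zero.
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy sequence, used only to show the call pattern.
zero_gap = g_gap_dipeptide_composition("ACAC", 0)  # pairs: AC, CA, AC
one_gap = g_gap_dipeptide_composition("ACAC", 1)   # pairs: AA, CC
```

Under these assumptions each sequence yields 400 candidate features per gap value. A selection step such as the GA-MLR procedure described above would then keep only a small subset of them, for example the 38 0-gap and 29 1-gap dipeptides reported in the abstract.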
