4.5 Article

Exploring an alignment free approach for protein classification and structural class prediction

Journal

BIOCHIMIE
Volume 90, Issue 4, Pages 615-625

Publisher

ELSEVIER FRANCE-EDITIONS SCIENTIFIQUES MEDICALES ELSEVIER
DOI: 10.1016/j.biochi.2007.11.004

Keywords

protein classification; protein structural class; Chaos Game Representation

Ask authors/readers for more resources

Alignment free methods based on Chaos Game Representation (CGR), also known as sequence signature approaches, have proven of great interest for DNA sequence analysis. Indeed, they have been successfully applied for sequence comparison, phylogeny, detection of horizontal transfers or extraction of representative motifs in regulation sequences. Transposing such methods to proteins poses several fundamental questions related to representation space dimensionality. Several studies have tackled these points, but none has, so far, brought the application of CGRs to proteins to their fully expected potential. Yet, several studies have shown that techniques based on n-peptide frequencies can be relevant for proteins. Here, we investigate the effectiveness of a strategy based on the CGR approach using a fixed reverse encoding of amino acids into nucleic sequences. We first explore its relevance to protein classification into functional families. We then attempt to apply it to the prediction of protein structural classes. Our results suggest that the reverse encoding approach could be relevant in both cases. We show that it is able to classify functional families of proteins by extracting signatures close to the ProSite patterns. Applied to structural classification, the approach reaches scores of correct classification close to 84%, i.e. close to the scores of related methods in the field. Various optimizations of the approach are still possible, which open the door for future applications. (C) 2007 Elsevier Masson SAS. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available