Article

Clustering of protein domains for functional and evolutionary studies

Journal

BMC BIOINFORMATICS
Volume 10

Publisher

BMC
DOI: 10.1186/1471-2105-10-335

Funding

  1. iProject [8045 M047]
  2. Ministry of Science, Education and Sports, Republic of Croatia [037-0982913-2762, 098-0982913-2877, 058-0000000-3475]
  3. German Academic Exchange Service (DAAD)
  4. Leverhulme Trust
  5. Japanese Bio-Industry Association
  6. The School of Pharmacy, University of London
  7. UNESCO and L'Oreal

Background: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.

Results: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterised protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families.

Conclusion: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
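The clustering step described in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain categorical mixture model over the motif columns, fitted with a tempered (annealed) EM loop, and it uses the posterior probability of the chosen cluster as a stand-in for the specificity score. The function names `extract_motif` and `daem_cluster`, the temperature schedule `betas`, the pseudocount and the toy alignment are all hypothetical choices made for illustration; the evolutionary split statistic used to rank columns is not reproduced here, so the column indices are simply supplied by the caller.

```python
import math
import random


def extract_motif(alignment, columns):
    """Keep only the user-selected alignment columns (0-based indices)."""
    return ["".join(seq[c] for c in columns) for seq in alignment]


def daem_cluster(motifs, k=2, betas=(0.4, 0.7, 1.0), iters=30, pseudo=0.1, seed=0):
    """Cluster equal-length motif strings with a tempered EM loop.

    Returns (labels, scores). scores[i] is the posterior probability of the
    cluster chosen for sequence i, used here as a proxy for a specificity
    score (an assumption of this sketch, not the published definition).
    """
    rng = random.Random(seed)
    n, length = len(motifs), len(motifs[0])
    alphabet = sorted(set("".join(motifs)))

    # Start from random soft assignments (responsibilities).
    resp = []
    for _ in range(n):
        row = [rng.random() + 0.5 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    # Anneal: beta rises towards 1, where the E-step is the ordinary posterior.
    for beta in betas:
        for _ in range(iters):
            # M-step: cluster weights and per-column residue frequencies.
            weights = [sum(resp[i][c] for i in range(n)) / n for c in range(k)]
            probs = []
            for c in range(k):
                cols = []
                for j in range(length):
                    counts = {a: pseudo for a in alphabet}
                    for i in range(n):
                        counts[motifs[i][j]] += resp[i][c]
                    total = sum(counts.values())
                    cols.append({a: v / total for a, v in counts.items()})
                probs.append(cols)

            # E-step: posterior tempered by beta, computed in log space.
            for i in range(n):
                logp = []
                for c in range(k):
                    ll = math.log(max(weights[c], 1e-300))
                    ll += sum(math.log(probs[c][j][motifs[i][j]]) for j in range(length))
                    logp.append(beta * ll)
                top = max(logp)
                expd = [math.exp(v - top) for v in logp]
                total = sum(expd)
                resp[i] = [e / total for e in expd]

    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    scores = [max(resp[i]) for i in range(n)]
    return labels, scores


if __name__ == "__main__":
    # Toy alignment (invented data): columns 1-3 separate two subtypes.
    alignment = ["MDKGL", "MDKGL", "MDRGL", "MDKGI",
                 "METAL", "METAL", "MESAL", "METAI"]
    motifs = extract_motif(alignment, [1, 2, 3])
    labels, scores = daem_cluster(motifs, k=2)
    for seq, label, score in zip(alignment, labels, scores):
        print(seq, label, round(score, 3))
```

In this sketch a sequence whose motif fits both clusters about equally well receives a score near 1/k, which mirrors the abstract's observation that false assignments usually had a low specificity score. Starting at a low `beta` flattens the posteriors early on, which is the general idea behind deterministic annealing; the schedule used here is arbitrary.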
