Clustering of protein domains for functional and evolutionary studies

BMC BIOINFORMATICS (2009)

Journal

BMC BIOINFORMATICS

Volume 10, Issue -, Pages -

Publisher

BMC

DOI: 10.1186/1471-2105-10-335

Funding

iProject [8045 M047]
Ministry of Science, Education and Sports, Republic of Croatia [037-0982913-2762, 098-0982913-2877, 058-0000000-3475]
German Academic Exchange Service (DAAD)
Leverhulme Trust
Japanese Bio-Industry Association
The School of Pharmacy, University of London
UNESCO and L'Oreal

Abstract

Background: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.

Results: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterised protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains, and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families.

Conclusion: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
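To make the clustering step more concrete, the following is a minimal sketch of a tempered (annealed) EM over motif strings. It is not the authors' implementation. It assumes the motif columns have already been selected and does not reproduce the evolutionary split statistic. The function name `daem_cluster`, the temperature schedule `betas`, the pseudocount `pseudo`, and the farthest-point seeding are all hypothetical choices made for illustration. The score returned for each sequence is simply the posterior probability of its assigned subtype, used here as a stand-in for the specificity score described in the abstract.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def daem_cluster(motifs, k=2, betas=(0.5, 0.75, 1.0), iters=30, pseudo=0.5):
    """Cluster equal-length motif strings with a tempered EM (illustrative only).

    Returns (labels, scores): the most probable subtype for each sequence and
    the posterior probability of that subtype at beta = 1.
    """
    n, width = len(motifs), len(motifs[0])
    alphabet = sorted(set("".join(motifs)))

    # Assumption: deterministic farthest-point seeding instead of a random start.
    seeds = [motifs[0]]
    while len(seeds) < k:
        seeds.append(max(motifs, key=lambda m: min(hamming(m, s) for s in seeds)))

    # Soft initial responsibilities based on similarity to each seed.
    resp = []
    for m in motifs:
        row = [1.0 + width - hamming(m, s) for s in seeds]
        total = sum(row)
        resp.append([r / total for r in row])

    for beta in betas:  # beta is the inverse temperature, raised towards 1
        for _ in range(iters):
            # M-step: smoothed mixing weights and per-column residue frequencies.
            weights = [
                (sum(resp[i][c] for i in range(n)) + pseudo) / (n + k * pseudo)
                for c in range(k)
            ]
            freqs = []
            for c in range(k):
                columns = []
                for j in range(width):
                    counts = {a: pseudo for a in alphabet}
                    for i in range(n):
                        counts[motifs[i][j]] += resp[i][c]
                    total = sum(counts.values())
                    columns.append({a: v / total for a, v in counts.items()})
                freqs.append(columns)

            # E-step: posterior proportional to (weight * likelihood) ** beta.
            for i in range(n):
                logp = [
                    beta * (math.log(weights[c])
                            + sum(math.log(freqs[c][j][motifs[i][j]])
                                  for j in range(width)))
                    for c in range(k)
                ]
                top = max(logp)
                unnorm = [math.exp(v - top) for v in logp]
                total = sum(unnorm)
                resp[i] = [u / total for u in unnorm]

    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    scores = [max(row) for row in resp]
    return labels, scores


# Toy motifs (invented for illustration, not taken from the paper).
example = ["DKG", "DKG", "DKA", "ERS", "ERS", "ERT"]
for motif, label, score in zip(example, *daem_cluster(example)):
    print(motif, label, round(score, 3))
```

In this sketch a sequence whose motif fits neither subtype well would receive a posterior closer to 1/k, which mirrors the abstract's observation that false assignments usually had a low specificity score.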
