Article

What's in a Likelihood? Simple Models of Protein Evolution and the Contribution of Structurally Viable Reconstructions to the Likelihood

SYSTEMATIC BIOLOGY (2011)

Journal

SYSTEMATIC BIOLOGY

Volume 60, Issue 2, Pages 161-174

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/sysbio/syq088

Keywords

Ancestral state reconstruction; empirical amino acid models; maximum likelihood; phylogenetics; protein structure

Category

Evolutionary Biology

Funding

Marie Curie Fellowship
National Science Foundation [DEB 1036500]
National Science Foundation, Division of Environmental Biology, Directorate for Biological Sciences [1132229]

Abstract

Most phylogenetic models of protein evolution assume that sites are independent and identically distributed. Interactions between sites are ignored, and the likelihood can be conveniently calculated as the product of the individual site likelihoods. The calculation considers all possible transition paths (also called substitution histories or mappings) that are consistent with the observed states at the terminals, and the probability density of any particular reconstruction depends on the substitution model. The likelihood is the integral of the probability density of each substitution history taken over all possible histories that are consistent with the observed data. We investigated the extent to which transition paths that are incompatible with a protein's three-dimensional structure contribute to the likelihood. Several empirical amino acid models were tested for sequence pairs of different degrees of divergence. When simulating substitutional histories starting from a real sequence, the structural integrity of the simulated sequences quickly disintegrated. This result indicates that simple models are clearly unable to capture the constraints on sequence evolution. However, when we sampled transition paths between real sequences from the posterior probability distribution according to these same models, we found that the sampled histories were largely consistent with the tertiary structure. This suggests that simple empirical substitution models may be adequate for interpolating changes between observed sequences during phylogenetic inference despite the fact that the models cannot predict the effects of structural constraints from first principles. This study is significant because it provides a quantitative assessment of the biological realism of substitution models from the perspective of protein structure, and it provides insight on the prospects for improving models of protein sequence evolution.
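The site-independence assumption described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it does not use the empirical amino acid models tested in the study. It assumes a hypothetical equal-rates (Poisson) model over 20 amino acids, chosen only because its transition probabilities have a closed form. The function names `poisson_transition_prob` and `pairwise_log_likelihood` are illustrative. Each transition probability already sums over every substitution history that starts in one state and ends in the other, so the pairwise likelihood is simply the product of per-site terms.

```python
import math

N_STATES = 20  # size of the amino acid alphabet


def poisson_transition_prob(a, b, t):
    """Probability of ending in state b after time t, starting in state a.

    Assumes an equal-rates (Poisson) model with uniform stationary
    frequencies; t is measured in expected substitutions per site.
    The value implicitly sums over all substitution paths from a to b.
    """
    decay = math.exp(-N_STATES / (N_STATES - 1) * t)
    if a == b:
        return 1 / N_STATES + (N_STATES - 1) / N_STATES * decay
    return (1 - decay) / N_STATES


def pairwise_log_likelihood(seq_x, seq_y, t):
    """Log-likelihood of two aligned sequences separated by distance t.

    Sites are treated as independent and identically distributed, so the
    log-likelihood is a sum of per-site terms: log(pi_a) + log(P_ab(t)).
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    log_pi = math.log(1 / N_STATES)  # uniform stationary frequency
    total = 0.0
    for a, b in zip(seq_x, seq_y):
        total += log_pi + math.log(poisson_transition_prob(a, b, t))
    return total


if __name__ == "__main__":
    # Toy aligned pair differing at one of four sites (made-up data).
    print(pairwise_log_likelihood("ACDE", "ACDF", 0.1))
```

Nothing in this calculation refers to protein structure. The study asks whether that omission matters in practice, that is, how much of the likelihood comes from histories that are incompatible with the three-dimensional structure.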

QMaker: Fast and Accurate Method to Estimate Empirical Models of Protein Evolution

Bui Quang Minh, Cuong Cao Dang, Le Sy Vinh, Robert Lanfear

Summary: Amino acid substitution models are crucial in phylogenetic analyses, and a new ML method called QMaker has been proposed to estimate a general time-reversible Q matrix from large protein data sets. QMaker combines an efficient ML tree search algorithm, model selection for handling model heterogeneity among alignments, and consideration of rate mixture models among sites.

SYSTEMATIC BIOLOGY (2021)