Journal
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY
Volume 13, Issue 4, Pages 459-475
Publisher
WALTER DE GRUYTER GMBH
DOI: 10.1515/sagmb-2013-0048
Keywords
AIC; Kullback-Leibler divergence; model selection; phylogenetics
Funding
- National Science Council, Taiwan [NSC-101-2118-M-035-001]
- National Science Foundation
- U.S. Department of Homeland Security
- U.S. Department of Agriculture through NSF [EF-0832858, DBI-1300426]
- University of Tennessee, Knoxville
- National Science Foundation [DMS-1222745, DMS-1127914]
- University of Wyoming from the National Science Foundation [DMS-1100615]
- National Science Foundation, Directorate for Mathematical & Physical Sciences, Division of Mathematical Sciences [1100695]
- National Science Foundation, Directorate for Mathematical & Physical Sciences, Division of Mathematical Sciences [1222745]
- National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [1300426]
Abstract
The popular likelihood-based model selection criterion, Akaike's Information Criterion (AIC), is a breakthrough mathematical result derived from information theory. AIC approximates the Kullback-Leibler (KL) divergence, and its derivation relies on the assumption that the likelihood function has finite second derivatives. In phylogenetic estimation, however, tree space is discrete with respect to tree topology, so the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model and the expected KL divergence in the context of phylogenetic tree estimation. We find that, given the tree topology, AIC is an unbiased estimator of the expected KL divergence. However, when the tree topology is unknown, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models, so that even for large sample sizes the bias of AIC can result in selecting the wrong model. Because the choice of phylogenetic model is essential for statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics.
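For readers unfamiliar with how AIC is applied in practice, the following is a minimal sketch of AIC-based model selection using the standard definition AIC = -2 log L + 2k, where log L is the maximized log likelihood and k is the number of free parameters. The model names, log-likelihood values, and parameter counts are hypothetical placeholders chosen for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' simulations or correct for the bias the abstract describes.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, num_params: int) -> float:
    """Standard AIC: -2 * (maximized log likelihood) + 2 * (free parameters)."""
    return -2.0 * log_likelihood + 2.0 * num_params


def select_by_aic(
    candidates: Dict[str, Tuple[float, int]]
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate and return the name with the smallest AIC.

    `candidates` maps a model name to (maximized log likelihood, parameter count).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical values for illustration only (not results from the paper).
candidates = {
    "model_A": (-1050.0, 1),
    "model_B": (-1040.0, 5),
    "model_C": (-1038.5, 9),
}

best, scores = select_by_aic(candidates)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("Selected:", best)
```

In this sketch the candidate with the lowest AIC is selected. The abstract's point is that when the tree topology must also be estimated, these scores may underestimate the expected KL divergence by model-dependent amounts, so the ranking itself can be misleading.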