Article

Investigating the performance of AIC in selecting phylogenetic models

Publisher

WALTER DE GRUYTER GMBH
DOI: 10.1515/sagmb-2013-0048

Keywords

AIC; Kullback-Leibler divergence; model selection; phylogenetics

Funding

  1. National Science Council, Taiwan [NSC-101-2118-M-035-001]
  2. National Science Foundation
  3. U.S. Department of Homeland Security
  4. U.S. Department of Agriculture through NSF [EF-0832858, DBI-1300426]
  5. University of Tennessee, Knoxville
  6. National Science Foundation [DMS-1222745, DMS-1127914]
  7. University of Wyoming from the National Science Foundation [DMS-1100615]
  8. Direct For Mathematical & Physical Scien
  9. Division Of Mathematical Sciences [1100695] Funding Source: National Science Foundation
  10. Direct For Mathematical & Physical Scien
  11. Division Of Mathematical Sciences [1222745] Funding Source: National Science Foundation
  12. Div Of Biological Infrastructure
  13. Direct For Biological Sciences [1300426] Funding Source: National Science Foundation

Akaike's Information Criterion (AIC), a popular likelihood-based model selection criterion, is a breakthrough mathematical result derived from information theory. AIC approximates the Kullback-Leibler (KL) divergence, and its derivation relies on the assumption that the likelihood function has finite second derivatives. In phylogenetic estimation, however, tree space is discrete with respect to tree topology, so the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model and the expected KL divergence in the context of phylogenetic tree estimation. We find that, given the tree topology, AIC is an unbiased estimator of the expected KL divergence. When the tree topology is unknown, however, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models, so that even for large sample sizes the bias of AIC can result in selecting the wrong model. Because the choice of phylogenetic model is essential for statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics.
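For readers unfamiliar with the criterion, AIC is commonly written as -2 ln L + 2k, where ln L is the maximized log likelihood and k is the number of free parameters; the candidate with the smallest value is selected. The following is a minimal sketch of that selection step, not the authors' code. The function names, the log-likelihood values, and the parameter counts are hypothetical and chosen only for illustration (in practice k would typically also count branch lengths).

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Return AIC = -2 * lnL + 2 * k for one fitted model."""
    return -2.0 * log_likelihood + 2.0 * num_params


def select_by_aic(candidates):
    """Pick the candidate with the smallest AIC.

    candidates maps a model name to (maximized log likelihood, parameter count).
    Returns the best model name and the dictionary of AIC scores.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits of three substitution models to the same alignment.
# These numbers are made up for illustration and do not come from the paper.
candidates = {
    "JC69": (-5230.4, 1),
    "HKY85": (-5101.7, 5),
    "GTR": (-5098.9, 9),
}

best, scores = select_by_aic(candidates)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("Selected model:", best)
```

In this made-up example the richest model has the highest likelihood but is not selected, because the 2k penalty outweighs its small likelihood gain. The abstract's point is that when the tree topology is also estimated, this penalty may be too small, so a comparison like this one can be biased.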
