Investigating the performance of AIC in selecting phylogenetic models

STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY (2014)

Volume 13, Issue 4, Pages 459-475

Publisher

WALTER DE GRUYTER GMBH

DOI: 10.1515/sagmb-2013-0048

Keywords

AIC; Kullback-Leibler divergence; model selection; phylogenetics

Funding

National Science Council, Taiwan [NSC-101-2118-M-035-001]
National Science Foundation
U.S. Department of Homeland Security
U.S. Department of Agriculture through NSF [EF-0832858, DBI-1300426]
University of Tennessee, Knoxville
National Science Foundation [DMS-1222745, DMS-1127914]
University of Wyoming from the National Science Foundation [DMS-1100615]
Direct For Mathematical & Physical Scien
Division Of Mathematical Sciences [1100695] Funding Source: National Science Foundation
Direct For Mathematical & Physical Scien
Division Of Mathematical Sciences [1222745] Funding Source: National Science Foundation
Div Of Biological Infrastructure
Direct For Biological Sciences [1300426] Funding Source: National Science Foundation

Abstract

The popular likelihood-based model selection criterion, Akaike's Information Criterion (AIC), is a breakthrough mathematical result derived from information theory. AIC approximates the Kullback-Leibler (KL) divergence, and its derivation relies on the assumption that the likelihood function has finite second derivatives. However, for phylogenetic estimation, given that tree space is discrete with respect to tree topology, the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model and the expected KL divergence in the context of phylogenetic tree estimation. We find that, given the tree topology, AIC is an unbiased estimator of the expected KL divergence. However, when the tree topology is unknown, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models so that, even for large sample sizes, the bias of AIC can result in selecting the wrong model. As the choice of phylogenetic models is essential for statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics.
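
For readers unfamiliar with the criterion, the standard definition AIC = 2k - 2 ln L (k free parameters, L the maximized likelihood) can be sketched as below. This is a minimal illustration, not the authors' code: the log-likelihood values, the taxon count, and the helper names are hypothetical, and the sketch treats the tree topology as fixed, so it does not account for the underestimation the abstract reports when the topology is unknown.

```python
# Minimal sketch of AIC-based model comparison on a FIXED tree topology.
# All numeric inputs below are made up for illustration only.

def aic(log_likelihood: float, num_params: int) -> float:
    """Standard AIC: 2k - 2 ln L. Smaller values are preferred."""
    return 2.0 * num_params - 2.0 * log_likelihood


def num_branches_unrooted(num_taxa: int) -> int:
    """An unrooted binary tree with n taxa has 2n - 3 branches."""
    return 2 * num_taxa - 3


NUM_TAXA = 10  # hypothetical alignment size
branch_params = num_branches_unrooted(NUM_TAXA)

# model name -> (hypothetical maximized log-likelihood,
#                free substitution-model parameters)
candidates = {
    "JC69": (-5230.4, 0),   # equal rates, equal base frequencies
    "HKY85": (-5180.2, 4),  # ts/tv ratio + 3 free base frequencies
    "GTR": (-5178.9, 8),    # 5 free exchange rates + 3 free base frequencies
}

# Branch lengths count as free parameters alongside substitution parameters.
scores = {
    name: aic(log_lik, branch_params + k_subst)
    for name, (log_lik, k_subst) in candidates.items()
}

# The candidate with the smallest AIC is selected.
best_model = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("selected:", best_model)
```

Counting branch lengths as parameters is a common convention but is itself an assumption here; the topology is not counted, which is the situation the abstract identifies as problematic.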
