Investigating the performance of AIC in selecting phylogenetic models

Dwueng-Chwuan Jhwueng, Huzurbazar Snehalata, O’Meara Brian C. and Liu Liang
Additional contact information
Huzurbazar Snehalata: Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC 27709, USA; Department of Statistics, University of Wyoming, Laramie, WY 82071, USA; Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
O’Meara Brian C.: Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
Liu Liang: Department of Statistics and Institute of Bioinformatics, University of Georgia, 101 Cedar Street, Athens, GA 30606, USA

Statistical Applications in Genetics and Molecular Biology, 2014, vol. 13, issue 4, 459-475

Abstract: The popular likelihood-based model selection criterion, Akaike’s Information Criterion (AIC), is a breakthrough mathematical result derived from information theory. AIC is an approximation to the Kullback-Leibler (KL) divergence, and its derivation relies on the assumption that the likelihood function has finite second derivatives. However, in phylogenetic estimation tree space is discrete with respect to tree topology, so the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model and the expected KL divergence in the context of phylogenetic tree estimation. We find that, given the tree topology, AIC is an unbiased estimator of the expected KL divergence. However, when the tree topology is unknown, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models, so that even for large sample sizes the bias of AIC can result in selecting the wrong model. Because the choice of phylogenetic model is essential to statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics.
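The unbiasedness property the abstract refers to holds for regular models whose parameters are continuous. As a rough illustration only (not the authors' simulation, and not a phylogenetic model), the sketch below uses a hypothetical stand-in: a normal model N(mu, 1) with the mean as its single free parameter (k = 1). With AIC = -2 log L(theta_hat) + 2k, it checks by Monte Carlo that -AIC/2 matches, on average, the log-likelihood the fitted model assigns to an independent replicate data set, which is the quantity tied to the expected KL divergence. The function names, sample size and replicate count are illustrative assumptions; the discrete topology case, where the abstract reports a bias, is not modelled here.

```python
import math
import random


def normal_loglik(data, mu, sigma=1.0):
    """Log-likelihood of the data under N(mu, sigma^2)."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def aic(loglik, k):
    """Akaike's Information Criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def aic_bias_check(n=20, reps=20000, mu=0.0, seed=1):
    """Compare the average of -AIC/2 with the average out-of-sample log-likelihood.

    Hypothetical toy model: N(mu, 1) with only the mean estimated (k = 1).
    Returns (mean of -AIC/2, mean predictive log-likelihood).
    """
    rng = random.Random(seed)
    k = 1
    aic_half_total, predictive_total = 0.0, 0.0
    for _ in range(reps):
        train = [rng.gauss(mu, 1.0) for _ in range(n)]
        test = [rng.gauss(mu, 1.0) for _ in range(n)]  # independent replicate
        mu_hat = sum(train) / n  # maximum-likelihood estimate of the mean
        aic_half_total += -0.5 * aic(normal_loglik(train, mu_hat), k)
        predictive_total += normal_loglik(test, mu_hat)
    return aic_half_total / reps, predictive_total / reps


if __name__ == "__main__":
    a, p = aic_bias_check()
    print(f"mean -AIC/2: {a:.3f}   mean predictive log-likelihood: {p:.3f}")
```

For this toy model both averages should approach -(n/2) log(2 pi) - (n + 1)/2; the maximized log-likelihood alone overshoots the predictive value by about k = 1, which is exactly what the 2k penalty in AIC removes.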

Keywords: AIC; Kullback-Leibler divergence; model selection; phylogenetics
Date: 2014

Downloads:
https://doi.org/10.1515/sagmb-2013-0048 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:13:y:2014:i:4:p:17:n:5

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.1515/sagmb-2013-0048

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.