Asymptotic Optimality of Likelihood-Based Cross-Validation

Mark J. van der Laan, Sandrine Dudoit and Sunduz Keles
Additional contact information
Mark J. van der Laan: Division of Biostatistics, School of Public Health, University of California, Berkeley
Sandrine Dudoit: Division of Biostatistics, School of Public Health, University of California, Berkeley
Sunduz Keles: Division of Biostatistics, School of Public Health, University of California, Berkeley

Statistical Applications in Genetics and Molecular Biology, 2004, vol. 3, issue 1, 25

Abstract: Likelihood-based cross-validation is a statistical tool for selecting, from a collection of candidate density estimators, a density estimate based on n i.i.d. observations from the true density. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (with respect to the Kullback-Leibler distance to the true density) as a benchmark model selector which is optimal for each given dataset and depends on the true density. Crucial conditions of our theorem are that the size of the validation sample converges to infinity, which excludes leave-one-out cross-validation, and that the candidate density estimates are bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for the purpose of bandwidth selection with a simulation study. Moreover, we use likelihood-based cross-validation in the context of regulatory motif detection in DNA sequences.
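
As an illustration of the bandwidth-selection use case mentioned in the abstract, the following is a minimal sketch of V-fold likelihood-based cross-validation for a Gaussian kernel density estimator. It is not the authors' implementation: the function names, the candidate bandwidth grid, the simulated standard normal sample, and the small density floor (a crude stand-in for the condition that candidate estimates are bounded away from zero) are all assumptions made for this example.

```python
import math
import random


def gaussian_kde_density(x, train, bandwidth):
    """Gaussian kernel density estimate at x, fitted on the points in `train`."""
    norm = 1.0 / (len(train) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train)


def cv_log_likelihood(data, bandwidth, n_folds=5, floor=1e-12):
    """Average held-out log-likelihood over V folds (higher is better)."""
    total = 0.0
    for v in range(n_folds):
        # Fold v: every n_folds-th observation is held out for validation.
        validation = data[v::n_folds]
        training = [x for i, x in enumerate(data) if i % n_folds != v]
        for x in validation:
            # Assumption: clip the estimate from below so the log stays finite.
            density = max(gaussian_kde_density(x, training, bandwidth), floor)
            total += math.log(density)
    return total / len(data)


def select_bandwidth(data, candidates, n_folds=5):
    """Return the candidate bandwidth maximising the cross-validated log-likelihood."""
    scores = {h: cv_log_likelihood(data, h, n_folds) for h in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data: 200 i.i.d. draws from a standard normal density.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
candidate_bandwidths = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
best_bandwidth, cv_scores = select_bandwidth(sample, candidate_bandwidths)
print("selected bandwidth:", best_bandwidth)
```

Maximising the held-out log-likelihood is equivalent to minimising an empirical estimate of the Kullback-Leibler distance to the true density, up to a term that does not depend on the candidate. With V fixed, each validation sample has roughly n/V observations, so it grows with n, unlike in leave-one-out cross-validation.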

Keywords: Likelihood cross-validation; maximum likelihood estimation; Kullback-Leibler divergence; density estimation; bandwidth selection; model selection; variable selection
Date: 2004

Downloads:
https://doi.org/10.2202/1544-6115.1036 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:3:y:2004:i:1:n:4

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.2202/1544-6115.1036

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf


 
Handle: RePEc:bpj:sagmbi:v:3:y:2004:i:1:n:4