The Cross-Validated Adaptive Epsilon-Net Estimator
Mark van der Laan,
Sandrine Dudoit and
Aad van der Vaart
Additional contact information
Mark van der Laan: Division of Biostatistics, School of Public Health, University of California, Berkeley
Sandrine Dudoit: Division of Biostatistics, School of Public Health, University of California, Berkeley
Aad van der Vaart: Dept. of Mathematics, Vrije Universiteit, Amsterdam
No 1141, U.C. Berkeley Division of Biostatistics Working Paper Series from Berkeley Electronic Press
Abstract:
Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space typically results in ill-defined or overly variable estimators of the parameter of interest (i.e., the risk minimizer for the true data generating distribution). In this article, we propose a cross-validated epsilon-net estimation methodology that covers a broad class of estimation problems, including multivariate outcome prediction and multivariate density estimation. An epsilon-net sieve of a subspace of the parameter space is defined as a collection of finite sets of points, the epsilon-nets indexed by epsilon, which approximate the subspace up to a resolution of epsilon. Given a collection of subspaces of the parameter space, one constructs an epsilon-net sieve for each of the subspaces. For each choice of subspace and each value of the resolution epsilon, one defines a candidate estimator as the minimizer of the empirical risk over the corresponding epsilon-net. The cross-validated epsilon-net estimator is then defined as the candidate estimator corresponding to the choice of subspace and epsilon-value minimizing the cross-validated empirical risk. We derive a finite sample inequality which proves that the proposed estimator achieves the adaptive optimal minimax rate of convergence, where the adaptivity is achieved by considering epsilon-net sieves for various subspaces.
We also address the implementation of the cross-validated epsilon-net estimation procedure. In the context of a linear regression model, we present results of a preliminary simulation study comparing the cross-validated epsilon-net estimator to the cross-validated L^1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS). Finally, we discuss generalizations of the proposed estimation methodology to censored data structures.
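The procedure described in the abstract can be illustrated with a small sketch for linear regression under squared error loss. This is not the authors' implementation: the function names, the choice of subspaces (the first `dim` covariates), the bounding cube `[-bound, bound]^dim`, and the use of a regular grid as the epsilon-net (sup-norm resolution) are all assumptions made here for illustration only.

```python
import itertools
import numpy as np

def epsilon_net(dim, eps, bound):
    """Hypothetical epsilon-net: a regular grid with spacing eps covering the
    cube [-bound, bound]^dim, so every point of the cube lies within eps
    (sup-norm) of some grid point."""
    axis = np.arange(-bound, bound + eps / 2, eps)
    return np.array(list(itertools.product(axis, repeat=dim)))

def empirical_risk(beta, X, y):
    """Empirical mean of the squared error loss."""
    return np.mean((y - X @ beta) ** 2)

def fit_on_net(X, y, net):
    """Candidate estimator: the net point minimizing the empirical risk."""
    risks = np.mean((y[:, None] - X @ net.T) ** 2, axis=0)
    return net[np.argmin(risks)]

def cv_epsilon_net(X, y, dims, epsilons, bound=2.0, n_folds=5, seed=0):
    """Select (subspace, epsilon) by V-fold cross-validated risk, then refit
    the selected candidate estimator on the full sample."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best = None
    for dim, eps in itertools.product(dims, epsilons):
        net = epsilon_net(dim, eps, bound)
        Xs = X[:, :dim]  # assumed subspace: only the first `dim` covariates
        cv_risk = 0.0
        for val in folds:
            train = np.setdiff1d(np.arange(len(y)), val)
            beta = fit_on_net(Xs[train], y[train], net)
            cv_risk += empirical_risk(beta, Xs[val], y[val]) / n_folds
        if best is None or cv_risk < best[0]:
            best = (cv_risk, dim, eps)
    _, dim, eps = best
    beta = fit_on_net(X[:, :dim], y, epsilon_net(dim, eps, bound))
    return dim, eps, beta

# Usage on simulated data (an assumed toy model, not the paper's simulation).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)
dim, eps, beta = cv_epsilon_net(X, y, dims=(1, 2), epsilons=(1.0, 0.5, 0.25))
print(dim, eps, beta)
```

An exhaustive grid grows exponentially with the dimension of the subspace, so a sketch like this is only feasible for very small `dim`; it is meant to make the selection logic concrete, not to suggest a practical algorithm.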
Keywords: Adaptivity; covering number; cross-validation; epsilon-net; loss function; maximum likelihood estimation; minimax; minimum estimator; model selection; prediction; regression; risk estimation; empirical risks; sieve
Date: 2004-07-11
Note: oai:bepress.com:ucbbiostat-1141
Download: http://www.bepress.com/cgi/viewcontent.cgi?article=1141&context=ucbbiostat (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:bep:ucbbio:1141