Statistical Inference for Data Adaptive Target Parameters
Hubbard Alan E. (),
Kherad-Pajouh Sara and
J. van der Laan Mark
Additional contact information
Hubbard Alan E.: Division of Biostatistics, University of California, Berkeley, CA, USA
Kherad-Pajouh Sara: Division of Biostatistics, University of California, Berkeley, CA, USA
J. van der Laan Mark: Division of Biostatistics, University of California, Berkeley, CA, USA
The International Journal of Biostatistics, 2016, vol. 12, issue 1, 3-19
Abstract:
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming “data-driven”, the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
Keywords: asymptotic linearity; clustering; cross-validation; data mining; influence curve; loss-function; risk; machine learning; sample splitting; sub-group analysis; super-learner; targeted maximum likelihood estimation (search for similar items in EconPapers)
Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (6)
Downloads: (external link)
https://doi.org/10.1515/ijb-2015-0013 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bpj:ijbist:v:12:y:2016:i:1:p:3-19:n:5
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/ijb/html
DOI: 10.1515/ijb-2015-0013
Access Statistics for this article
The International Journal of Biostatistics is currently edited by Antoine Chambaz, Alan E. Hubbard and Mark J. van der Laan
More articles in The International Journal of Biostatistics from De Gruyter
Bibliographic data for series maintained by Peter Golla ().