Two‐phase sampling designs for data validation in settings with covariate measurement error and continuous outcome
Gustavo Amorim,
Ran Tao,
Sarah Lotspeich,
Pamela A. Shaw,
Thomas Lumley and
Bryan E. Shepherd
Journal of the Royal Statistical Society Series A, 2021, vol. 184, issue 4, 1368-1389
Abstract:
Measurement errors are present in many data collection procedures and can harm analyses by biasing estimates. To correct for measurement error, researchers often validate a subsample of records and then incorporate the information learned from this validation sample into estimation. In practice, the validation sample is often selected using simple random sampling (SRS). However, SRS leads to inefficient estimates because it ignores information on the error‐prone variables, which can be highly correlated with the unknown truth. Applying and extending ideas from the two‐phase sampling literature, we propose optimal and nearly optimal designs for selecting the validation sample in the classical measurement‐error framework. We target designs to improve the efficiency of model‐based and design‐based estimators, and show how the resulting designs compare to each other. Our results suggest that sampling schemes that extract more information from the error‐prone data are substantially more efficient than SRS, for both design‐ and model‐based estimators. The optimal procedure, however, depends on the analysis method, and can differ substantially. This is supported by theory and simulations. We illustrate the various designs using data from an HIV cohort study.
Date: 2021
Downloads: https://doi.org/10.1111/rssa.12689
Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssa:v:184:y:2021:i:4:p:1368-1389
Journal of the Royal Statistical Society Series A is currently edited by A. Chevalier and L. Sharples