Confidence bands for a distribution function with merged data from multiple sources
Saegusa Takumi
Additional contact information
Saegusa Takumi: University of Maryland, Maryland, United States
Statistics in Transition New Series, 2020, vol. 21, issue 4, 144-158
Abstract:
We consider nonparametric estimation of a distribution function when data are collected from multiple overlapping data sources. The main statistical challenges include (1) heterogeneity of data sets, (2) unidentified duplicated records across data sets, and (3) dependence due to sampling without replacement from a data source. The proposed estimator is computable without identifying duplication but corrects the bias from duplicated records. We show the uniform consistency of the proposed estimator over the real line and its weak convergence to a Gaussian process. Based on these asymptotic properties, we propose a simulation-based confidence band that enjoys asymptotically correct coverage probability. The finite sample performance is evaluated through a simulation study. A Wilms tumor example is provided.
Keywords: confidence band; data integration; Gaussian process
Date: 2020
Downloads: https://doi.org/10.21307/stattrans-2020-035 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:vrs:stintr:v:21:y:2020:i:4:p:144-158:n:15
DOI: 10.21307/stattrans-2020-035
Statistics in Transition New Series is currently edited by Włodzimierz Okrasa
More articles in Statistics in Transition New Series from Statistics Poland
Bibliographic data for series maintained by Peter Golla.