Correlates of record linkage and estimating risks of non‐linkage biases in business data sets
Jamie C. Moore,
Peter W. F. Smith and
Gabriele B. Durrant
Journal of the Royal Statistical Society Series A, 2018, vol. 181, issue 4, 1211-1230
Researchers often utilize data sets that link information from multiple sources, but non‐linkage biases caused by linked and non‐linked subject differences are little understood, especially in business data sets. We address these knowledge gaps by studying biases in linkable 2010 UK Small Business Survey data sets. We identify correlates of business linkage propensity, and also for the first time its components: consent to linkage and register identifier appendability. As well, we take a novel approach to evaluating non‐linkage bias risks, by computing data set representativeness indicators (comparable, decomposable sample subset similarity measures). We find that the main impacts on linkage propensities and bias risks are due to consenter–non‐consenter differences explicable given business survey response processes, and differences between subjects with and without identifiers caused by register undercoverage of very small businesses. We then discuss consequences for the analysis of linked business data sets, and implications of the evaluation methods we introduce for linked data set producers and users.
References: View references in EconPapers View complete reference list from CitEc
Citations: Track citations by RSS feed
Downloads: (external link)
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssa:v:181:y:2018:i:4:p:1211-1230
Ordering information: This journal article can be ordered from
http://ordering.onli ... 1111/(ISSN)1467-985X
Access Statistics for this article
Journal of the Royal Statistical Society Series A is currently edited by A. Chevalier and L. Sharples
More articles in Journal of the Royal Statistical Society Series A from Royal Statistical Society Contact information at EDIRC.
Bibliographic data for series maintained by Wiley Content Delivery ().