Improved secondary analysis of linked data: a framework and an illustration
Ray Chambers and Andrea Diniz da Silva
Journal of the Royal Statistical Society Series A, 2020, vol. 183, issue 1, 37-59
Abstract:
Applications that use linked data are now part of mainstream social science research, though they generally do not take linkage error into consideration. Solutions that correct for the bias caused by these errors have been proposed but are not yet embedded in the various analysis procedures in common use. Secondary analyses based on linked data can therefore be misleading. We review some recent approaches to non‐deterministic data linkage together with a framework for secondary analysis of the linked data which makes use of paradata produced by the linkage process to correct this bias. We also describe a new method for secondary analysis of linked data that builds on this framework and show how it can be used for estimation of a set of domain means based on linked data. We then illustrate this approach via an empirical study based on record linkage of agricultural producers in four states of Brazil aimed at producing estimates of agricultural output by industry. Our study considers register‐to‐register linkage as well as sample‐to‐register linkage, and we show results for the traditional Fellegi–Sunter approach to record linkage as well as for a newer linkage procedure based on the use of classification trees and bagging.
Date: 2020
DOI: https://doi.org/10.1111/rssa.12477
Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssa:v:183:y:2020:i:1:p:37-59
Journal of the Royal Statistical Society Series A is currently edited by A. Chevalier and L. Sharples
Journal of the Royal Statistical Society Series A is published by the Royal Statistical Society.