Linkage‐data linear regression

Li‐Chun Zhang and Tiziana Tuoto

Journal of the Royal Statistical Society Series A, 2021, vol. 184, issue 2, 522-547

Abstract: Data linkage is increasingly being used to combine data from different sources, with the aim of identifying and bringing together records from separate files that correspond to the same entities. Data linkage is usually not a trivial procedure, and linkage errors (false links and missed links) are unavoidable. In these cases, standard statistical techniques may produce misleading inference. In this paper, we propose a method for secondary linear regression analysis, where the linked data have to be prepared by someone else, and neither the match‐key variables nor the unlinked records are available to the analyst. We also develop a diagnostic test for the assumption of non‐informative linkage errors, which is required for all existing secondary analysis adjustment methods. Our approach provides important advantages: it relies on the realistic assumption that the probabilities of correct linkage vary across the records, but it does not assume that one is able to estimate the probability of correct linkage for each individual record. Moreover, it accommodates in a simple manner the general situation where the files are of different sizes and none of them is a subset of another. The proposed methodology of adjustment and testing is studied by simulation and applied to real data.

Date: 2021

Downloads: (external link)
https://doi.org/10.1111/rssa.12630

Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssa:v:184:y:2021:i:2:p:522-547

Journal of the Royal Statistical Society Series A is currently edited by A. Chevalier and L. Sharples

Page updated 2025-03-19
Handle: RePEc:bla:jorssa:v:184:y:2021:i:2:p:522-547