Estimating Precision and Recall for Deterministic and Probabilistic Record Linkage
James Chipperfield, Noel Hansen and Peter Rossiter
International Statistical Review, 2018, vol. 86, issue 2, 219-236
Abstract:
Linking administrative, survey and census files to enhance dimensions such as time and breadth or depth of detail is now common. Because a unique person identifier is often not available, records belonging to two different units (e.g. people) may be incorrectly linked. Estimating the proportion of links that are correct, called Precision, is difficult because, even after clerical review, there will remain uncertainty about whether a link is in fact correct or incorrect. Measures of Precision are useful when deciding whether or not it is worthwhile linking two files, when comparing alternative linking strategies and as a quality measure for estimates based on the linked file. This paper proposes an estimator of Precision for a linked file that has been created by either deterministic (or rules‐based) or probabilistic (where evidence for a link being a match is weighted against the evidence that it is not a match) linkage, both of which are widely used in practice. This paper shows that the proposed estimators perform well.
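As a minimal illustration of the quantity defined in the abstract, the sketch below computes Precision (the proportion of links that are correct) and Recall (assumed here to carry its standard meaning, the proportion of true matches that were linked) for a toy example in which the true match status is known. This is not the estimator proposed in the paper, which addresses the realistic case where the truth remains uncertain even after clerical review. The function name `precision_recall` and the record identifiers are hypothetical.

```python
def precision_recall(links, true_matches):
    """Return (precision, recall) for a set of links against known true matches.

    Assumption: the true match status is fully known, which is rarely the
    case in practice. Each link or match is a (file_a_id, file_b_id) pair.
    """
    links = set(links)
    true_matches = set(true_matches)
    correct = links & true_matches  # links that join records of the same unit

    # Precision: share of links that are correct.
    precision = len(correct) / len(links) if links else float("nan")
    # Recall: share of true matches that were actually linked.
    recall = len(correct) / len(true_matches) if true_matches else float("nan")
    return precision, recall


# Hypothetical toy data: four links were made, three of them correct,
# and five true matches exist between the two files.
links = {("a1", "b1"), ("a2", "b2"), ("a3", "b9"), ("a4", "b4")}
true_matches = {
    ("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5"),
}

precision, recall = precision_recall(links, true_matches)
print(f"Precision = {precision:.2f}, Recall = {recall:.2f}")
```

In this toy case three of the four links are correct and three of the five true matches are found. The same calculation applies whether the links came from deterministic or probabilistic linkage, since it depends only on the resulting set of links.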
Date: 2018
Downloads: https://doi.org/10.1111/insr.12246
Persistent link: https://EconPapers.repec.org/RePEc:bla:istatr:v:86:y:2018:i:2:p:219-236
International Statistical Review is currently edited by Eugene Seneta and Kees Zeelenberg