EconPapers    
Economics at your fingertips  
 

Theoretical limits of microclustering for record linkage

J E Johndrow, K Lum and D B Dunson

Biometrika, 2018, vol. 105, issue 2, 431-446

Abstract: SUMMARYThere has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few observations per cluster and a very large number of clusters. We show that the problem is fundamentally hard from a theoretical perspective and, even in idealized cases, accurate entity resolution is effectively impossible unless the number of entities is small relative to the number of records and/or the separation between records from different entities is extremely large. These results suggest conservatism in interpretation of the results of record linkage, support collection of additional data to more accurately disambiguate the entities, and motivate a focus on coarser inference. For example, results from a simulation study suggest that sometimes one may obtain accurate results for population size estimation even when fine-scale entity resolution is inaccurate.

Keywords: Closed population estimation; Clustering; Entity resolution; Microclustering; Record linkage; Small clusters (search for similar items in EconPapers)
Date: 2018
References: Add references at CitEc
Citations: View citations in EconPapers (4)

Downloads: (external link)
http://hdl.handle.net/10.1093/biomet/asy003 (application/pdf)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:oup:biomet:v:105:y:2018:i:2:p:431-446.

Ordering information: This journal article can be ordered from
https://academic.oup.com/journals

Access Statistics for this article

Biometrika is currently edited by Paul Fearnhead

More articles in Biometrika from Biometrika Trust Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK.
Bibliographic data for series maintained by Oxford University Press ().

 
Page updated 2025-03-19
Handle: RePEc:oup:biomet:v:105:y:2018:i:2:p:431-446.