A prior for record linkage based on allelic partitions
Brenda Betancourt, Juan Sosa and Abel Rodríguez
Computational Statistics & Data Analysis, 2022, vol. 172, issue C
Abstract:
In database management, record linkage aims to identify multiple records that correspond to the same individual. Record linkage can be treated as a clustering problem in which one or more noisy database records are associated with a unique latent entity. In contrast to traditional clustering applications, a large number of clusters with only a few observations per cluster is expected in this context. Hence, a new class of prior distributions based on allelic partitions is proposed for the small-cluster setting of record linkage. The proposed prior facilitates the introduction of information about the cluster size distribution at different scales, and naturally enforces sublinear growth of the maximum cluster size, a behavior known as the microclustering property. In addition, a set of novel microclustering conditions is introduced in order to impose further constraints on the cluster sizes a priori. The performance of the proposed class of priors is evaluated using simulated data and three official statistics data sets. Moreover, different loss functions for optimal point estimation of the partitions are compared using decision-theoretic approaches recently proposed in the literature.
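To make the two quantities in the abstract concrete: the allelic partition of a clustering of n records lists, for each size k, the number m_k of clusters containing exactly k records, so that the sum of k·m_k over all k equals n. The microclustering property asks that the largest cluster size M_n grow sublinearly, i.e. M_n / n → 0 as n grows. The sketch below only computes these descriptive summaries for a single linkage; it is not the paper's prior or its sampler. It assumes each record carries a latent-entity label, and the function names and toy labels are hypothetical.

```python
from collections import Counter

def allelic_partition(labels):
    """Return {k: m_k}, where m_k is the number of clusters with exactly k records.

    labels[i] is a hypothetical latent-entity id assigned to record i.
    """
    cluster_sizes = Counter(labels).values()        # size of each cluster
    return dict(sorted(Counter(cluster_sizes).items()))

def max_cluster_fraction(labels):
    """Largest cluster size divided by the number of records, M_n / n."""
    return max(Counter(labels).values()) / len(labels)

# Toy linkage of 8 records to 5 latent entities (illustrative data only).
labels = ["a", "a", "b", "c", "c", "c", "d", "e"]
print(allelic_partition(labels))     # {1: 3, 2: 1, 3: 1}
print(max_cluster_fraction(labels))  # 0.375
```

Under microclustering, the second summary would be expected to shrink toward zero as more records are linked, whereas many traditional clustering priors allow it to stay bounded away from zero.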
Keywords: Microclustering; Allelic partitions; Record linkage
Date: 2022
Downloads: http://www.sciencedirect.com/science/article/pii/S0167947322000548 (full text for ScienceDirect subscribers only)
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:172:y:2022:i:c:s0167947322000548
DOI: 10.1016/j.csda.2022.107474
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu.