Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system
Thanh Huan Vo,
Guillaume Chauvet,
André Happe,
Emmanuel Oger,
Stéphane Paquelet and
Valérie Garès
Computational Statistics & Data Analysis, 2023, vol. 179, issue C
Abstract:
Probabilistic record linkage is the process of combining data from different sources when such data refer to common entities and identifying information is not available. A probabilistic record linkage framework that takes into account multiple pieces of non-identifying information, but that is limited to simple binary comparisons between matching variables, has been previously proposed. An extension of this method is proposed for mixed-type comparison vectors. A mixture model for handling comparison values of low-prevalence categorical matching variables and a mixture of hurdle gamma distributions for handling comparison values of continuous matching variables have been developed. The parameters are estimated by means of the Expectation Conditional Maximization (ECM) algorithm. Through a Monte Carlo simulation study, both the estimation of the posterior probability that a record pair is a match and the prediction of matched record pairs are evaluated. The simulation results indicate that the proposed methods outperform existing ones in most of the cases considered. The proposed methods are applied to a real dataset to link a registry of patients suffering from venous thromboembolism in the Brest district area (GETBO) with the French national health information system (SNDS).
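To make the idea of a hurdle gamma mixture concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic hurdle gamma form (a point mass at zero plus a gamma density on positive values) for one continuous comparison value, with one component for matches and one for non-matches, and applies Bayes' rule to obtain the posterior match probability. The function names, the parameterization, and all parameter values are hypothetical; the abstract does not specify them, and in the paper the parameters are estimated by ECM rather than fixed by hand.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density at x > 0 (shape/scale parameterization)."""
    return (x ** (shape - 1) * math.exp(-x / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def hurdle_gamma_density(x, p_zero, shape, scale):
    """Hurdle gamma: probability p_zero at exactly 0, gamma density otherwise.

    Assumed form for illustration only; x must be non-negative.
    """
    if x == 0:
        return p_zero
    return (1.0 - p_zero) * gamma_pdf(x, shape, scale)


def posterior_match(x, prior_match, match_params, nonmatch_params):
    """Posterior probability that a record pair is a match given one
    continuous comparison value x, under a two-component mixture."""
    num = prior_match * hurdle_gamma_density(x, *match_params)
    den = num + (1.0 - prior_match) * hurdle_gamma_density(x, *nonmatch_params)
    return num / den


# Hypothetical parameters: (p_zero, shape, scale) for each component.
match_params = (0.9, 2.0, 1.0)      # matches mostly agree exactly (x == 0)
nonmatch_params = (0.05, 2.0, 5.0)  # non-matches rarely agree exactly

p_exact = posterior_match(0.0, 0.1, match_params, nonmatch_params)
p_far = posterior_match(10.0, 0.1, match_params, nonmatch_params)
```

With these illustrative values, an exact agreement (x = 0) raises the match probability well above the prior of 0.1, while a large discrepancy lowers it.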
Keywords: Expectation Conditional Maximization (ECM) algorithm; Hurdle gamma distribution; Low prevalence variables; Mixture model; Probabilistic record linkage
Date: 2023
Downloads:
http://www.sciencedirect.com/science/article/pii/S0167947322002365
Full text for ScienceDirect subscribers only.
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:179:y:2023:i:c:s0167947322002365
DOI: 10.1016/j.csda.2022.107656
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu.