EconPapers    
 

Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records

Ted Enamorado, Benjamin Fifield and Kosuke Imai

American Political Science Review, 2019, vol. 113, issue 2, 353-371

Abstract: Since most social science research relies on multiple data sources, merging data sets is an essential part of researchers’ workflow. Unfortunately, a unique identifier that unambiguously links records is often unavailable, and data may contain missing and inaccurate information. These problems are especially severe when merging large-scale administrative records. We develop a fast and scalable algorithm to implement a canonical model of probabilistic record linkage that has many advantages over deterministic methods frequently used by social scientists. The proposed methodology efficiently handles millions of observations while accounting for missing data and measurement error, incorporating auxiliary information, and adjusting for uncertainty about merging in post-merge analyses. We conduct comprehensive simulation studies to evaluate the performance of our algorithm in realistic scenarios. We also apply our methodology to merging campaign contribution records, survey data, and nationwide voter files. An open-source software package is available for implementing the proposed methodology.
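The abstract mentions "a canonical model of probabilistic record linkage" without spelling it out. Here is a minimal sketch that assumes this means the Fellegi-Sunter two-class mixture: each candidate pair is either a match or a non-match, and field-level agreements are conditionally independent given match status. The sketch fits that model by EM on binary agreement patterns. A missing comparison (None) is simply dropped from the likelihood, which amounts to assuming it is missing at random. The function name fit_fellegi_sunter, the starting values, and the binary agree/disagree coding are illustrative assumptions. They are not the authors' implementation or the API of their software package.

```python
from typing import List, Optional, Tuple

# One comparison vector per candidate pair: 1 = fields agree,
# 0 = fields disagree, None = comparison missing.
Pattern = List[Optional[int]]

EPS = 1e-6  # keeps probabilities strictly inside (0, 1)


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def fit_fellegi_sunter(
    patterns: List[Pattern], n_iter: int = 200, lam: float = 0.1
) -> Tuple[float, List[float], List[float], List[float]]:
    """Fit a two-class (match / non-match) mixture by EM (illustrative).

    Returns (lam, m, u, post): the estimated match share, per-field
    agreement probabilities among matches (m) and non-matches (u),
    and each pair's posterior probability of being a match.
    """
    k = len(patterns[0])
    m = [0.9] * k  # starting guess for P(agree | match)
    u = [0.1] * k  # starting guess for P(agree | non-match)
    post = [0.0] * len(patterns)
    for _ in range(n_iter):
        # E-step: Bayes' rule, assuming fields are independent given class.
        for i, g in enumerate(patterns):
            lik_m, lik_u = lam, 1.0 - lam
            for j, a in enumerate(g):
                if a is None:
                    continue  # assumed missing at random: field drops out
                lik_m *= m[j] if a else 1.0 - m[j]
                lik_u *= u[j] if a else 1.0 - u[j]
            post[i] = lik_m / (lik_m + lik_u)
        # M-step: posterior-weighted averages over observed comparisons.
        lam = _clamp(sum(post) / len(post))
        for j in range(k):
            w_m = a_m = w_u = a_u = 0.0
            for i, g in enumerate(patterns):
                if g[j] is None:
                    continue
                w_m += post[i]
                a_m += post[i] * g[j]
                w_u += 1.0 - post[i]
                a_u += (1.0 - post[i]) * g[j]
            if w_m > 0:
                m[j] = _clamp(a_m / w_m)
            if w_u > 0:
                u[j] = _clamp(a_u / w_u)
    return lam, m, u, post


if __name__ == "__main__":
    # Hypothetical toy data: three compared fields per candidate pair.
    toy = ([[1, 1, 1]] * 5 + [[1, None, 1]] + [[0, 0, 0]] * 20
           + [[0, 1, 0]] * 3 + [[1, 0, 0]] * 3 + [[0, 0, 1]] * 3)
    lam, m, u, post = fit_fellegi_sunter(toy)
    print(f"estimated match share = {lam:.3f}")
    print("posterior for [1, 1, 1]:", round(post[0], 3))
    print("posterior for [0, 0, 0]:", round(post[6], 3))
```

On toy data like the example above, pairs that agree on every observed field get high match posteriors. That includes the pair with a missing middle field. Pairs that disagree everywhere get posteriors near zero. The E-step depends on a pair only through its agreement pattern, so pairs that share a pattern could be collapsed into counts. This is a common way to scale this kind of model to very many pairs, but the sketch does not do it.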

Date: 2019
Citations in EconPapers: 8

Downloads: (external link)
https://www.cambridge.org/core/product/identifier/ ... type/journal_article (link to article abstract page, text/html)

Persistent link: https://EconPapers.repec.org/RePEc:cup:apsrev:v:113:y:2019:i:02:p:353-371_00

More articles in American Political Science Review from Cambridge University Press (UPH, Shaftesbury Road, Cambridge CB2 8BS, UK).
Bibliographic data for this series are maintained by Kirk Stebbing.

 
Page updated 2025-03-19
Handle: RePEc:cup:apsrev:v:113:y:2019:i:02:p:353-371_00