Linking individuals across historical sources: A fully automated approach

Ran Abramitzky, Roy Mill and Santiago Perez

Historical Methods: A Journal of Quantitative and Interdisciplinary History, 2020, vol. 53, issue 2, 94-111

Abstract: Linking individuals across historical datasets relies on information, such as name and age, that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to identify the correct match with certainty. In the first part of the paper, we propose a fully automated probabilistic method for linking historical datasets that enables researchers to create samples at the frontier of minimizing type I (false positive) and type II (false negative) errors. The first step guides researchers in choosing which variables to use for linking. The second step uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that each pair of records corresponds to the same individual. The third step suggests how to use these estimated probabilities to choose which records to use in the analysis. In the second part of the paper, we apply the method to link historical population censuses in the US and Norway, and use the resulting samples to estimate measures of intergenerational occupational mobility. The estimates obtained with our method are remarkably similar to those based on IPUMS' linked samples, which rely on hand linking to create a training sample. We provide R code and a Stata command that implement this method.
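The abstract does not spell out the probability model, so the following is only a minimal sketch assuming the textbook Fellegi-Sunter setup, not the authors' R or Stata implementation. Each candidate pair of records is summarized by a binary agreement vector over linking fields (for example first name, surname, birth year), and fields are assumed to agree independently given whether the pair is a true match. EM then estimates the share of true matches `p`, the per-field agreement rates among matches `m` and among non-matches `u`, and a posterior match probability for every pair (step two). The function names `em_match_probabilities`, `posterior_match` and `select_links`, the starting values, and the rule in `select_links` (keep a record's link only if exactly one of its candidates clears a probability threshold) are hypothetical illustrations of step three, not the paper's procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]  # one entry per linking field: 1 = agrees, 0 = disagrees
EPS = 1e-6  # keeps estimated rates strictly inside (0, 1) so logs stay finite


def _clip(x: float) -> float:
    return min(max(x, EPS), 1 - EPS)


def posterior_match(patterns: Sequence[Pattern], p: float,
                    m: List[float], u: List[float]) -> List[float]:
    """P(match | agreement pattern) for every candidate pair, via Bayes' rule."""
    out = []
    for g in patterns:
        # Log-likelihood under each class, assuming fields are independent given the class.
        lm = math.log(p) + sum(math.log(m[i] if a else 1 - m[i]) for i, a in enumerate(g))
        lu = math.log(1 - p) + sum(math.log(u[i] if a else 1 - u[i]) for i, a in enumerate(g))
        d = lm - lu  # posterior log-odds of a match
        # Numerically stable logistic transform of the log-odds.
        out.append(1 / (1 + math.exp(-d)) if d >= 0 else math.exp(d) / (1 + math.exp(d)))
    return out


def em_match_probabilities(patterns: Sequence[Pattern], p: float = 0.1,
                           n_iter: int = 500, tol: float = 1e-9):
    """Fit the two-class (match / non-match) mixture by EM; return (posteriors, p, m, u)."""
    k = len(patterns[0])
    n = len(patterns)
    m = [0.9] * k  # hypothetical starting guess for P(field agrees | same person)
    u = [0.1] * k  # hypothetical starting guess for P(field agrees | different people)
    for _ in range(n_iter):
        # E-step: posterior match probability of each pair under current parameters.
        w = posterior_match(patterns, p, m, u)
        w_m = max(sum(w), EPS)
        w_u = max(n - sum(w), EPS)
        # M-step: weighted re-estimates of the match share and per-field agreement rates.
        new_p = _clip(sum(w) / n)
        new_m = [_clip(sum(wj * g[i] for wj, g in zip(w, patterns)) / w_m) for i in range(k)]
        new_u = [_clip(sum((1 - wj) * g[i] for wj, g in zip(w, patterns)) / w_u) for i in range(k)]
        change = max(abs(a - b) for a, b in zip([new_p] + new_m + new_u, [p] + m + u))
        p, m, u = new_p, new_m, new_u
        if change < tol:
            break
    # Report posteriors computed from the final parameter estimates.
    return posterior_match(patterns, p, m, u), p, m, u


def select_links(candidates: Dict[str, List[Tuple[str, float]]],
                 threshold: float = 0.5) -> Dict[str, str]:
    """Illustrative rule: link a record only if exactly one candidate clears the threshold."""
    links = {}
    for rec, cands in candidates.items():
        above = [cid for cid, prob in cands if prob >= threshold]
        if len(above) == 1:
            links[rec] = above[0]
    return links


if __name__ == "__main__":
    # Made-up agreement patterns over (first name, surname, birth year).
    toy = ([(1, 1, 1)] * 20 + [(1, 1, 0)] * 5 + [(1, 0, 0)] * 10
           + [(0, 1, 0)] * 10 + [(0, 0, 1)] * 10 + [(0, 0, 0)] * 60)
    post, p_hat, m_hat, u_hat = em_match_probabilities(toy)
    print("estimated match share:", round(p_hat, 3))
    print("m:", [round(x, 3) for x in m_hat], "u:", [round(x, 3) for x in u_hat])
    print("P(match | 1,1,1):", round(post[0], 3), "P(match | 0,0,0):", round(post[-1], 3))
```

Working in log space keeps the product over many fields numerically stable. With string-similarity scores or age differences as linking variables, one would replace the binary indicators with discretized agreement levels and categorical `m`/`u` distributions; raising the threshold in a rule like `select_links` trades fewer false positives for more false negatives.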

Date: 2020
Citations: 11 (EconPapers)

Downloads: http://hdl.handle.net/10.1080/01615440.2018.1543034 (text/html; access to full text is restricted to subscribers)

Related works:
Working Paper: Linking Individuals Across Historical Sources: a Fully Automated Approach (2018)


Persistent link: https://EconPapers.repec.org/RePEc:taf:vhimxx:v:53:y:2020:i:2:p:94-111

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/vhim20

DOI: 10.1080/01615440.2018.1543034


Historical Methods: A Journal of Quantitative and Interdisciplinary History is currently edited by J. David Hacker and Kenneth Sylvester

More articles in Historical Methods: A Journal of Quantitative and Interdisciplinary History from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-22
Handle: RePEc:taf:vhimxx:v:53:y:2020:i:2:p:94-111