Linking individuals across historical sources: A fully automated approach
Ran Abramitzky,
Roy Mill and
Santiago Perez
Historical Methods: A Journal of Quantitative and Interdisciplinary History, 2020, vol. 53, issue 2, 94-111
Abstract:
Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. In the first part of the paper, we suggest a fully automated probabilistic method for linking historical datasets that enables researchers to create samples at the frontier of minimizing type I errors (false positives) and type II errors (false negatives). The first step guides researchers in choosing which variables to use for linking. The second step uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that each pair of records corresponds to the same individual. The third step suggests how to use these estimated probabilities to choose which records to use in the analysis. In the second part of the paper, we apply the method to link historical population censuses in the US and Norway, and use these samples to estimate measures of intergenerational occupational mobility. The estimates using our method are remarkably similar to those obtained using the IPUMS method, which relies on hand linking to create a training sample. We provide R code and a Stata command that implement this method.
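The second and third steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R or Stata implementation, and the paper's exact specification may differ. The sketch assumes that each candidate pair of records has already been reduced to a tuple of 0/1 agreement indicators (for example, name agrees, age agrees, birthplace agrees) and that these indicators are independent within the match and non-match classes. The function names `em_match_probabilities` and `select_links`, the starting values, and the thresholds `p_min` and `p_second_max` are all hypothetical choices made for illustration.

```python
def clip(x, eps=1e-6):
    """Keep a probability strictly inside (0, 1) to avoid degenerate estimates."""
    return min(max(x, eps), 1.0 - eps)


def em_match_probabilities(patterns, n_iter=200, tol=1e-8):
    """Fit a two-class (match / non-match) mixture by EM.

    patterns: list of equal-length tuples of 0/1 agreement indicators,
              one tuple per candidate record pair (assumed input format).
    Returns (posteriors, p, m, u) where
      p    = estimated share of candidate pairs that are true matches,
      m[j] = P(field j agrees | match),
      u[j] = P(field j agrees | non-match).
    """
    n, k = len(patterns), len(patterns[0])
    p, m, u = 0.1, [0.8] * k, [0.2] * k  # starting values (assumed)
    post = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        for i, g in enumerate(patterns):
            lm, lu = p, 1.0 - p
            for j, a in enumerate(g):
                lm *= m[j] if a else 1.0 - m[j]
                lu *= u[j] if a else 1.0 - u[j]
            post[i] = lm / (lm + lu)
        # M-step: re-estimate parameters using the posteriors as weights.
        w_match = sum(post)
        w_non = n - w_match
        new_p = clip(w_match / n)
        m = [clip(sum(w * g[j] for w, g in zip(post, patterns)) / w_match)
             for j in range(k)]
        u = [clip(sum((1.0 - w) * g[j] for w, g in zip(post, patterns)) / w_non)
             for j in range(k)]
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    return post, p, m, u


def select_links(pairs, probs, p_min=0.9, p_second_max=0.5):
    """Keep a source record's best candidate only if it is clearly the best.

    pairs: list of (source_id, target_id); probs: matching posteriors.
    A link is kept when the best probability is at least p_min and the
    runner-up for the same source record is at most p_second_max.
    """
    by_source = {}
    for (a, b), pr in zip(pairs, probs):
        by_source.setdefault(a, []).append((pr, b))
    links = []
    for a, cands in by_source.items():
        cands.sort(reverse=True)
        best_p, best_b = cands[0]
        second_p = cands[1][0] if len(cands) > 1 else 0.0
        if best_p >= p_min and second_p <= p_second_max:
            links.append((a, best_b))
    return links
```

Raising `p_min` or lowering `p_second_max` trades fewer false positives for more false negatives, which is one way to move along the type I / type II frontier the abstract mentions. This sketch only checks uniqueness from the source side; a fuller version would presumably apply the same rule from the target side as well.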
Date: 2020
Downloads: (external link)
http://hdl.handle.net/10.1080/01615440.2018.1543034 (text/html)
Access to full text is restricted to subscribers.
Related works:
Working Paper: Linking Individuals Across Historical Sources: a Fully Automated Approach (2018) 
Persistent link: https://EconPapers.repec.org/RePEc:taf:vhimxx:v:53:y:2020:i:2:p:94-111
DOI: 10.1080/01615440.2018.1543034
Historical Methods: A Journal of Quantitative and Interdisciplinary History is currently edited by J. David Hacker and Kenneth Sylvester