Combining family history and machine learning to link historical records: The Census Tree data set
Joseph Price, Kasey Buckles, Jacob Van Leeuwen and Isaac Riley
Explorations in Economic History, 2021, vol. 80, issue C
Abstract:
A key challenge for research on many questions in the social sciences is that it is difficult to link records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we contribute to recent efforts to create these links with a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. We use these “true” links both to inform the decisions one needs to make when using automated methods to link records and as a training data set for use in a supervised machine learning approach. We describe our procedure and illustrate its potential by linking individuals across the 100% samples of the US censuses from 1900, 1910, and 1920. When linking adjacent censuses, we obtain an overall match rate of 62-65 percent (for over 88.9 million matches), with a false positive rate that is around 6-7 percent and with links that are similar to the population along observable characteristics. Thus, our method allows us to link records with a combination of a high match rate, precision, and representativeness that is beyond the current frontier. Finally, we demonstrate the potential of the data by estimating the degree of intergenerational transmission of literacy between father-son and mother-daughter pairs.
Keywords: Record linking; Genealogy data; Machine learning; Intergenerational transmission
JEL-codes: C8 N01 N11 N12
Date: 2021
Citations: 13 (EconPapers)
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0014498321000024
Full text for ScienceDirect subscribers only
Persistent link: https://EconPapers.repec.org/RePEc:eee:exehis:v:80:y:2021:i:c:s0014498321000024
DOI: 10.1016/j.eeh.2021.101391
Explorations in Economic History is currently edited by R.H. Steckel
More articles in Explorations in Economic History from Elsevier
Bibliographic data for series maintained by Catherine Liu.