Economics at your fingertips  

Combining Family History and Machine Learning to Link Historical Records

Joseph Price (), Kasey Buckles (), Jacob Van Leeuwen and Isaac Riley

No 26227, NBER Working Papers from National Bureau of Economic Research, Inc

Abstract: A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe the procedure we use and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.

JEL-codes: C81 J1 N01 (search for similar items in EconPapers)
Date: 2019-09
New Economics Papers: this item is included in nep-big, nep-cmp, nep-his and nep-pay
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1) Track citations by RSS feed

Downloads: (external link) (application/pdf)
Access to the full text is generally limited to series subscribers, however if the top level domain of the client browser is in a developing country or transition economy free access is provided. More information about subscriptions and free access is available at Free access is also available to older working papers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link:

Ordering information: This working paper can be ordered from
The price is Paper copy available by mail.

Access Statistics for this paper

More papers in NBER Working Papers from National Bureau of Economic Research, Inc National Bureau of Economic Research, 1050 Massachusetts Avenue Cambridge, MA 02138, U.S.A.. Contact information at EDIRC.
Bibliographic data for series maintained by ().

Page updated 2020-03-29
Handle: RePEc:nbr:nberwo:26227