Examining the role of training data for supervised methods of automated record linkage: Lessons for best practice in economic history

James J Feigenbaum, Jonas Helgertz and Joseph Price

Explorations in Economic History, 2025, vol. 96, issue C

Abstract: During the past decade, scholars have produced a vast amount of research using linked historical individual-level data, shaping and changing our understanding of the past. This linked data revolution has been powered by methodological and computational advances, partly focused on supervised machine-learning methods that rely on training data. The importance of obtaining high-quality training data for the performance of the record linkage algorithm largely, however, remains unknown. This paper comprehensively examines the role of training data, and—by extension—improves our understanding of best practices in supervised methods of probabilistic record linkage. First, we compare the speed and costs of building training data using different methods. Second, we document high rates of conditional accuracy across the training data sets, rates that are especially high when built with access to more information. Third, we show that data constructed by record linking algorithms learning from different training-data-generation methods do not substantially differ in their accuracy, either overall or across demographic groups, though algorithms tend to perform best when their feature space aligns with the features used to build the training data. Lastly, we introduce errors in the training data and find that the examined record linking algorithms are remarkably capable of making accurate links even working with flawed training data.

Keywords: Historical data; Automated record linkage; Training data; Supervised record linkage; Probabilistic record linkage
Date: 2025

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0014498325000038
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:exehis:v:96:y:2025:i:c:s0014498325000038

DOI: 10.1016/j.eeh.2025.101656

Explorations in Economic History is currently edited by R.H. Steckel

More articles in Explorations in Economic History from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-25
Handle: RePEc:eee:exehis:v:96:y:2025:i:c:s0014498325000038