Record linkage in the Cape of Good Hope Panel
Auke Rijpma, Jeanne Cilliers and Johan Fourie
Additional contact information
Auke Rijpma: Department of History, Utrecht University
Jeanne Cilliers: Department of Economic History, Lund University
No 06/2018, Working Papers from Stellenbosch University, Department of Economics
Abstract:
In this paper we describe the record linkage procedure used to create a panel from Cape Colony census returns, or opgaafrolle, for 1787–1828, a dataset of 42,354 household-level observations. Using a subset of manually linked records, we first evaluate which statistical models and deterministic algorithms best identify and match households over time. By exploiting household-level characteristics in the linking process and the near-annual frequency of the data, we create high-quality links for 84 percent of the dataset. We compare basic analyses of the linked panel with those of the original cross-sectional data, evaluate the feasibility of the strategy for linking to supplementary sources, and discuss the scalability of our approach to the full Cape panel.
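The abstract does not spell out the linkage rules, so the following Python sketch is only a rough illustration of the general shape of a deterministic linkage step between two census years, under stated assumptions. It blocks candidate pairs on the first letter of the surname, compares names with a string-similarity score and household characteristics with absolute differences, accepts a link only when exactly one candidate passes, and scores the result against hand-linked pairs. The field names (`surname`, `first_name`, `n_children`, `n_slaves`), the thresholds, the blocking key and the use of `difflib.SequenceMatcher` are all hypothetical choices, not the authors' method.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Household:
    """One household in a single census year (hypothetical fields, not the opgaafrolle schema)."""
    hid: str
    surname: str
    first_name: str
    n_children: int
    n_slaves: int


def name_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(x: Household, y: Household) -> dict:
    """Comparison features for a candidate pair of households."""
    return {
        "surname": name_sim(x.surname, y.surname),
        "first_name": name_sim(x.first_name, y.first_name),
        "children_diff": abs(x.n_children - y.n_children),
        "slaves_diff": abs(x.n_slaves - y.n_slaves),
    }


def is_match(f: dict, name_min: float = 0.85, max_children_diff: int = 2,
             max_slaves_diff: int = 3) -> bool:
    """Hypothetical deterministic rule: similar names and roughly stable household composition."""
    return (f["surname"] >= name_min
            and f["first_name"] >= name_min
            and f["children_diff"] <= max_children_diff
            and f["slaves_diff"] <= max_slaves_diff)


def link(year_a: list, year_b: list) -> dict:
    """Map household ids in year_a to ids in year_b, keeping only unambiguous matches."""
    links = {}
    for x in year_a:
        # Blocking: only compare households whose surnames share a first letter.
        block = [y for y in year_b if y.surname[:1].lower() == x.surname[:1].lower()]
        matches = [y for y in block if is_match(features(x, y))]
        if len(matches) == 1:  # drop ambiguous cases instead of guessing
            links[x.hid] = matches[0].hid
    return links


def precision_recall(predicted: dict, truth: dict) -> tuple:
    """Score predicted links against a manually linked subset."""
    tp = sum(1 for k, v in predicted.items() if truth.get(k) == v)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Toy example with invented records.
year_a = [Household("a1", "van der Merwe", "Jacobus", 3, 2),
          Household("a2", "Cloete", "Hendrik", 1, 10)]
year_b = [Household("b1", "van der Merve", "Jacobus", 4, 2),
          Household("b2", "Cloete", "Hendrik", 1, 11),
          Household("b3", "Cloete", "Pieter", 0, 0)]
links = link(year_a, year_b)
print(links)  # {'a1': 'b1', 'a2': 'b2'}
print(precision_recall(links, {"a1": "b1", "a2": "b2"}))  # (1.0, 1.0)
```

Because each household in `year_a` is linked independently, two households could still be mapped to the same household in `year_b`; a fuller version would enforce one-to-one links, for example by keeping only the best-scoring pair. In practice the precision calculation would also be restricted to households in the hand-linked subset. A statistical alternative would feed the same feature dictionary into a classifier trained on the manually linked pairs instead of relying on fixed thresholds.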
Keywords: census; machine learning; micro-data; record linkage; panel data; South Africa
JEL-codes: C81 N01
Date: 2018
New Economics Papers: this item is included in nep-big and nep-his
Downloads:
https://www.ekon.sun.ac.za/wpapers/2018/wp062018/wp062018.pdf First version, 2018 (application/pdf)
Related works:
Working Paper: Record Linkage in the Cape of Good Hope Panel (2018) 
Persistent link: https://EconPapers.repec.org/RePEc:sza:wpaper:wpapers299
More papers in Working Papers from Stellenbosch University, Department of Economics Contact information at EDIRC.
Bibliographic data for series maintained by Melt van Schoor.