Squeezing More Out of Your Data: Business Record Linkage with Python
John Cuffe and
Nathan Goldschlag
Working Papers from U.S. Census Bureau, Center for Economic Studies
Abstract:
Integrating data from different sources has become a fundamental component of modern data analytics. Record linkage methods represent an important class of tools for accomplishing such integration. In the absence of common disambiguated identifiers, researchers often must resort to ''fuzzy" matching, which allows imprecision in the characteristics used to identify common entities across dfferent datasets. While the record linkage literature has identified numerous individually useful fuzzy matching techniques, there is little consensus on a way to integrate those techniques within a single framework. To this end, we introduce the Multiple Algorithm Matching for Better Analytics (MAMBA), an easy-to-use, flexible, scalable, and transparent software platform for business record linkage applications using Census microdata. MAMBA leverages multiple string comparators to assess the similarity of records using a machine learning algorithm to disambiguate matches. This software represents a transparent tool for researchers seeking to link external business data to the Census Business Register files.
Pages: 37 pages
Date: 2018-11
New Economics Papers: this item is included in nep-big and nep-cmp
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (6)
Downloads: (external link)
https://www2.census.gov/ces/wp/2018/CES-WP-18-46.pdf First version, 2018 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:cen:wpaper:18-46
Access Statistics for this paper
More papers in Working Papers from U.S. Census Bureau, Center for Economic Studies Contact information at EDIRC.
Bibliographic data for series maintained by Dawn Anderson ().