RLPC: Record Linkage Pre-Cleaning – Technical Documentation of Routines
No 02/2015e, IWH Technical Reports from Halle Institute for Economic Research (IWH)
The primary objective of record linkage is to merge different data sets on the basis of a unique identifier. The cases at hand are mostly company data sets from databases of company characteristics (e.g. BvD Amadeus/Dafne), patent data sets (e.g. Patstat or DPMA) and funding data sets (e.g. BMBF funding catalog). These data sets are to be merged on the basis of company names. Because the notation of company names varies across databases, for example in how the corporate structure is written, the names must first be harmonized and standardized. The routines described here implement record linkage pre-cleaning (RLPC). They create record-linkage-compatible names (RLName) from given (actor) names (Name). This includes converting special characters to ASCII characters, identifying corporate structures, and isolating and separating bracketed expressions. The result is an expression that can be compared with other names. After this pre-cleaning, record linkage systems can be used to merge several data sets that have been pretreated in the same way.
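The three steps named above (ASCII conversion, identification of corporate structures, separation of bracketed expressions) can be illustrated with a minimal Python sketch. It is not the report's implementation: the function names, the short `LEGAL_FORMS` list, the umlaut transliteration table and the output field names are all assumptions made for illustration, and "corporate structure" is assumed here to mean a legal-form suffix such as GmbH or AG.

```python
import re
import unicodedata

# Hypothetical, deliberately short list of legal-form suffixes (assumption;
# the report's own list is not reproduced here).
LEGAL_FORMS = ["GMBH & CO KG", "GMBH", "AG", "KG", "LTD", "INC"]

# Assumed transliterations that Unicode decomposition alone would not yield.
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE", "ß": "ss",
})


def to_ascii(text):
    """Convert special characters to ASCII (e.g. 'é' -> 'e', 'ü' -> 'ue')."""
    text = text.translate(GERMAN_MAP)
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks and anything else that has no ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def split_brackets(text):
    """Separate non-nested '(...)' expressions from the rest of the name."""
    bracketed = [b.strip() for b in re.findall(r"\(([^()]*)\)", text)]
    remainder = re.sub(r"\([^()]*\)", " ", text)
    return remainder, bracketed


def make_rlname(name):
    """Return a comparable name plus the parts that were split off."""
    remainder, bracketed = split_brackets(to_ascii(name))
    # Upper-case, replace punctuation with spaces, collapse whitespace.
    cleaned = re.sub(r"[^A-Z0-9& ]", " ", remainder.upper())
    cleaned = " ".join(cleaned.split())

    legal_form = ""
    # Try longer suffixes first so 'GMBH & CO KG' wins over 'KG'.
    for form in sorted(LEGAL_FORMS, key=len, reverse=True):
        if cleaned == form or cleaned.endswith(" " + form):
            legal_form = form
            cleaned = cleaned[: len(cleaned) - len(form)].strip()
            break

    return {"rlname": cleaned, "legal_form": legal_form, "bracketed": bracketed}


if __name__ == "__main__":
    print(make_rlname("Müller & Söhne GmbH (Halle)"))
```

Two names that differ only in notation, such as "Müller & Söhne GmbH (Halle)" and "MUELLER & SOEHNE GMBH", yield the same `rlname` under this sketch, which is the property a subsequent record linkage step relies on.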
Persistent link: https://EconPapers.repec.org/RePEc:zbw:iwhtrp:022015e