Record Linkage using Probabilistic Methods and Data Mining Techniques
Elezaj Ogerta and
Tuxhari Gloria
Additional contact information
Tuxhari Gloria: Faculty of Economy, University of Tirana, Tirana, Albania
Mediterranean Journal of Social Sciences, 2017, vol. 8, issue 3, 203-207
Abstract:
Corporations and organizations now acquire large amounts of information daily, which is stored in many large databases (DBs). These databases are mostly heterogeneous, and the same data are often represented differently across them. Data in these DBs may also simply be inaccurate, so the DBs need to be cleaned. When working with large-scale surveys, record linkage is considered part of the data cleaning phase, which is itself a data mining step. Record linkage is an important process in data integration that consists of finding duplicate records as well as matching records. The process can be divided into two main steps: Exact Record Linkage, which finds all exact matches between records, and Probabilistic Record Linkage, which matches records that are not exactly equal but have a high probability of being a match. In recent years, record linkage has become an important process in data mining tasks. As databases become more and more complex, finding matching records is a crucial task. Comparing every possible pair of records in large DBs is infeasible, whether by manual or automatic procedures. Therefore, special algorithms (blocking methods) have to be used to reduce the computational complexity of the comparison space among records. The paper discusses the deterministic and probabilistic methods used for record linkage, as well as different supervised and unsupervised techniques. Results of a real-world dataset linkage (the Albanian Population and Housing Census 2011 and the list of farmers registered by the Food Safety and Veterinary Institute) are presented.
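The pipeline outlined in the abstract (blocking, then exact linkage, then probabilistic linkage) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy records, the field names, the blocking key, the m/u probabilities, the name-similarity cutoff, and the decision threshold are all assumptions made for illustration. The agreement weights follow the Fellegi-Sunter style of scoring, which the abstract does not name.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product
from math import log2

# Hypothetical toy records; names, fields, and values are invented.
census = [
    {"id": "c1", "name": "Arben Hoxha", "birth_year": 1970, "district": "Tirana"},
    {"id": "c2", "name": "Mira Kola", "birth_year": 1985, "district": "Durres"},
    {"id": "c3", "name": "Besnik Leka", "birth_year": 1962, "district": "Tirana"},
]
farmers = [
    {"id": "f1", "name": "Arben Hoxha", "birth_year": 1970, "district": "Tirana"},
    {"id": "f2", "name": "Mira Kolla", "birth_year": 1985, "district": "Durres"},
    {"id": "f3", "name": "Ilir Dervishi", "birth_year": 1962, "district": "Tirana"},
]

FIELDS = ("name", "birth_year", "district")

# Assumed (m, u) per compared field: m = P(agree | true match),
# u = P(agree | non-match). Real values would be estimated from data.
M_U = {"name": (0.90, 0.05), "birth_year": (0.95, 0.02)}
NAME_SIMILARITY_CUTOFF = 0.85  # assumed cutoff for "names agree"
THRESHOLD = 5.0                # assumed decision threshold on the total weight


def block_pairs(left, right, key):
    """Blocking: yield only pairs of records that share the same key value."""
    blocks = defaultdict(lambda: ([], []))
    for rec in left:
        blocks[rec[key]][0].append(rec)
    for rec in right:
        blocks[rec[key]][1].append(rec)
    for lefts, rights in blocks.values():
        yield from product(lefts, rights)


def is_exact(a, b):
    """Exact linkage: every compared field is identical."""
    return all(a[f] == b[f] for f in FIELDS)


def agrees(field, a, b):
    """Approximate agreement for names, strict equality for other fields."""
    if field == "name":
        ratio = SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        return ratio >= NAME_SIMILARITY_CUTOFF
    return a[field] == b[field]


def match_weight(a, b):
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if agrees(field, a, b):
            total += log2(m / u)
        else:
            total += log2((1 - m) / (1 - u))
    return total


candidates = list(block_pairs(census, farmers, "district"))
exact = [(a["id"], b["id"]) for a, b in candidates if is_exact(a, b)]
probable = [
    (a["id"], b["id"])
    for a, b in candidates
    if not is_exact(a, b) and match_weight(a, b) >= THRESHOLD
]

print("candidate pairs after blocking:", len(candidates))
print("exact matches:", exact)
print("probable matches:", probable)
```

In this sketch, blocking on the hypothetical district field means only records within the same district are compared, so fewer pairs are scored than in the full cross product of the two lists. The trade-off is that true matches whose blocking key differs (for example, a misspelled district) are never compared at all.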
Keywords: Record linkage; data cleaning; data matching; blocking algorithms; data mining; data integration; clustering
Date: 2017
Downloads:
https://doi.org/10.5901/mjss.2017.v8n3p203 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:vrs:mjsosc:v:8:y:2017:i:3:p:203-207:n:14
DOI: 10.5901/mjss.2017.v8n3p203
Mediterranean Journal of Social Sciences is currently edited by Alessandro Figus