An Efficient Algorithm for Data Cleaning
Payal Pahwa, Rajiv Arora and Garima Thakur
Additional contact information
Payal Pahwa: Guru Gobind Singh IndraPrastha University, India
Rajiv Arora: Guru Gobind Singh IndraPrastha University, India
Garima Thakur: Guru Gobind Singh IndraPrastha University, India
International Journal of Knowledge-Based Organizations (IJKBO), 2011, vol. 1, issue 4, 56-71
Abstract:
The quality of real-world data fed into a data warehouse is a major concern today. Because the data comes from a variety of sources, it must be checked for errors and anomalies before it is loaded into the data warehouse. The source data may contain exact or approximate duplicate records. The presence of incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. This paper addresses issues related to the detection and correction of such duplicate records. It also analyzes data quality and the various factors that degrade it. Existing work is briefly analyzed and its major limitations are pointed out. A new framework is then proposed that improves on the existing technique.
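The distinction the abstract draws between exact and approximate duplicate records can be illustrated with a minimal sketch. This is not the framework proposed in the paper, which this record does not describe. The sample records, the `normalize`, `similarity` and `find_duplicates` helpers, and the 0.85 threshold are all assumptions chosen for illustration, and the string comparison uses Python's standard `difflib` rather than any technique named by the authors.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase each field and collapse runs of whitespace (illustrative cleanup)."""
    return tuple(" ".join(field.lower().split()) for field in record)


def similarity(a, b):
    """Average per-field similarity ratio in [0, 1] for two normalized records."""
    ratios = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    return sum(ratios) / len(ratios)


def find_duplicates(records, threshold=0.85):
    """Return (exact, approximate) lists of index pairs.

    The threshold is an assumed value, not one taken from the paper.
    Every pair is compared, so the cost grows quadratically with the
    number of records.
    """
    normalized = [normalize(r) for r in records]
    exact, approximate = [], []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            if normalized[i] == normalized[j]:
                exact.append((i, j))
            elif similarity(normalized[i], normalized[j]) >= threshold:
                approximate.append((i, j))
    return exact, approximate


# Hypothetical source records: (name, street, city)
records = [
    ("John Smith", "12 Park Street", "Delhi"),
    ("john  smith", "12 Park Street", "Delhi"),  # same as record 0 after normalization
    ("Jon Smith", "12 Park Street", "Delhi"),    # one-letter variation in the name
    ("Anita Rao", "4 Lake Road", "Mumbai"),
]
exact_pairs, approx_pairs = find_duplicates(records)
print("exact:", exact_pairs)
print("approximate:", approx_pairs)
```

In this sketch, records that become identical after normalization count as exact duplicates, while records whose average field similarity reaches the threshold count as approximate duplicates. Comparing all pairs is only practical for small inputs.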
Date: 2011
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/ijkbo.2011100104 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:igg:jkbo00:v:1:y:2011:i:4:p:56-71
International Journal of Knowledge-Based Organizations (IJKBO) is currently edited by John Wang
More articles in International Journal of Knowledge-Based Organizations (IJKBO) from IGI Global
Bibliographic data for series maintained by Journal Editor.