A minimized-rule based approach for improving data currency
Mohan Li and
Jianzhong Li
Additional contact information
Mohan Li: Harbin Institute of Technology
Jianzhong Li: Harbin Institute of Technology
Journal of Combinatorial Optimization, 2016, vol. 32, issue 3, No 11, 812-841
Abstract:
Repairing obsolete data items so that they hold up-to-date values is a major challenge in improving data quality. Previous data repairing methods are based on either quality rules or statistical techniques, and both types have limitations. To overcome these shortcomings, this paper combines quality rules and statistical techniques to improve data currency. (1) A new class of currency repairing rules (CRRs) is proposed to express both domain knowledge and statistical information. Domain knowledge is expressed by the rule pattern, and the statistical information is described by the conditional probability distribution corresponding to each rule. (2) The problem of generating minimized CRRs is studied in both the static and the dynamic world. In the static world, the problem of generating minimized CRR patterns is proved to be NP-hard, and two approximate algorithms are provided to solve it. In the dynamic world, methods are provided to update the CRRs without recomputing the whole CRR set when the data change. In some special cases, the updates can be finished in $O(1)$ time. In both cases, methods for learning the conditional probabilities for each CRR pattern are provided. (3) Based on the CRRs, the problems of finding optimal repairing plans with and without a cost budget are studied, and methods are provided to solve them. (4) Experiments on both real and synthetic data sets show that the proposed methods are efficient and effective.
Keywords: Data currency; Data quality; Data cleaning; Quality rules
Date: 2016
Downloads: (external link)
http://link.springer.com/10.1007/s10878-015-9904-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:jcomop:v:32:y:2016:i:3:d:10.1007_s10878-015-9904-8
Ordering information: This journal article can be ordered from
https://www.springer.com/journal/10878
DOI: 10.1007/s10878-015-9904-8
Journal of Combinatorial Optimization is currently edited by Thai, My T.
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.