Record Matching in Data Warehouses: A Decision Model for Data Consolidation

Debabrata Dey
Additional contact information
Debabrata Dey: University of Washington Business School, Seattle, Washington 98195-3200

Operations Research, 2003, vol. 51, issue 2, 240-254

Abstract: The notion of a data warehouse for integrating operational data into a single repository is rapidly becoming popular in modern organizations. An important issue in the integration process is how to deal with the identifier mismatch problem when combining similar data from disparate sources. A real-world entity may be represented using different identifiers in different operational data sources, and matching them may often be difficult using simple database operations expressed, say, as an SQL query. A record-by-record manual matching is also not practical because the databases may be large. A decision model is presented that combines probability-based automated matching with manual matching in a cost minimization formulation. A heuristic approach is proposed for solving the decision model. Both the model and the heuristic solution approach have been tested on real data. The results from the testing indicate that the model can be effectively used in real-world situations.

Keywords: Computers; databases: data warehousing; data consolidation; Information systems; decision-support systems: record matching; Programming; integer: algorithms; heuristic systems
Date: 2003
Citations: 4 (in EconPapers)

Downloads: (external link)
http://dx.doi.org/10.1287/opre.51.2.240.12779 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:51:y:2003:i:2:p:240-254

Journal: Operations Research, published by INFORMS.
Bibliographic data for series maintained by Chris Asher.

Page updated 2025-03-19
Handle: RePEc:inm:oropre:v:51:y:2003:i:2:p:240-254