A Probabilistic Decision Model for Entity Matching in Heterogeneous Databases
Debabrata Dey, Sumit Sarkar and Prabuddha De
Affiliations:
Debabrata Dey: Department of Management Science, School of Business Administration, Box 353200, University of Washington, Seattle, Washington 98195-3200
Sumit Sarkar: Department of Management Science and Information Systems, School of Management, University of Texas at Dallas, Richardson, Texas 75083-0688
Prabuddha De: Department of MIS and Decision Sciences, School of Business Administration, University of Dayton, Dayton, Ohio 45469-2130
Management Science, 1998, vol. 44, issue 10, 1379-1395
Abstract:
In recent years, database systems have proliferated in organizations of all types. In many cases, these databases are developed in different departments and maintained autonomously. Much is to be gained, however, if databases across departments, divisions, or even organizations can be related to one another. A major problem in relating data stored in different databases is that they represent real-world entities differently, for example by using different identifiers or primary keys. We present a decision-theoretic model for matching entities across different databases. The decision to match two entities from two different databases inherently involves uncertainty, because errors in data collection, data entry, and data representation may prevent an exact match from being found. We model this uncertainty using probability theory and propose an integer programming formulation that minimizes the total cost associated with the entity matching decision. The model has been implemented and validated on real-world data.
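To illustrate the general idea of cost-minimizing matching under uncertainty, here is a minimal sketch, not the authors' formulation. It assumes that the match probability for every record pair has already been estimated, and it uses two hypothetical unit costs: `cost_false_match` (declaring a match that is wrong) and `cost_missed_match` (failing to declare a true match). Each record may be matched at most once, which is an assignment-style constraint. Instead of solving an integer program, the sketch brute-forces all one-to-one matchings, so it only suits tiny inputs. The function names `pair_weight` and `min_cost_matching` are illustrative.

```python
from itertools import product


def pair_weight(p, cost_false_match, cost_missed_match):
    """Change in expected cost from declaring a pair a match instead of a non-match.

    Declaring a match costs (1 - p) * cost_false_match in expectation;
    leaving the pair unmatched costs p * cost_missed_match.
    """
    return (1 - p) * cost_false_match - p * cost_missed_match


def min_cost_matching(probs, cost_false_match, cost_missed_match):
    """Exhaustively find a one-to-one matching that minimizes total expected cost.

    probs[i][j] is the (assumed, precomputed) probability that record i of
    database A and record j of database B describe the same real-world entity.
    Returns (list of matched (i, j) pairs, total expected cost).
    """
    n_a = len(probs)
    n_b = len(probs[0]) if n_a else 0
    # Expected cost if nothing is matched: every true match would be missed.
    baseline = sum(p * cost_missed_match for row in probs for p in row)
    best_pairs, best_delta = [], 0.0  # matching nothing changes the cost by 0
    # Each A record picks a B record index or -1 ("no match").
    for choice in product(range(-1, n_b), repeat=n_a):
        chosen = [j for j in choice if j >= 0]
        if len(chosen) != len(set(chosen)):
            continue  # a B record was used twice: violates one-to-one matching
        pairs = [(i, j) for i, j in enumerate(choice) if j >= 0]
        delta = sum(pair_weight(probs[i][j], cost_false_match, cost_missed_match)
                    for i, j in pairs)
        if delta < best_delta:
            best_pairs, best_delta = pairs, delta
    return best_pairs, baseline + best_delta


pairs, cost = min_cost_matching([[0.9, 0.1], [0.2, 0.6]], 1.0, 1.0)
print(pairs, round(cost, 3))  # [(0, 0), (1, 1)] 0.8
```

Writing the total expected cost as a constant baseline plus one weight per declared match makes the trade-off explicit. A pair is worth matching only when its weight is negative, so raising `cost_false_match` makes the decision more conservative. For example, with costs 5.0 and 1.0, a single pair with probability 0.8 is left unmatched. A realistic implementation would replace the brute-force loop with an assignment or integer-programming solver.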
Keywords: Semantic Heterogeneity; Matching Under Uncertainty; Classification Costs; Assignment Problem
Date: 1998
Full text: http://dx.doi.org/10.1287/mnsc.44.10.1379 (PDF)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:44:y:1998:i:10:p:1379-1395
More articles in Management Science from INFORMS.
Bibliographic data for series maintained by Chris Asher.