A Framework for Reconciling Attribute Values from Multiple Data Sources

Zhengrui Jiang, Sumit Sarkar, Prabuddha De and Debabrata Dey
Additional contact information
Zhengrui Jiang: College of Business, University of North Alabama, Florence, Alabama 35632
Sumit Sarkar: School of Management, University of Texas at Dallas, Richardson, Texas 75083
Prabuddha De: Krannert School of Management, Purdue University, West Lafayette, Indiana 47907
Debabrata Dey: Michael G. Foster School of Business, University of Washington, Seattle, Washington 98195

Management Science, 2007, vol. 53, issue 12, 1946-1963

Abstract: Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. While the existing literature has focused on problems such as schema integration and entity identification, it has largely overlooked a basic question: When an attribute value for a real-world entity is recorded differently in different databases, how should the "best" value be chosen from the set of possible values? This paper provides an answer to this question. We first show how a probability distribution over a set of possible values can be derived. We then demonstrate how these probabilities can be used to solve a given decision problem by minimizing the total cost of type I, type II, and misrepresentation errors. Finally, we propose a framework for integrating multiple data sources when a single "best" value has to be chosen and stored for every attribute of an entity.
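
As an illustration of the decision step described in the abstract, the following is a minimal sketch, not the authors' actual procedure. It assumes a probability distribution over the candidate values is already available and picks the value with the lowest expected cost. The names `expected_cost` and `best_value`, the example probabilities, and both cost functions are hypothetical; the paper's specific treatment of type I, type II, and misrepresentation errors is not reproduced here.

```python
from typing import Callable, Dict

# Hypothetical distribution over the values recorded for one attribute
# (e.g., an income figure) in three different sources.
probs: Dict[int, float] = {40000: 0.45, 50000: 0.35, 60000: 0.20}


def expected_cost(candidate: int,
                  probs: Dict[int, float],
                  cost: Callable[[int, int], float]) -> float:
    """Expected cost of storing `candidate` when the true value follows `probs`."""
    return sum(p * cost(candidate, true_value) for true_value, p in probs.items())


def best_value(probs: Dict[int, float],
               cost: Callable[[int, int], float]) -> int:
    """Return the candidate value with the smallest expected cost."""
    return min(probs, key=lambda c: expected_cost(c, probs, cost))


def zero_one_cost(stored: int, true_value: int) -> float:
    """Assumed cost: any wrong value costs 1, the right value costs 0."""
    return 0.0 if stored == true_value else 1.0


def abs_cost(stored: int, true_value: int) -> float:
    """Assumed cost: proportional to the size of the discrepancy."""
    return float(abs(stored - true_value))


most_probable = best_value(probs, zero_one_cost)
least_costly = best_value(probs, abs_cost)
print(most_probable, least_costly)
```

Under the zero-one cost the sketch selects the most probable value (40000), while under the absolute-difference cost it selects 50000. This shows why the "best" value depends on the cost structure of the decision problem and not only on the probabilities.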

Keywords: data integration; heterogeneous databases; probabilistic databases; data quality; type I error; type II error; misrepresentation error
Date: 2007

Downloads: (external link)
http://dx.doi.org/10.1287/mnsc.1070.0745 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:53:y:2007:i:12:p:1946-1963

Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-19
Handle: RePEc:inm:ormnsc:v:53:y:2007:i:12:p:1946-1963