Measuring quality of similarity functions in approximate data matching

Roberto da Silva, Raquel Stasiu, Viviane Moreira Orengo and Carlos A. Heuser

Journal of Informetrics, 2007, vol. 1, issue 1, 35-46

Abstract: This paper presents a method for assessing the quality of similarity functions. The scenario taken into account is that of approximate data matching, in which it is necessary to determine whether two data instances represent the same real-world object. Our method is based on the semi-automatic estimation of optimal threshold values. We propose two methods for performing such estimation. The first method is an algorithm based on a reward function, and the second is a statistical method. Experiments were carried out to validate the proposed techniques. The results show that both methods for threshold estimation produce similar results. The output of these methods was used to design a grading function for similarity functions. This grading function, called discernability, was used to compare a number of similarity functions applied to an experimental data set.
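
Illustrative note: the abstract does not give the reward function or the discernability formula, so the following is only a minimal sketch of the general idea of picking a match threshold by maximizing a reward over labeled pairs. Everything in it is an assumption made for illustration: the similarity function (Python's `difflib.SequenceMatcher`), the reward (+1 for a correct decision, -1 for a wrong one), the helper names, and the example pairs. It is not the authors' algorithm.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Illustrative similarity in [0, 1]; an assumption, not the paper's function."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reward(scores, labels, threshold):
    """Assumed reward: +1 for each correct match decision, -1 for each wrong one."""
    total = 0
    for score, is_match in zip(scores, labels):
        predicted_match = score >= threshold
        total += 1 if predicted_match == is_match else -1
    return total


def best_threshold(scores, labels):
    """Try every observed score as a candidate threshold; keep the highest reward."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: reward(scores, labels, t))


# Hypothetical labeled pairs: (value_a, value_b, same real-world object?)
pairs = [
    ("Jon Smith", "John Smith", True),
    ("Acme Corp.", "ACME Corp", True),
    ("Jon Smith", "Jane Doe", False),
    ("Acme Corp.", "Apex Ltd", False),
]
scores = [similarity(a, b) for a, b, _ in pairs]
labels = [is_match for _, _, is_match in pairs]

threshold = best_threshold(scores, labels)
print(f"threshold={threshold:.3f} reward={reward(scores, labels, threshold)}")
```

In this sketch, a similarity function that separates matching from non-matching pairs cleanly reaches the maximum reward at some threshold, while one that mixes them cannot; a grading function for similarity functions could be built on that kind of observation, though the paper's actual definition should be taken from the full text.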

Keywords: Approximate data matching; Similarity functions; Retrieval evaluation
Date: 2007
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S175115770600006X
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:infome:v:1:y:2007:i:1:p:35-46

DOI: 10.1016/j.joi.2006.09.001

Journal of Informetrics is currently edited by Leo Egghe

More articles in Journal of Informetrics from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:infome:v:1:y:2007:i:1:p:35-46