EconPapers    
Economics at your fingertips  
 

An integer linear programming approach for approximate string comparison

Marcus Ritt, Alysson M. Costa, Sergio Mergen and Viviane M. Orengo

European Journal of Operational Research, 2009, vol. 198, issue 3, 706-714

Abstract: We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance.

Keywords: Approximate; string; matching; Distance; metric; Block; edits; Block; moves; Integer; programming; Hop; constraints (search for similar items in EconPapers)
Date: 2009
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0377-2217(08)00833-3
Full text for ScienceDirect subscribers only

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:eee:ejores:v:198:y:2009:i:3:p:706-714

Access Statistics for this article

European Journal of Operational Research is currently edited by Roman Slowinski, Jesus Artalejo, Jean-Charles. Billaut, Robert Dyson and Lorenzo Peccati

More articles in European Journal of Operational Research from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:ejores:v:198:y:2009:i:3:p:706-714