An integer linear programming approach for approximate string comparison
Marcus Ritt,
Alysson M. Costa,
Sergio Mergen and
Viviane M. Orengo
European Journal of Operational Research, 2009, vol. 198, issue 3, 706-714
Abstract:
We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance.
Keywords: Approximate; string; matching; Distance; metric; Block; edits; Block; moves; Integer; programming; Hop; constraints (search for similar items in EconPapers)
Date: 2009
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0377-2217(08)00833-3
Full text for ScienceDirect subscribers only
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:ejores:v:198:y:2009:i:3:p:706-714
Access Statistics for this article
European Journal of Operational Research is currently edited by Roman Slowinski, Jesus Artalejo, Jean-Charles. Billaut, Robert Dyson and Lorenzo Peccati
More articles in European Journal of Operational Research from Elsevier
Bibliographic data for series maintained by Catherine Liu ().