Entropy Based Measurement of Text Dissimilarity for Duplicate Detection
Venkatesh Kumar and G. Rajendran
Modern Applied Science, 2010, vol. 4, issue 9, 142
Abstract:
Identifying the approximate similarity between a pair of strings is an essential step in data cleansing and data integration. Most existing approaches have relied on generic or manually tuned distance metrics to estimate the similarity of potential duplicates. However, existing systems do not produce a similarity percentage for a pair of strings. In this paper, we propose a method that uses entropy and information gain (IG) to measure the dissimilarity between a pair of strings in order to increase the accuracy of the data.
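The abstract does not give the paper's formulas, so the following is only a minimal Python sketch, under stated assumptions, of how entropy and information gain could yield a dissimilarity score and a similarity percentage. It assumes each string is treated as a bag of characters and uses Shannon entropy in bits. IG is taken as the entropy of the pooled characters minus the length-weighted entropies of the two strings, IG = H(s+t) - (|s|/n)H(s) - (|t|/n)H(t), normalized by the entropy of the length split so the result lies in [0, 1]. The function names `char_entropy`, `information_gain_dissimilarity` and `similarity_percentage` are hypothetical, and this is not presented as the authors' method.

```python
from collections import Counter
from math import log2


def char_entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of a character frequency table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values() if n > 0)


def information_gain_dissimilarity(s: str, t: str) -> float:
    """Hypothetical IG-based dissimilarity (an assumption, not the paper's formula).

    Pools the characters of s and t, then measures how much knowing which
    string a character came from reduces entropy. Returns 0.0 when both
    strings have the same character distribution and 1.0 when their
    character sets are disjoint.
    """
    cs, ct = Counter(s), Counter(t)
    n_s, n_t = len(s), len(t)
    n = n_s + n_t
    if n_s == 0 or n_t == 0:
        # Two empty strings are identical; one empty string is maximally different.
        return 0.0 if n == 0 else 1.0
    pooled = cs + ct
    # Information gain of splitting the pooled characters by source string.
    ig = char_entropy(pooled) - (n_s / n) * char_entropy(cs) - (n_t / n) * char_entropy(ct)
    # Entropy of the length split; it upper-bounds the IG, so use it to normalize.
    w = n_s / n
    h_split = -(w * log2(w) + (1 - w) * log2(1 - w))
    # Clamp to [0, 1] to absorb floating-point rounding.
    return max(0.0, min(1.0, ig / h_split))


def similarity_percentage(s: str, t: str) -> float:
    """Similarity as a percentage: 100 * (1 - dissimilarity)."""
    return 100.0 * (1.0 - information_gain_dissimilarity(s, t))
```

For example, `similarity_percentage("abcd", "abce")` returns 75.0, and two strings with no characters in common score 0%. Because this sketch ignores character order, anagrams such as "listen" and "silent" score 100%, so a practical duplicate detector would probably combine it with an order-sensitive measure.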
Date: 2010
Downloads:
https://ccsenet.org/journal/index.php/mas/article/download/6541/5753 (application/pdf)
https://ccsenet.org/journal/index.php/mas/article/view/6541 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:ibn:masjnl:v:4:y:2010:i:9:p:142
More articles in Modern Applied Science from Canadian Center of Science and Education. Contact information at EDIRC.
Bibliographic data for series maintained by Canadian Center of Science and Education.