
Entropy Based Measurement of Text Dissimilarity for Duplicate-Detection

Venkatesh Kumar and G. Rajendran

Modern Applied Science, 2010, vol. 4, issue 9, 142

Abstract: Identifying approximate similarity between a pair of strings is an essential step in data cleansing and data integration. Most existing approaches rely on generic or manually tuned distance metrics to estimate the similarity of potential duplicates, but they do not produce a similarity percentage for a pair of strings. In this paper we propose a method that uses entropy and information gain (IG) to measure the dissimilarity between a pair of strings, with the aim of increasing the accuracy of data.
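
The abstract does not spell out how entropy and information gain are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each string is represented by its character frequencies and takes the dissimilarity to be the information gain obtained by splitting the pooled characters back into the two source strings. The function names `entropy` and `information_gain_dissimilarity` are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(counts: Counter) -> float:
    """Shannon entropy, in bits, of a frequency table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain_dissimilarity(a: str, b: str) -> float:
    """Information gain from splitting the pooled characters by source string.

    Assumption (not from the paper): strings are compared by their
    character distributions only. The result lies in [0, 1]: 0 when both
    strings have the same character distribution, 1 when they have equal
    length and share no characters.
    """
    counts_a, counts_b = Counter(a), Counter(b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    # Entropy of the pooled characters, before the split.
    pooled = entropy(counts_a + counts_b)
    # Length-weighted entropy of the two strings, after the split.
    conditional = (len(a) / total) * entropy(counts_a) \
        + (len(b) / total) * entropy(counts_b)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, pooled - conditional)
```

Under these assumptions a similarity percentage could be reported as `100 * (1 - information_gain_dissimilarity(a, b))`. Because only character frequencies are used, anagrams score as identical; counting character bigrams instead of single characters would be one way to make the sketch order-sensitive.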

Date: 2010

Downloads: (external link)
https://ccsenet.org/journal/index.php/mas/article/download/6541/5753 (application/pdf)
https://ccsenet.org/journal/index.php/mas/article/view/6541 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:ibn:masjnl:v:4:y:2010:i:9:p:142

More articles in Modern Applied Science from Canadian Center of Science and Education.
Bibliographic data for series maintained by Canadian Center of Science and Education.

 
Page updated 2025-03-19
Handle: RePEc:ibn:masjnl:v:4:y:2010:i:9:p:142