A Comparison of Similarity Measures for Text Documents

Shanmugasundaram Hariharan and Rengaramanujam Srinivasan
Additional contact information
Shanmugasundaram Hariharan: Faculty of Information Technology, B.S.A. Crescent Engineering College, Chennai, Tamilnadu, India
Rengaramanujam Srinivasan: Faculty of Computer Science and Engineering, B.S.A. Crescent Engineering College, Chennai, Tamilnadu, India

Journal of Information & Knowledge Management (JIKM), 2008, vol. 7, issue 1, pp. 1-8

Abstract: Similarity is an important and widely used concept in many applications, such as Document Summarisation, Question Answering, Information Retrieval, Document Clustering and Categorisation. This paper compares several similarity measures for assessing the content of text documents. We attempt to identify the measure best suited to determining document similarity for newspaper reports.

Keywords: Stop words; stemming; normalisation; similarity measure; discriminant
Date: 2008

Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219649208001889
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:07:y:2008:i:01:n:s0219649208001889

DOI: 10.1142/S0219649208001889

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

Publisher: World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Handle: RePEc:wsi:jikmxx:v:07:y:2008:i:01:n:s0219649208001889