A Comparison of Similarity Measures for Text Documents
Shanmugasundaram Hariharan and
Rengaramanujam Srinivasan
Additional contact information
Shanmugasundaram Hariharan: Faculty of Information Technology, B.S.A. Crescent Engineering College, Chennai, Tamilnadu, India
Rengaramanujam Srinivasan: Faculty of Computer Science and Engineering, B.S.A. Crescent Engineering College, Chennai, Tamilnadu, India
Journal of Information & Knowledge Management (JIKM), 2008, vol. 07, issue 01, 1-8
Abstract:
Similarity is an important and widely used concept in many applications, such as Document Summarisation, Question Answering, Information Retrieval, Document Clustering and Categorisation. This paper compares various similarity measures for assessing the content of text documents. We attempt to identify the measure best suited to determining document similarity for newspaper reports.
Keywords: Stop words; stemming; normalisation; similarity measure; discriminant
Date: 2008
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219649208001889
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:07:y:2008:i:01:n:s0219649208001889
DOI: 10.1142/S0219649208001889
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.