An efficient context-aware agglomerative fuzzy clustering framework for plagiarism detection
Anirban Chakrabarty and
Sudipta Roy
International Journal of Data Mining, Modelling and Management, 2018, vol. 10, issue 2, 188-208
Abstract:
Plagiarism refers to the act of copying content without acknowledging the original source. Though there are several existing commercial tools for plagiarism detection, still plagiarism is tricky and challenging due to the rise in volume of online publications. Existing plagiarism detection methods use paraphrasing, sentence and key-word matching, but such techniques has not been very effective. In this work, a framework for fuzzy based plagiarism detection is proposed using a context-aware agglomerative clustering approach with an improved time complexity. The work aims in retrieving key concepts at word, sentence and paragraph level by integrating semantic features in a novel optimisation function to detect plagiarism effectively. The notion of fuzzy clustering has been applied to improve the robustness and consistency of results for clustering multi-disciplinary papers. The experimental analysis is supported by comparison with other contemporary techniques which indicate the superiority of proposed approach for plagiarism detection.
Keywords: fuzzy clustering; context similarity; plagiarism detection; spanning tree; agglomerative clustering; validity index; constrained objective function. (search for similar items in EconPapers)
Date: 2018
References: Add references at CitEc
Citations:
Downloads: (external link)
http://www.inderscience.com/link.php?id=92533 (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:10:y:2018:i:2:p:188-208
Access Statistics for this article
More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker ().