Plagiarism detection using stopword n‐grams
Efstathios Stamatatos
Journal of the American Society for Information Science and Technology, 2011, vol. 62, issue 12, 2512-2527
Abstract:
In this paper a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field that uses content terms to represent documents, the proposed method is based on a small list of stopwords (i.e., very frequent words). We show that stopword n‐grams reveal important information for plagiarism detection since they are able to capture syntactic similarities between suspicious and original documents and they can be used to detect the exact plagiarized passage boundaries. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive when compared with the best reported results. More importantly, it achieves significantly better results when dealing with difficult plagiarism cases where the plagiarized passages are highly modified and most of the words or phrases have been replaced with synonyms.
Date: 2011
References: Add references at CitEc
Citations:
Downloads: (external link)
https://doi.org/10.1002/asi.21630
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:62:y:2011:i:12:p:2512-2527
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
Access Statistics for this article
More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().