Source Code Plagiarism Detection Using Biological String Similarity Algorithms
Imad Rahal and
Colin Wielga
Additional contact information
Imad Rahal: Department of Computer Science, College of Saint Benedict and Saint John's University, Collegeville, MN 56321, USA
Colin Wielga: Natural Sciences Division, New College of Florida, Sarasota, FL 34234, USA
Journal of Information & Knowledge Management (JIKM), 2014, vol. 13, issue 03, 1-22
Abstract:
Source code plagiarism is easy to commit but difficult to catch. Many approaches have been proposed in the literature to automate its detection; however, there is little consensus on what works best. In this paper, we propose two new measures for determining the accuracy of a given technique and describe an approach for converting code files into strings, which can then be compared for similarity in order to detect plagiarism. We then evaluate several string comparison techniques that are heavily utilised in biological sequence alignment, comparing their performance on a large collection of student source code containing various types of plagiarism. Experimental results show that these techniques succeed in matching a plagiarised file to its original files more than 90% of the time. Finally, we propose a modification to these algorithms that drastically improves their runtimes with little or no effect on accuracy. Even though the ideas presented herein are applicable to most programming languages, we focus on a case study pertaining to an introductory-level Visual Basic programming course offered at our institution.
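To make the general idea in the abstract concrete, the following is a minimal sketch of converting code into a string and scoring two such strings with a local sequence alignment of the Smith-Waterman kind. It is not the encoding, scoring scheme, or accuracy measure used in the paper, which the abstract does not specify. The token-to-symbol mapping, the function names (encode, local_alignment_score, similarity), the scoring parameters, and the normalisation are all assumptions chosen for illustration.

```python
import re

# Hypothetical token classes: each keyword maps to one symbol.
# This mapping is an illustrative assumption, not the paper's encoding.
KEYWORDS = {"if": "I", "then": "T", "else": "E", "end": "N", "for": "F",
            "next": "X", "dim": "D", "as": "A", "to": "O"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def encode(source: str) -> str:
    """Convert source text into a string with one symbol per token."""
    symbols = []
    for token in TOKEN_RE.findall(source):
        lowered = token.lower()
        if lowered in KEYWORDS:
            symbols.append(KEYWORDS[lowered])
        elif token[0].isalpha() or token[0] == "_":
            symbols.append("v")    # any identifier, so renaming has no effect
        elif token[0].isdigit():
            symbols.append("n")    # any numeric literal
        else:
            symbols.append(token)  # operators and punctuation kept as-is
    return "".join(symbols)


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def similarity(src_a: str, src_b: str, match: int = 2) -> float:
    """Alignment score scaled to [0, 1] by the best score the shorter string allows."""
    a, b = encode(src_a), encode(src_b)
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * shortest)


if __name__ == "__main__":
    original = "Dim total As Integer\nFor i = 1 To 10\n total = total + i\nNext i"
    renamed = "Dim sum As Integer\nFor k = 1 To 10\n sum = sum + k\nNext k"
    print(encode(original))
    print(similarity(original, renamed))
```

Because every identifier collapses to the same symbol under this assumed mapping, two files that differ only in variable names encode to the same string. A real detector would need a richer token vocabulary and tuned scoring parameters.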
Keywords: Source code similarity; plagiarism detection; sequence alignment; string tiling
Date: 2014
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219649214500282
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:13:y:2014:i:03:n:s0219649214500282
DOI: 10.1142/S0219649214500282
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
Journal of Information & Knowledge Management (JIKM) is published by World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.