Exploring Character-Based Stylometry Features Using Machine Learning for Intrinsic Plagiarism Detection in Urdu
Muhammad Faraz Manzoor ()
Additional contact information
Muhammad Faraz Manzoor: Department of Computer Science, University of Management and Technology, Lahore, Pakistan
International Journal of Innovations in Science & Technology, 2024, vol. 6 Special Issue: 7, issue 7, 236-245
Abstract:
Plagiarism detection in natural language processing (NLP) plays a crucial role in maintaining textual integrity across various domains, particularly for low-resource languages like Urdu. This study addresses the emerging challenge of intrinsic plagiarism detection in Urdu, an area with limited research due to the scarcity of datasets and model resources. To bridge this gap, our research investigates the use of character-based stylometric features in combination with machine learning (ML) and deep learning (DL) models specifically designed for Urdu text analysis.We conducted a series of experiments to evaluate the performance of several classifiers, including Random Forest, AdaBoost, K-Nearest Neighbor (KNN), Decision Tree, Gaussian Naive Bayes, and Long Short-Term Memory (LSTM) networks. Our results show that KNNand LSTM achieved the highest accuracy at 74%, with KNN outperforming the others in terms of F1-score (64.3%), highlighting its balanced performance across accuracy, precision, and recall. AdaBoost followed closely with an accuracy of 73% and a precision of 77.5%, although its F1-score was slightly lower at 63.6%. These findings emphasize the need for specialized approaches in NLP for Urdu, demonstrating that tailored ML and DL techniques can significantly improve intrinsic plagiarism detection in low-resource languages.
Keywords: Intrinsic; Plagiarism; Urdu; Stylometry (search for similar items in EconPapers)
Date: 2024
References: Add references at CitEc
Citations:
Downloads: (external link)
https://journal.50sea.com/index.php/IJIST/article/view/1080/1648 (application/pdf)
https://journal.50sea.com/index.php/IJIST/article/view/1080 (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:abq:ijist1:v:6:y:2024:i:7:p:236-245
Access Statistics for this article
International Journal of Innovations in Science & Technology is currently edited by Prof. Dr. Syed Amer Mahmood
More articles in International Journal of Innovations in Science & Technology from 50sea
Bibliographic data for series maintained by Iqra Nazeer ().