Using LLM-Generated Data to Create a Roman Urdu Scam Call Detector

Irfan, Sameed; Sheeraz, Aswad; Hasnain, Muhammad

Sameed Irfan, Aswad Sheeraz and Muhammad Hasnain

Sustainable Business and Society in Emerging Economies, 2025, vol. 7, issue 3, 611-620

Abstract: Purpose: Scam calls are spreading at an alarming pace, with global losses in 2024 estimated at more than a trillion dollars. Current machine-learning systems for classifying scam calls do not yet generalize: most are monolingual detectors, and multilingual systems based on LLMs are impractical because of their high computational cost. In addition, because scam-call strategies change rapidly, most deployed models are obsolete. This paper proposes and analyzes a multilingual, low-cost, and easily updatable architecture for detecting scam calls with the help of LLM-generated synthetic data.

Design/Methodology/Approach: The paper presents a model trained purely on multilingual scam and non-scam conversations generated by an LLM. Evaluation was done on a small human-written dataset of actual scam and non-scam call transcripts. The method focuses on scalability, linguistic flexibility, and rapid regeneration of data through synthetic generation.

Findings: The experimental results indicate that a model trained on synthetic data can transfer to actual scam-call data. When tested on the human-written data, the model obtained an average accuracy and F1-score of more than 90 percent, demonstrating the viability of synthetic multilingual training data for scam-call detection.

Implications/Originality/Value: The study offers a way to address the practical constraints of conventional scam-call detection systems, which are limited in language coverage and adaptability. The proposed framework, based on LLM-generated data, can provide multilingual coverage, reduce computational cost, and be updated regularly at minimal cost, making it both operationally viable and able to adapt to changing scam-call tactics.
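The evaluation protocol described in the abstract (train only on synthetic conversations, then score on held-out human-written transcripts using accuracy and F1) can be illustrated with a minimal sketch. The abstract does not say which classifier the authors used, so a small word-level naive Bayes model stands in here as an assumption. The Roman Urdu sentences, the variable names, and the label convention (1 = scam, 0 = non-scam) are invented placeholders and are not taken from the paper's data.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokens; an assumption, not the paper's preprocessing.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_one(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy_and_f1(y_true, y_pred, positive=1):
    # Accuracy over all items; F1 for the positive (scam) class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Placeholder "synthetic" training set (in the paper this role is played by
# LLM-generated conversations; these lines are invented for illustration).
synthetic_texts = [
    "aap ka atm card block ho gaya hai apna pin batayein",
    "mubarak ho aap ne inaam jeeta hai fees bhejein",
    "bank se bol raha hoon otp code batayein",
    "kal shaam ko khana khane aa jao",
    "meeting kal subah das baje hai",
    "ammi ne kaha hai doodh le aana",
]
synthetic_labels = [1, 1, 1, 0, 0, 0]

# Placeholder "human-written" test set, kept separate from training.
real_texts = ["aap ka card block hai otp batayein", "shaam ko doodh le aana"]
real_labels = [1, 0]

model = NaiveBayes().fit(synthetic_texts, synthetic_labels)
predictions = model.predict(real_texts)
accuracy, f1 = accuracy_and_f1(real_labels, predictions)
```

The point of the sketch is the separation of data sources: nothing from `real_texts` is seen during `fit`, so the scores measure transfer from synthetic to human-written data. Scores on a toy set this small say nothing about real performance.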

Keywords: LLM; Scam Call Detection; Machine Learning; Training models with Synthetic Data; Urdu Scam Call Detector
Date: 2025

Downloads: (external link)
https://publishing.globalcsrc.org/ojs/index.php/sbsee/article/view/3496/1930 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:src:sbseec:v:7:y:2025:i:3:p:611-620

DOI: 10.26710/sbsee.v7i3.3496

Sustainable Business and Society in Emerging Economies is published by CSRC Publishing, Center for Sustainability Research and Consultancy Pakistan.
Bibliographic data for series maintained by Dr Rana Muhammad Adeel Farooq.