A novel SMS spam dataset and bi-directional transformer based short-text representations for SMS spam detection

Srishti Maheshwari, Shubhangi Aggarwal and Rishabh Kaushal

International Journal of Information and Decision Sciences, 2024, vol. 16, issue 4, 341-359

Abstract: Short message service (SMS) is a form of exchanging short messages over mobile phones without the internet. Unfortunately, the popularity of SMS is exploited to send irrelevant and malicious messages that entrap users in scams and frauds. In this work, we investigate the performance of state-of-the-art bi-directional encoder representations from transformers (BERT) for short-text messages in SMS data. For evaluation, we curate a novel augmented SMS spam dataset by extending a classical SMS spam dataset to further categorise spam SMS messages into four fine-grained categories, namely, indecent, malicious, promotional, and updates. We perform experiments on the standard benchmark SMS dataset of spam and non-spam messages and on our curated multi-class SMS spam dataset. We find that BERT-based short-text representations outperform the baseline traditional approach of using handcrafted text-based features by 15% to 30% in accuracy, depending on the machine learning algorithm, on the multi-class SMS spam dataset.
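
To make the baseline mentioned in the abstract more concrete, the following is a minimal sketch of what handcrafted text-based features for a single SMS message might look like. The specific features, the URL pattern, and the function name `handcrafted_features` are illustrative assumptions; this record does not list the features the authors actually used. Only the four category names are taken from the abstract.

```python
import re

# Fine-grained spam categories named in the abstract.
SPAM_CATEGORIES = ("indecent", "malicious", "promotional", "updates")

# Hypothetical URL pattern, chosen for illustration only.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def handcrafted_features(message: str) -> dict:
    """Map one SMS message to a small dictionary of illustrative features.

    These features are assumptions for the sketch, not the paper's feature set.
    """
    letters = [ch for ch in message if ch.isalpha()]
    upper = sum(1 for ch in letters if ch.isupper())
    return {
        "num_chars": len(message),
        "num_words": len(message.split()),
        "num_digits": sum(ch.isdigit() for ch in message),
        "num_exclamations": message.count("!"),
        # Share of alphabetic characters that are uppercase (0.0 if no letters).
        "upper_ratio": upper / len(letters) if letters else 0.0,
        "has_url": bool(URL_PATTERN.search(message)),
    }


example = "WIN a FREE prize! Visit www.example.com or call 12345"
features = handcrafted_features(example)
print(features)
```

A feature dictionary of this kind could be fed to a conventional classifier, whereas the BERT-based approach described in the abstract would replace it with a learned vector representation of the message.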

Keywords: spam classification; machine learning; word embedding; representation learning; short message service; SMS
Date: 2024

Downloads: (external link)
http://www.inderscience.com/link.php?id=142636 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijidsc:v:16:y:2024:i:4:p:341-359

More articles in International Journal of Information and Decision Sciences from Inderscience Enterprises Ltd

 
Handle: RePEc:ids:ijidsc:v:16:y:2024:i:4:p:341-359