Normalization of Malay Noisy Text in Social Media using Levenshtein Distance and Rule-Based Techniques

Azilawati Azizan, Nurkhairizan Khairuddin, Nur Husna Anuar and Rohana Ismail
Additional contact information
Azilawati Azizan: College of Computing, Informatics and Mathematics, Universiti Teknologi MARA (UiTM), Perak Branch, Tapah Campus, Malaysia.
Nurkhairizan Khairuddin: College of Computing, Informatics and Mathematics, Universiti Teknologi MARA (UiTM), Perak Branch, Tapah Campus, Malaysia.
Nur Husna Anuar: Yayasan Warisan Anak Selangor, Syarikat Pengurusan Projek TAWAS, Kompleks Belia & Kebudayaan Negeri Selangor, Shah Alam Selangor, Malaysia.
Rohana Ismail: Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Besut Campus, Terengganu, Malaysia.

International Journal of Research and Innovation in Social Science, 2024, vol. 8, issue 9, 1535-1544

Abstract: The rise of digital communication via mobile phones and the Internet has led to the widespread use of short-form words and abbreviations in text messaging. This trend poses challenges for data mining activities that involve text processing and analysis, particularly on social media platforms, where users employ a wide variety of abbreviations, slang, misspellings, and grammatical errors. To address this challenge, this study aimed to develop an algorithm for normalizing Malay noisy text using Levenshtein Distance (LD) and rule-based techniques. LD is used to transform misspelled Malay words into their standard form, while rule-based techniques improve the conversion success rate for three categories of noisy terms, namely slang, common Malay noisy text, and mixed language. The algorithm was implemented in Python, and the results demonstrate the effectiveness of LD and rule-based techniques in normalizing noisy text in social media. The approach normalized 80% of the Malay noisy text into its standard form, which provides a strong foundation for further study. Furthermore, this work opens opportunities for introducing new approaches and rules to improve the normalization success rate, which can facilitate the analysis of text data on social media platforms. It is recommended that future studies focus on expanding the dataset and applying statistical validation methods to ensure the robustness and accuracy of the normalization model.
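
The abstract does not spell out how the two techniques are combined. As a minimal sketch only, assuming that rule-based replacements are tried first as a lookup table and that LD is then used to match any remaining token against a lexicon of standard words, the idea could look like the Python below. The names RULES, LEXICON, normalize_token, normalize and the max_distance threshold are hypothetical, and the few table entries are illustrative examples rather than the rules or data used in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical rule table (illustrative short forms, not the paper's rules).
RULES = {"yg": "yang", "sbb": "sebab", "x": "tidak", "sy": "saya"}

# Hypothetical lexicon of standard Malay words (illustrative only).
LEXICON = ["saya", "makan", "nasi", "tidak", "yang", "sebab", "lapar"]


def normalize_token(token: str, max_distance: int = 1) -> str:
    """Apply a rule if one matches, else fall back to the nearest lexicon word."""
    token = token.lower()
    if token in RULES:
        return RULES[token]
    if token in LEXICON:
        return token
    best = min(LEXICON, key=lambda word: levenshtein(token, word))
    # Assumed threshold: leave the token unchanged if nothing is close enough.
    return best if levenshtein(token, best) <= max_distance else token


def normalize(text: str) -> str:
    """Normalize a whitespace-separated message token by token."""
    return " ".join(normalize_token(t) for t in text.split())
```

With these illustrative tables, normalize("sy mkan nasi sbb lapr") returns "saya makan nasi sebab lapar": "sy" and "sbb" are resolved by the rule table, while "mkan" and "lapr" are each one edit away from a lexicon word. A token with no close match is returned unchanged, which keeps the LD fallback from forcing unrelated words into the lexicon.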

Date: 2024
Downloads: (external link)
https://www.rsisinternational.org/journals/ijriss/ ... ssue-9/1535-1544.pdf (application/pdf)
https://rsisinternational.org/journals/ijriss/arti ... le-based-techniques/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:bcp:journl:v:8:y:2024:i:9:p:1535-1544

International Journal of Research and Innovation in Social Science is currently edited by Dr. Nidhi Malhan

Bibliographic data for series maintained by Dr. Pawan Verma.

 
Page updated 2025-03-19
Handle: RePEc:bcp:journl:v:8:y:2024:i:9:p:1535-1544