Grammatical versus Spelling Error Correction: An Investigation into the Responsiveness of Transformer-Based Language Models Using BART and MarianMT

Rohit Raju, Peeta Basa Pati, Gandheesh Sa, Gayatri Sanjana Sannala and Suriya Ks
Additional contact information
Rohit Raju: Department of Computer Science, University of Colorado, Boulder, CO, USA; Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India
Peeta Basa Pati: Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India
Gandheesh Sa: Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India
Gayatri Sanjana Sannala: Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India
Suriya Ks: Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India

Journal of Information & Knowledge Management (JIKM), 2024, vol. 23, issue 03, 1-33

Abstract: Text remains a relevant form of representing information. Text documents are created either on digital-native platforms or by converting other media, such as images and speech. While digital-native text is invariably entered through physical or virtual keyboards, technologies such as OCR and speech recognition are used to transform images and speech signals into text. All of these text-generation mechanisms introduce errors into the captured text. This project analyses the different kinds of errors that occur in text documents. The work employs two advanced deep neural network-based language models, BART and MarianMT, to rectify the anomalies present in the text. Transfer learning with an available dataset is performed to fine-tune the models for error correction. A comparative study investigates how effectively each model handles each of the defined error categories. It is observed that while both models reduce the number of erroneous sentences by 20% or more, BART handles spelling errors far better (24.6%) than grammatical errors (8.8%).
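
The per-category comparison described in the abstract can be illustrated with a small sketch. The abstract does not define how its percentages were computed, so the metric below (percentage reduction in the count of erroneous sentences within each error category) is an assumption, and the function name `error_reduction_by_category`, the record layout, and the toy data are all hypothetical. It is not the authors' evaluation code.

```python
from collections import defaultdict


def error_reduction_by_category(records):
    """Return {category: % reduction in erroneous sentences}.

    Each record is (category, wrong_before, wrong_after), where the two
    booleans say whether a sentence was erroneous before and after a
    correction model was applied. This metric is an assumed one, not
    necessarily the one used in the paper.
    """
    before = defaultdict(int)
    after = defaultdict(int)
    for category, wrong_before, wrong_after in records:
        before[category] += int(wrong_before)
        after[category] += int(wrong_after)
    # Skip categories with no erroneous sentences to avoid dividing by zero.
    return {
        cat: 100.0 * (before[cat] - after[cat]) / before[cat]
        for cat in before
        if before[cat] > 0
    }


# Hypothetical toy data: four sentences per category.
records = [
    ("spelling", True, False),
    ("spelling", True, True),
    ("spelling", True, False),
    ("spelling", True, True),
    ("grammar", True, True),
    ("grammar", True, False),
    ("grammar", True, True),
    ("grammar", True, True),
]
reductions = error_reduction_by_category(records)
print(reductions)
```

Reporting the reduction separately for each category, rather than as one pooled figure, is what allows a contrast such as spelling versus grammatical errors to be drawn for a single model.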

Keywords: BART; MarianMT; text enhancement; spelling error correction; error category
Date: 2024

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649224500370
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:23:y:2024:i:03:n:s0219649224500370

DOI: 10.1142/S0219649224500370

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:jikmxx:v:23:y:2024:i:03:n:s0219649224500370