TextRefine: A Novel approach to improve the accuracy of LLM Models

Ekta Dalal and Parvinder Singh

Data and Metadata, 2024, vol. 3, 331

Abstract: Natural Language Processing (NLP) is an interdisciplinary field that studies human language with the goal of creating computational models and algorithms that can comprehend, produce, and analyze natural language in a human-like way. Despite their outstanding performance on NLP tasks, LLMs still struggle with noisy and unrefined input text. TextRefine addresses this problem with a thorough preprocessing pipeline that cleans and refines text data before it is fed to LLMs. The pipeline comprises a series of steps: removing social tags, normalizing whitespace, normalizing letter case, removing stopwords, fixing Unicode issues, expanding contractions, removing punctuation and accents, and general text cleanup. Together, these procedures strengthen the integrity and quality of the input data, which ultimately improves the efficiency and precision of LLMs. Extensive testing and comparisons with standard techniques demonstrate TextRefine's effectiveness, achieving 99% accuracy.
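The preprocessing steps listed in the abstract can be sketched as a simple pipeline. The following Python sketch is illustrative only: the function name, step ordering, stopword list, and contraction mapping are assumptions for demonstration, not the paper's actual implementation, and only the standard library is used.

```python
import re
import string
import unicodedata

# Tiny stopword list for illustration; the paper's actual list is not specified.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in"}

# A few common contractions; a real pipeline would use a fuller mapping.
CONTRACTIONS = {"it's": "it is", "can't": "cannot", "won't": "will not", "don't": "do not"}

def refine(text: str) -> str:
    """Illustrative text-refinement pipeline following the steps in the abstract."""
    # 1. Remove social tags (mentions and hashtags).
    text = re.sub(r"[@#]\w+", " ", text)
    # 2. Normalize letter case (here: lowercasing).
    text = text.lower()
    # 3. Expand contractions.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # 4. Fix Unicode issues and strip accents via NFKD decomposition.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # 5. Remove punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 6. Remove stopwords and normalize whitespace in one pass.
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(refine("It's a GREAT day!  Check out @user and #NLP café"))
# → "it great day check out cafe"
```

Each stage is independent, so steps can be reordered or dropped to match a given corpus; for example, accent stripping would be skipped for languages where diacritics are meaningful.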

Date: 2024




Persistent link: https://EconPapers.repec.org/RePEc:dbk:datame:v:3:y:2024:i::p:331:id:1056294dm2024331

DOI: 10.56294/dm2024331


More articles in Data and Metadata from AG Editor
Bibliographic data for series maintained by Javier Gonzalez-Argote.

 
Page updated 2025-09-21
Handle: RePEc:dbk:datame:v:3:y:2024:i::p:331:id:1056294dm2024331