TextRefine: A Novel approach to improve the accuracy of LLM Models
Ekta Dalal and Parvinder Singh
Data and Metadata, 2024, vol. 3, 331
Abstract:
Natural Language Processing (NLP) is an interdisciplinary field that investigates human language, with the goal of creating computational models and algorithms that can comprehend, produce, and analyze natural language in a human-like way. Despite their strong performance on NLP tasks, large language models (LLMs) still struggle with noisy and unrefined input text. TextRefine addresses this problem with a thorough preprocessing pipeline that cleans and refines text data before it is passed to an LLM. The pipeline comprises a number of steps: removing social media tags, normalizing whitespace, converting text to lowercase, removing stopwords, fixing Unicode issues, expanding contractions, removing punctuation and accents, and general text cleanup. Together, these steps strengthen the integrity and quality of the input data, ultimately improving the efficiency and precision of LLMs. Extensive testing and comparison with standard techniques demonstrate TextRefine's effectiveness, achieving 99% accuracy.
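The record does not include an implementation, but the steps named in the abstract can be sketched as a minimal Python pipeline. Everything below is an assumption for illustration, not the authors' code: the function names, the tiny stopword and contraction lists, and the ordering of the steps are all hypothetical stand-ins for whatever TextRefine actually uses.

    import re
    import string
    import unicodedata

    # Illustrative resources only; the paper's actual stopword list and
    # contraction dictionary are not specified in this record.
    STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "it",
                 "of", "or", "the", "to"}
    CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is",
                    "don't": "do not", "i'm": "i am", "you're": "you are"}

    def remove_social_tags(text: str) -> str:
        # Drop @mentions, #hashtags, and URLs.
        return re.sub(r"(@\w+|#\w+|https?://\S+)", " ", text)

    def fix_unicode(text: str) -> str:
        # Compatibility normalization unifies look-alike characters
        # (e.g. the ellipsis character becomes three periods).
        return unicodedata.normalize("NFKC", text)

    def expand_contractions(text: str) -> str:
        # Replace known contractions with their expanded forms.
        pattern = re.compile(
            r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b")
        return pattern.sub(lambda m: CONTRACTIONS[m.group(0)], text)

    def strip_accents(text: str) -> str:
        # Decompose characters, then drop the combining accent marks.
        decomposed = unicodedata.normalize("NFD", text)
        return "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))

    def remove_punctuation(text: str) -> str:
        return text.translate(str.maketrans("", "", string.punctuation))

    def remove_stopwords(text: str) -> str:
        return " ".join(tok for tok in text.split()
                        if tok not in STOPWORDS)

    def normalize_whitespace(text: str) -> str:
        return re.sub(r"\s+", " ", text).strip()

    def refine(text: str) -> str:
        # One plausible ordering of the steps listed in the abstract.
        text = fix_unicode(text)
        text = remove_social_tags(text)
        text = text.lower()
        text = expand_contractions(text)
        text = strip_accents(text)
        text = remove_punctuation(text)
        text = remove_stopwords(text)
        return normalize_whitespace(text)

    if __name__ == "__main__":
        raw = "Check out @TextRefine! It's  GREAT  for LLMs #NLP https://example.com"
        print(refine(raw))  # -> "check out great llms"

Note the ordering dependency: lowercasing runs before contraction expansion so the lowercase dictionary keys match, and contractions are expanded before punctuation removal so apostrophes are still present to match.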
Date: 2024
Persistent link: https://EconPapers.repec.org/RePEc:dbk:datame:v:3:y:2024:i::p:331:id:1056294dm2024331
DOI: 10.56294/dm2024331