Using Large Language Models in Short Text Topic Modeling: Model Choice and Sample Size

Shubin Yu

No mqk3r, OSF Preprints from Center for Open Science

Abstract: This study explores the efficacy of large language models (LLMs) in short-text topic modeling, comparing their performance with human evaluation and Latent Dirichlet Allocation (LDA). In Study 1, we analyzed a dataset on chatbot anthropomorphism using human evaluation, LDA, and two LLMs (GPT-4 and Claude). Results showed that LLMs produced topic classifications similar to human analysis, outperforming LDA for short texts. In Study 2, we investigated the impact of sample size and LLM choice on topic modeling consistency using a COVID-19 vaccine hesitancy dataset. Findings revealed high consistency (80-90%) across various sample sizes, with even a 5% sample achieving 90% consistency. Comparison of three LLMs (Gemini Pro 1.5, GPT-4o, and Claude 3.5 Sonnet) showed comparable performance, with two models achieving 90% consistency. This research demonstrates that LLMs can effectively perform short-text topic modeling in medical informatics, offering a promising alternative to traditional methods. The high consistency with small sample sizes suggests potential for improved efficiency in research. However, variations in performance highlight the importance of model selection and the need for human supervision in topic modeling tasks.
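The sample-size analysis described in Study 2 can be illustrated with a minimal sketch (hypothetical data and helper names, not the authors' code): draw a random subsample of the corpus, derive topic labels for it, and measure agreement with labels from the full sample.

```python
import random

def consistency(labels_a, labels_b):
    """Fraction of documents assigned the same topic by two labelings."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def subsample(docs, fraction, seed=0):
    """Random subsample of the corpus, e.g. fraction=0.05 for a 5% sample."""
    rng = random.Random(seed)
    k = max(1, round(len(docs) * fraction))
    return rng.sample(docs, k)

# Hypothetical example: topic labels from the full sample vs. a subsample run.
full_labels = ["hesitancy", "safety", "access", "safety", "hesitancy"]
sub_labels  = ["hesitancy", "safety", "access", "hesitancy", "hesitancy"]
print(consistency(full_labels, sub_labels))  # 0.8
```

In the paper's terms, a "90% consistency at a 5% sample" result would mean `consistency` returning 0.9 when the subsample-derived topic assignments are compared against the full-sample assignments.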

Date: 2024-11-01
New Economics Papers: this item is included in nep-big and nep-dcm
References: View complete reference list from CitEc

Downloads: (external link)
https://osf.io/download/6721519a5b36577c411b3229/

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:osf:osfxxx:mqk3r

DOI: 10.31219/osf.io/mqk3r

Access Statistics for this paper

More papers in OSF Preprints from Center for Open Science
Bibliographic data for series maintained by OSF.

 
Page updated 2025-03-19
Handle: RePEc:osf:osfxxx:mqk3r