Improving text classification: logistic regression makes small LLMs strong and explainable ‘tens-of-shot’ classifiers

Marcus Buckmann and Ed Hill
Marcus Buckmann: Bank of England, Threadneedle Street, London, EC2R 8AH
Ed Hill: Bank of England, Threadneedle Street, London, EC2R 8AH

Bank of England Working Paper No. 1127

Abstract: Text classification tasks such as sentiment analysis are common in economics and finance. We demonstrate that smaller, local generative language models can be effectively used for these tasks. Compared to large commercial models, they offer key advantages in privacy, availability, cost, and explainability. We use 17 sentence classification tasks (each with 2 to 4 classes) to show that penalised logistic regression on embeddings from a small language model often matches or exceeds the performance of a large model, even when trained on just dozens of labelled examples per class – the same amount typically needed to validate a large model’s performance. Moreover, this embedding-based approach yields stable and interpretable explanations for classification decisions.
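The abstract's core recipe, fitting a penalised logistic regression on embedding vectors using only a few dozen labelled examples per class, can be illustrated with a minimal sketch. It assumes the embeddings have already been extracted from a small language model; here they are replaced by hypothetical synthetic vectors (make_toy_embeddings), and only the two-class case is shown, although the paper's tasks have 2 to 4 classes. The L2 penalty, its strength lam, the learning rate and the epoch count are illustrative choices, not the paper's settings, and in practice a library implementation (for example scikit-learn's LogisticRegression) would normally replace this hand-written gradient descent. The per-dimension products w_j * x_j are one simple way to decompose a linear model's decision; they are not necessarily the explanation method used by the authors.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logistic(X, y, lam=0.1, lr=0.5, epochs=500):
    """Binary logistic regression with an L2 penalty, fitted by batch gradient descent.

    Minimises mean log-loss + (lam / 2) * ||w||^2; the intercept b is not penalised.
    X is a list of embedding vectors, y a list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average log-loss gradient plus the L2 penalty gradient (lam * w).
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def contributions(w, x):
    """Per-dimension terms w_j * x_j; together with b they sum to the log-odds."""
    return [wj * xj for wj, xj in zip(w, x)]


def make_toy_embeddings(n_per_class=30, dim=8, seed=0):
    """Hypothetical stand-in for sentence embeddings: Gaussian noise, with the
    class signal placed in dimension 0 only."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            v[0] += 2.0 if label == 1 else -2.0
            X.append(v)
            y.append(label)
    return X, y


# "Tens-of-shot" setting: 30 labelled examples per class.
X, y = make_toy_embeddings()
w, b = fit_l2_logistic(X, y)
acc = sum((predict_proba(w, b, xi) > 0.5) == (yi == 1) for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {acc:.2f}")
top = max(range(len(w)), key=lambda j: abs(w[j]))
print(f"most influential dimension: {top}")
```

Because the classifier is linear in the embedding, each prediction's log-odds equals b plus the sum of these contributions, which is what makes this kind of model straightforward to inspect. On the synthetic data, the largest weight should fall on dimension 0, where the class signal was placed.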

Keywords: Text classification; large language models; machine learning; embeddings; explainability
JEL-codes: C38 C45 C80
Pages: 49
Date: 2025-05-23

Downloads:
https://www.bankofengland.co.uk/-/media/boe/files/ ... shot-classifiers.pdf Full text (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:boe:boeewp:1127


Bibliographic data for the series are maintained by the Digital Media Team.

 
Page updated 2025-06-18
Handle: RePEc:boe:boeewp:1127