LABOR-LLM: Language-Based Occupational Representations with Large Language Models
Susan Athey, Herman Brunborg, Tianyu Du, Ayush Kanodia and Keyon Vafa
Papers from arXiv.org
Abstract:
Vafa et al. (2024) introduced CAREER, a transformer-based econometric model that predicts a worker's next job as a function of career history (an "occupation model"). CAREER was initially estimated ("pre-trained") on a large but unrepresentative resume dataset, yielding a "foundation model," and parameter estimation was then continued ("fine-tuned") on data from a representative survey. CAREER outperformed benchmark models in predictive performance. This paper considers an alternative in which the resume-based foundation model is replaced by a large language model (LLM). We convert the tabular survey data into text files that resemble resumes and fine-tune the LLMs on these files with the objective of predicting the next token (word). The resulting fine-tuned LLM is then used as an input to an occupation model, whose predictive performance surpasses that of all prior models. We demonstrate the value of fine-tuning and further show that, when additional career data from a different population is added, fine-tuned smaller LLMs surpass fine-tuned larger models.
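To make the approach concrete, below is a minimal sketch (not the authors' code) of two of the steps the abstract describes: rendering a tabular career history as resume-like text, and scoring candidate next occupations with a causal LLM via summed next-token log-probabilities. The resume template, the stand-in "gpt2" checkpoint (in place of the fine-tuned LLM), and the example occupation list are illustrative assumptions, not taken from the paper; the fine-tuning step itself is omitted.

```python
# Sketch: tabular career history -> resume-like text -> occupation scoring
# with a causal LLM. Model, template, and occupations are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def history_to_resume(jobs):
    """Render a list of (year, occupation) rows as resume-style text."""
    lines = [f"{year}: {occupation}" for year, occupation in jobs]
    return "Work history:\n" + "\n".join(lines) + "\nNext job:"

# Stand-in checkpoint; the paper would use an LLM fine-tuned on
# resume-style text built from the survey data.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_job_log_prob(resume_text, occupation):
    """Sum of token log-probabilities the LLM assigns to `occupation`
    as a continuation of `resume_text`."""
    prompt_ids = tokenizer(resume_text, return_tensors="pt").input_ids
    target_ids = tokenizer(" " + occupation, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position t predict token t+1; slice the span that
    # predicts the target occupation's tokens.
    log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1:-1], dim=-1)
    return log_probs.gather(1, target_ids[0].unsqueeze(1)).sum().item()

history = [(2018, "Cashier"), (2020, "Retail Supervisor")]
candidates = ["Store Manager", "Cashier", "Software Developer"]
resume = history_to_resume(history)
scores = {occ: next_job_log_prob(resume, occ) for occ in candidates}
print(max(scores, key=scores.get))  # highest-scoring candidate occupation
```

In the paper's pipeline, the LLM is first fine-tuned on the resume-style text files with a standard next-token (causal language modeling) objective before its predictions feed the occupation model.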
Date: 2024-06, Revised 2025-02
New Economics Papers: this item is included in nep-ain, nep-big, nep-cmp and nep-mac
Downloads: http://arxiv.org/pdf/2406.17972 Latest version (application/pdf)
Related works:
Working Paper: Labor-LLM: Language-Based Occupational Representations with Large Language Models (2024) 
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2406.17972