Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch

Chen, Yi; Fang, Hanming; Zhao, Yi; Zhao, Zibo

Yi Chen, Hanming Fang, Yi Zhao and Zibo Zhao
Additional contact information
Yi Chen: ShanghaiTech University
Hanming Fang: University of Pennsylvania
Yi Zhao: Tsinghua University
Zibo Zhao: ShanghaiTech University

PIER Working Paper Archive from Penn Institute for Economic Research, Department of Economics, University of Pennsylvania

Abstract: Categorical variables have no intrinsic ordering, and researchers often adopt a fixed-effect (FE) approach in empirical analysis. However, this approach has two significant limitations: it overlooks textual information associated with the categorical variables, and it produces unstable results when a category contains only a few observations. In this paper, we propose a novel method that utilizes recent advances in large language models (LLMs) to recover overlooked information in categorical variables. We apply this method to investigate labor market mismatch. Specifically, we task LLMs with simulating the role of a human resources specialist to assess the suitability of an applicant with specific characteristics for a given job. Our main findings can be summarized in three parts. First, using comprehensive administrative data from an online job posting platform, we show that our new match quality measure is positively correlated with several traditional measures in the literature, and we highlight the LLM’s capability to provide additional information beyond that contained in the traditional measures. Second, we demonstrate the broad applicability of the new method using survey data that contain significantly less information than the administrative data, so that most of the traditional match quality measures cannot be computed. Our LLM measure successfully replicates most of the salient patterns observed in a hard-to-access administrative dataset using easily accessible survey data. Third, we investigate the gender gap in match quality and explore whether gender stereotypes exist in the hiring process. We simulate an audit study, examining whether revealing gender information to LLMs influences their assessment. We show that when gender information is disclosed to the LLMs, the models deem females better suited for traditionally female-dominated roles.

Keywords: Large Language Models; Categorical Variables; Labor Market Mismatch
JEL-codes: C55 J16 J24 J31
Pages: 48 pages
Date: 2024-07-23
New Economics Papers: this item is included in nep-ain, nep-big and nep-lma
Citations: 2 (EconPapers)

Downloads: (external link)
https://economics.sas.upenn.edu/system/files/worki ... per%20Submission.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:pen:papers:24-017
