Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch

Chen, Yi; Fang, Hanming; Zhao, Yi; Zhao, Zibo

No 32327, NBER Working Papers from National Bureau of Economic Research, Inc

Abstract: Categorical variables have no intrinsic ordering, and researchers often adopt a fixed-effect (FE) approach in empirical analysis. However, this approach has two significant limitations: it overlooks the textual labels associated with the categorical variables, and it produces unstable results when a category has only a few observations. In this paper, we propose a novel method that uses recent advances in large language models (LLMs) to recover overlooked information in categorical variables. We apply this method to investigate labor market mismatch. Specifically, we task LLMs with simulating the role of a human resources specialist to assess the suitability of an applicant with specific characteristics for a given job. Our main findings can be summarized in three parts. First, using comprehensive administrative data from an online job posting platform, we show that our new match quality measure is positively correlated with several traditional measures in the literature, and we highlight the LLM's capability to provide additional information conditional on the traditional measures. Second, we demonstrate the broad applicability of the new method with survey data that contain significantly less information than the administrative data, which makes it impossible to compute most of the traditional match quality measures. Our LLM measure successfully replicates most of the salient patterns observed in a hard-to-access administrative dataset using easily accessible survey data. Third, we investigate the gender gap in match quality and explore whether gender stereotypes exist in the hiring process. We simulate an audit study, examining whether revealing gender information to LLMs influences their assessment. We show that when gender information is disclosed to GPT, the model deems females better suited for traditionally female-dominated roles.
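The abstract describes the simulated audit study only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Applicant`, `Job`, `build_prompt`, `audit_gaps` and `stub_score`, the prompt wording, and the 1-to-10 rating scale are all hypothetical, and `score_fn` stands in for whatever LLM call a researcher would actually make. Each applicant-job pair is scored twice, once with gender withheld and once with it disclosed, and the score changes are averaged by applicant gender.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Applicant:
    education: str
    major: str
    years_experience: int
    gender: Optional[str] = None  # shown to the model only in the "disclosed" arm


@dataclass
class Job:
    title: str
    description: str


def build_prompt(applicant: Applicant, job: Job, reveal_gender: bool) -> str:
    """Hypothetical prompt asking the model to act as an HR specialist."""
    lines = [
        "You are a human resources specialist.",
        f"Job title: {job.title}",
        f"Job description: {job.description}",
        f"Applicant education: {applicant.education}",
        f"Applicant major: {applicant.major}",
        f"Applicant years of experience: {applicant.years_experience}",
    ]
    if reveal_gender and applicant.gender is not None:
        lines.append(f"Applicant gender: {applicant.gender}")
    lines.append(
        "Rate the applicant's suitability for this job from 1 to 10. "
        "Reply with the number only."
    )
    return "\n".join(lines)


def audit_gaps(
    pairs: Iterable[Tuple[Applicant, Job]],
    score_fn: Callable[[str], float],
) -> Dict[Optional[str], float]:
    """Mean (disclosed - withheld) score change, grouped by applicant gender.

    Both prompts for a pair carry identical information, except that
    gender appears only in the disclosed version.
    """
    diffs: Dict[Optional[str], List[float]] = {}
    for applicant, job in pairs:
        withheld = score_fn(build_prompt(applicant, job, reveal_gender=False))
        disclosed = score_fn(build_prompt(applicant, job, reveal_gender=True))
        diffs.setdefault(applicant.gender, []).append(disclosed - withheld)
    return {gender: mean(d) for gender, d in diffs.items()}


def stub_score(prompt: str) -> float:
    """Hypothetical toy stand-in for an LLM call: favors women for nurse jobs."""
    favored = "Applicant gender: female" in prompt and "Job title: Nurse" in prompt
    return 8.0 if favored else 7.0


if __name__ == "__main__":
    nurse = Job("Nurse", "Provide bedside patient care.")
    pairs = [
        (Applicant("Bachelor", "Nursing", 3, "female"), nurse),
        (Applicant("Bachelor", "Nursing", 3, "male"), nurse),
    ]
    print(audit_gaps(pairs, stub_score))  # {'female': 1.0, 'male': 0.0}
```

With a real model, `score_fn` would wrap an API request and parse the returned number. Because LLM outputs can vary between calls, one would likely average over many pairs and repeated queries before reading anything into a gap.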

JEL-codes: C55 J16 J24 J31
Date: 2024-04
New Economics Papers: this item is included in nep-ain, nep-big and nep-ecm
Note: LS PE (NBER programs: Labor Studies; Public Economics)
Citations: 2 (as tracked by EconPapers)

Downloads:
http://www.nber.org/papers/w32327.pdf (application/pdf)
Access to the full text is generally limited to series subscribers; however, free access is provided if the top-level domain of the client browser is in a developing country or transition economy. More information about subscriptions and free access is available at http://www.nber.org/wwphelp.html. Free access is also available to older working papers.

Persistent link: https://EconPapers.repec.org/RePEc:nbr:nberwo:32327

Ordering information: This working paper can be ordered from
http://www.nber.org/papers/w32327
A paper copy is available by mail.

More papers in NBER Working Papers from National Bureau of Economic Research, Inc., 1050 Massachusetts Avenue, Cambridge, MA 02138, U.S.A.