Large Language Models: An Applied Econometric Framework

Ludwig, Jens; Mullainathan, Sendhil; Rambachan, Ashesh

Abstract: Large language models (LLMs) enable researchers to analyze text at unprecedented scale and minimal cost. Researchers can now revisit old questions and tackle novel ones with rich data. We provide an econometric framework for realizing this potential in two empirical uses. For prediction problems -- forecasting outcomes from text -- valid conclusions require "no training leakage" between the LLM's training data and the researcher's sample, which can be enforced through careful model choice and research design. For estimation problems -- automating the measurement of economic concepts for downstream analysis -- valid downstream inference requires combining LLM outputs with a small validation sample to deliver consistent and precise estimates. Absent a validation sample, researchers cannot assess possible errors in LLM outputs, and consequently seemingly innocuous choices (which model, which prompt) can produce dramatically different parameter estimates. When used appropriately, LLMs are powerful tools that can expand the frontier of empirical economics.
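
The estimation point in the abstract (LLM labels combined with a small validation sample) can be illustrated with a minimal sketch. This is not taken from the paper: it assumes the simplest case, estimating the mean of a binary concept, and uses a generic difference (bias-correction) estimator. The simulated data, the error rates, and the names `bias_corrected_mean`, `llm_labels`, and `human_labels` are all hypothetical.

```python
import random
import statistics


def bias_corrected_mean(llm_labels, validation_idx, human_labels):
    """Plug-in mean of the LLM labels plus the average human-minus-LLM
    discrepancy measured on a randomly drawn validation sample."""
    plug_in_mean = statistics.fmean(llm_labels)
    errors = [human_labels[i] - llm_labels[i] for i in validation_idx]
    return plug_in_mean + statistics.fmean(errors)


# Simulated data (assumption): 20,000 documents, true concept rate of 0.3.
rng = random.Random(0)
n_docs, n_validation = 20_000, 1_000
truth = [1 if rng.random() < 0.3 else 0 for _ in range(n_docs)]

# Hypothetical LLM error rates: 10% false negatives, 20% false positives.
llm_labels = [
    (1 if rng.random() < 0.9 else 0) if y == 1 else (1 if rng.random() < 0.2 else 0)
    for y in truth
]

# Human labels are observed only on a small random validation sample.
validation_idx = rng.sample(range(n_docs), n_validation)
human_labels = {i: truth[i] for i in validation_idx}

true_mean = statistics.fmean(truth)
plug_in = statistics.fmean(llm_labels)  # uses LLM labels only
corrected = bias_corrected_mean(llm_labels, validation_idx, human_labels)

print(f"true mean      : {true_mean:.3f}")
print(f"LLM plug-in    : {plug_in:.3f}")
print(f"bias-corrected : {corrected:.3f}")
```

In this simulation the plug-in mean inherits the LLM's systematic labeling error, while the corrected estimate uses the validation sample to measure and subtract that error. Without the validation sample, the size of the error could not be assessed at all, which is the concern the abstract raises about model and prompt choice.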

Date: 2024-12, Revised 2025-12
New Economics Papers: this item is included in nep-ain, nep-big, nep-cmp and nep-ecm

Downloads: (external link)
http://arxiv.org/pdf/2412.07031 Latest version (application/pdf)

Related works:
Working Paper: Large Language Models: An Applied Econometric Framework (2025)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2412.07031
