Large Language Models: An Applied Econometric Framework
Jens Ludwig, Sendhil Mullainathan and Ashesh Rambachan
No 33344, NBER Working Papers from National Bureau of Economic Research, Inc
Abstract:
How can we use the novel capacities of large language models (LLMs) in empirical research? And how can we do so while accounting for their limitations, which are themselves only poorly understood? We develop an econometric framework to answer this question that distinguishes between two types of empirical tasks. Using LLMs for prediction problems (including hypothesis generation) is valid under one condition: no “leakage” between the LLM’s training dataset and the researcher’s sample. No leakage can be ensured by using open-source LLMs with documented training data and published weights. Using LLM outputs for estimation problems to automate the measurement of some economic concept (expressed either by some text or from human subjects) requires the researcher to collect at least some validation data: without such data, the errors of the LLM’s automation cannot be assessed and accounted for. As long as these steps are taken, LLM outputs can be used in empirical research with the familiar econometric guarantees we desire. Using two illustrative applications to finance and political economy, we find that these requirements are stringent; when they are violated, the limitations of LLMs now result in unreliable empirical estimates. Our results suggest the excitement around the empirical uses of LLMs is warranted – they allow researchers to effectively use even small amounts of language data for both prediction and estimation – but only with these safeguards in place.
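To make the validation-data requirement for estimation problems concrete, the following is a minimal sketch, not the estimator from the paper. It assumes the researcher has LLM-generated labels for a large sample and human labels for a small validation subset, and it uses a simple difference correction: the LLM-only mean plus the average (human minus LLM) error observed on the validation subset. The function name `debiased_mean` and all data are hypothetical.

```python
import statistics


def debiased_mean(llm_labels_all, llm_labels_val, human_labels_val):
    """Estimate the mean of a measured concept from LLM labels,
    corrected with a small human-labeled validation subset.

    Assumption (illustrative, not taken from the paper): the validation
    subset is a random draw from the full sample, so the average LLM
    error observed there is representative of the whole sample.
    """
    if len(llm_labels_val) != len(human_labels_val):
        raise ValueError("validation LLM and human labels must align one-to-one")
    if not human_labels_val:
        raise ValueError("at least some validation data is required")

    # Naive plug-in estimate that treats LLM labels as if they were the truth.
    naive = statistics.fmean(llm_labels_all)

    # Average measurement error of the LLM on the validation subset.
    errors = [h - m for h, m in zip(human_labels_val, llm_labels_val)]
    correction = statistics.fmean(errors)

    return naive + correction


if __name__ == "__main__":
    # Hypothetical binary labels (1 = concept present in the text).
    llm_all = [1, 1, 0, 1]   # LLM labels on the full sample
    llm_val = [1, 1]         # LLM labels on the validation subset
    human_val = [1, 0]       # human labels on the same validation items
    print(debiased_mean(llm_all, llm_val, human_val))
```

Without the validation subset the correction term cannot be computed, which is the sense in which the LLM's errors "cannot be assessed and accounted for" when no validation data are collected.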
JEL-codes: C01 C45
Date: 2025-01
New Economics Papers: this item is included in nep-big and nep-cmp
Note: AP CF CH DAE DEV ED EEE EFG EH LE LS PE POL PR TWP
Downloads:
http://www.nber.org/papers/w33344.pdf (application/pdf)
Access to the full text is generally limited to series subscribers; however, free access is provided if the top-level domain of the client browser is in a developing country or transition economy. More information about subscriptions and free access is available at http://www.nber.org/wwphelp.html. Free access is also available for older working papers.
Related works:
Working Paper: Large Language Models: An Applied Econometric Framework (2025) 
Persistent link: https://EconPapers.repec.org/RePEc:nbr:nberwo:33344
Ordering information: This working paper can be ordered from
http://www.nber.org/papers/w33344
Paper copy available by mail.
Series: NBER Working Papers, National Bureau of Economic Research, Inc, 1050 Massachusetts Avenue, Cambridge, MA 02138, U.S.A.