ChatMacro: Evaluating Inflation Forecasts of Generative AI
M. Jahangir Alam,
Shane Boyle,
Huiyu Li and
Tatevik Sekhposyan
No 2026-04, Working Paper Series from Federal Reserve Bank of San Francisco
Abstract:
Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions.
Keywords: large language models; generative AI; inflation forecasting
JEL-codes: C45 E31 E37
Pages: 24
Date: 2026-02-05
Note: PDF date: January 27, 2026.
Downloads:
https://www.frbsf.org/wp-content/uploads/wp2026-04.pdf PDF - view (application/pdf)
https://www.frbsf.org/research-and-insights/public ... ts-generative-of-ai/ FRBSF - view (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:fip:fedfwp:102407
DOI: 10.24148/wp2026-04