Testing the cognitive limits of large language models

Fernando Perez-Cruz and Hyun Song Shin

No 83, BIS Bulletins from Bank for International Settlements

Abstract: When posed with a logical puzzle that demands reasoning about the knowledge of others and about counterfactuals, large language models (LLMs) display a distinctive and revealing pattern of failure. The LLM performs flawlessly when presented with the original wording of the puzzle available on the internet but performs poorly when incidental details are changed, suggestive of a lack of true understanding of the underlying logic. Our findings do not detract from the considerable progress in central bank applications of machine learning to data management, macro analysis and regulation/supervision. They do, however, suggest that caution should be exercised in deploying LLMs in contexts that demand rigorous reasoning in economic analysis.
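The test design described in the abstract — pose the canonical wording of a puzzle, then pose a variant in which only incidental details (names, dates) are changed, and compare the model's answers — can be sketched as a small harness. Everything below is illustrative, not the authors' actual code or prompts: `ask_llm` is a placeholder for whatever model call is used, and the puzzle snippet and substitutions are hypothetical stand-ins.

```python
# Illustrative harness (not the authors' code): compare an LLM's answer on the
# original wording of a puzzle with its answer on a variant where only
# incidental surface details are changed, leaving the logic intact.

ORIGINAL = (
    "Albert and Bernard want to know Cheryl's birthday. "
    "Cheryl gives them a list of ten possible dates ..."
)

# Incidental substitutions that do not alter the underlying logic.
SUBSTITUTIONS = {"Albert": "Carlos", "Bernard": "Diana", "Cheryl": "Elena"}


def perturb(text: str, subs: dict) -> str:
    """Swap incidental surface details while preserving logical structure."""
    for old, new in subs.items():
        text = text.replace(old, new)
    return text


def ask_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API request)."""
    raise NotImplementedError


def answers_consistently(original: str, subs: dict) -> bool:
    """A model that truly reasons should answer both versions equivalently.

    The variant's answer is mapped back through the inverse substitution so
    the two answers can be compared directly.
    """
    variant = perturb(original, subs)
    inverse = {new: old for old, new in subs.items()}
    return ask_llm(original) == perturb(ask_llm(variant), inverse)
```

The key design point, mirroring the bulletin's finding, is that `perturb` changes nothing the solution logically depends on, so any divergence between the two answers signals pattern-matching on the memorised wording rather than genuine reasoning.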

Pages: 9
Date: 2024-01-04
New Economics Papers: this item is included in nep-ain, nep-big, nep-cmp and nep-neu

Downloads:
https://www.bis.org/publ/bisbull83.pdf Full PDF document (application/pdf)
https://www.bis.org/publ/bisbull83.htm (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:bis:bisblt:83


More papers in BIS Bulletins from Bank for International Settlements

 
Page updated 2025-03-22
Handle: RePEc:bis:bisblt:83