Towards reducing hallucination in extracting information from financial reports using Large Language Models

Sarmah, Bhaskarjit; Zhu, Tianjie; Mehta, Dhagash; Pasquali, Stefano

Abstract: For a financial analyst, the question and answer (Q&A) segment of a company's financial report is a crucial source of information for analysis and investment decisions. Extracting valuable insights from the Q&A section is challenging, however: conventional methods such as detailed reading and note-taking do not scale and are susceptible to human error, while Optical Character Recognition (OCR) and similar techniques have difficulty processing unstructured transcript text accurately and often miss the subtle linguistic nuances that drive investor decisions. Here, we demonstrate the use of Large Language Models (LLMs) to extract information from earnings report transcripts efficiently and with high accuracy, transforming the extraction process, and we reduce hallucination by combining retrieval-augmented generation with metadata. We evaluate the outputs of various LLMs with and without our proposed approach on several objective metrics for Q&A systems, and empirically demonstrate the superiority of our method.
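
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of combining retrieval with metadata, not the authors' method. It assumes transcript chunks carry hypothetical metadata fields (`company`, `quarter`), filters on them before ranking, and uses a simple bag-of-words cosine similarity as a stand-in for a real embedding model. All names and sample texts are invented for illustration, and no LLM is called; the sketch stops at building the grounded prompt.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A transcript passage plus metadata (field names are hypothetical)."""
    text: str
    metadata: dict = field(default_factory=dict)


def tokenize(text):
    # Lowercase alphanumeric tokens; a stand-in for a real embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, filters, top_k=2):
    # Step 1: keep only chunks whose metadata matches every filter.
    candidates = [
        c for c in chunks
        if all(c.metadata.get(k) == v for k, v in filters.items())
    ]
    # Step 2: rank the remaining chunks by similarity to the question.
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        candidates,
        key=lambda c: cosine(q_vec, Counter(tokenize(c.text))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, retrieved):
    # Restrict the model to the retrieved context to discourage hallucination.
    context = "\n".join(f"[{c.metadata}] {c.text}" for c in retrieved)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Invented sample data, for illustration only.
chunks = [
    Chunk("Revenue grew eight percent driven by cloud services.",
          {"company": "ACME", "quarter": "Q2"}),
    Chunk("Revenue declined two percent on weaker hardware sales.",
          {"company": "ACME", "quarter": "Q1"}),
    Chunk("Revenue grew twelve percent on strong advertising.",
          {"company": "Globex", "quarter": "Q2"}),
]
question = "How did revenue grow?"
hits = retrieve(question, chunks, {"company": "ACME", "quarter": "Q2"})
prompt = build_prompt(question, hits)
```

Filtering on metadata before ranking keeps similar-sounding passages from other companies or quarters out of the context, which is one plausible way metadata could help limit hallucinated answers.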

Date: 2023-10
New Economics Papers: this item is included in nep-ain, nep-big and nep-cmp

Downloads: (external link)
http://arxiv.org/pdf/2310.10760 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2310.10760
