Benchmarking OpenAI's APIs and other Large Language Models for repeatable and efficient question answering across multiple documents

Filipovska, Elena; Mladenovska, Ana; Bajrami, Merxhan; Dobreva, Jovana; Hillman, Velislava; Lameski, Petre; Zdravevski, Eftim

LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library

Abstract: The rapid growth in the volume and complexity of documents across domains calls for automated methods that make information extraction and analysis more efficient and accurate. This paper evaluates the efficiency and repeatability of OpenAI's APIs and other Large Language Models (LLMs) in automating question-answering tasks across multiple documents, focusing on the Data Privacy Policy (DPP) documents of selected EdTech providers. We test how well these models perform on large-scale text processing tasks using OpenAI's models (GPT 3.5 Turbo, GPT 4, GPT 4o) and APIs in several frameworks: direct API calls (i.e., one-shot learning), LangChain, and Retrieval Augmented Generation (RAG) systems. We also evaluate a local deployment of a quantized LLM (Llama-2-13B-chat-GPTQ) with FAISS. Through systematic evaluation against predefined use cases and a range of metrics, including response format, execution time, and cost, the study aims to provide insights into optimal practices for document analysis. Our findings show that using OpenAI's LLMs via API calls is a workable way to accelerate document analysis when local GPU-powered infrastructure is not a viable option, particularly for long texts. Local deployment, on the other hand, is valuable for keeping the data within private infrastructure. The quantized models retain substantial relevance even with fewer parameters than ChatGPT and do not impose processing restrictions on the number of tokens. In addition to confirming the usefulness of LLMs for improving document analysis procedures, this study offers insights on maximizing their use for better efficiency and data governance.
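The abstract names three metrics (response format, execution time, and cost) and a repeatability goal. As a rough illustration of how such a benchmark could be organized, here is a minimal Python sketch. It is not the authors' harness: every name in it (`benchmark`, `repeatability`, `stub_model`, the JSON answer format, and the price value) is a hypothetical assumption, the model call is replaced by a deterministic stub so the code runs offline, and the token count is approximated by whitespace-split words.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RunResult:
    document: str
    question: str
    answer: str
    seconds: float
    valid_format: bool
    estimated_cost: float


def is_valid_json_answer(text: str) -> bool:
    """Response-format check (assumed format): a JSON object with an 'answer' key."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "answer" in parsed


def estimate_cost(prompt: str, answer: str, price_per_1k_tokens: float) -> float:
    """Rough cost estimate; whitespace-split words stand in for real tokens."""
    tokens = len(prompt.split()) + len(answer.split())
    return tokens / 1000 * price_per_1k_tokens


def benchmark(
    ask: Callable[[str, str], str],
    documents: Dict[str, str],
    questions: List[str],
    price_per_1k_tokens: float,
    repeats: int = 1,
) -> List[RunResult]:
    """Ask every question of every document `repeats` times and record metrics."""
    results = []
    for name, text in documents.items():
        for question in questions:
            for _ in range(repeats):
                start = time.perf_counter()
                answer = ask(text, question)
                elapsed = time.perf_counter() - start
                results.append(RunResult(
                    document=name,
                    question=question,
                    answer=answer,
                    seconds=elapsed,
                    valid_format=is_valid_json_answer(answer),
                    estimated_cost=estimate_cost(
                        text + " " + question, answer, price_per_1k_tokens
                    ),
                ))
    return results


def repeatability(results: List[RunResult]) -> float:
    """Share of (document, question) pairs whose repeated answers are identical."""
    groups = {}
    for r in results:
        groups.setdefault((r.document, r.question), set()).add(r.answer)
    if not groups:
        return 0.0
    stable = sum(1 for answers in groups.values() if len(answers) == 1)
    return stable / len(groups)


def stub_model(document_text: str, question: str) -> str:
    """Hypothetical stand-in for a real model call (API or local deployment)."""
    found = "third parties" in document_text.lower()
    return json.dumps({"answer": "yes" if found else "no"})


# Toy policies and a placeholder price; none of these values come from the paper.
documents = {
    "provider_a": "We share data with third parties for analytics.",
    "provider_b": "We never sell or share personal data.",
}
questions = ["Does the policy mention sharing data with third parties?"]
results = benchmark(stub_model, documents, questions,
                    price_per_1k_tokens=0.01, repeats=3)
print(f"runs: {len(results)}, repeatability: {repeatability(results):.2f}")
```

To compare frameworks in this style, one would swap `stub_model` for a function that wraps a direct API call, a LangChain chain, or a RAG pipeline, keeping the same `(document_text, question) -> str` signature so the recorded metrics stay comparable.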

Keywords: few-shot learning Q&A; GPT; LangChain; Large Language Models; Llama; LLM; multi-document; one-shot learning; OpenAI; QA; RAG
JEL-codes: J50
Pages: 11 pages
Date: 2024-12-31
New Economics Papers: this item is included in nep-big and nep-cmp

Published in Annals of Computer Science and Intelligence Systems, 31, December 2024, pp. 107-117. ISSN: 2300-5963

Downloads:
https://researchonline.lse.ac.uk/id/eprint/126674/ Open access version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:ehl:lserod:126674

More papers in LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library, Portugal Street, London, WC2A 2HD, U.K.
Bibliographic data for series maintained by LSERO Manager.