Constructing a Portfolio Optimization Benchmark Framework for Evaluating Large Language Models

Cho, Hanyong; Kim, Jang Ho

Abstract: This study introduces a benchmark framework for evaluating the financial decision-making capabilities of large language models (LLMs) through portfolio optimization problems with mathematically explicit solutions. Unlike existing financial benchmarks that emphasize language-processing tasks, the proposed framework directly tests optimization-based reasoning in investment contexts. A large set of multiple-choice questions is generated by varying objectives, candidate assets, and investment constraints, with each problem designed to include a unique correct solution and systematically constructed alternatives. Experimental results comparing GPT-4, Gemini 1.5 Pro, and Llama 3.1-70B reveal distinct performance patterns: GPT-4 achieves the highest accuracy in risk-based objectives and remains stable under constraints, Gemini 1.5 Pro performs well in return-based tasks but struggles under other conditions, and Llama 3.1-70B records the lowest overall performance. These findings highlight both the potential and current limitations of LLMs in applying quantitative reasoning to finance, while providing a scalable foundation for developing LLM-based services in portfolio management.
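To make the idea of a portfolio problem with a mathematically explicit solution more concrete, the following is a minimal sketch of how one such multiple-choice item could be built. It is an illustration only, not the authors' procedure: this listing does not describe their construction. The sketch assumes a risk-based objective (the global minimum-variance portfolio), a fully invested portfolio with short selling allowed, and three hand-picked alternatives. The function names `min_variance_weights`, `portfolio_variance`, and `build_question`, as well as the covariance matrix, are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form global minimum-variance weights: w = S^-1 1 / (1' S^-1 1).

    Assumes a positive definite covariance matrix, full investment,
    and no short-selling restriction.
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # S^-1 1 without forming the inverse
    return raw / raw.sum()

def portfolio_variance(weights, cov):
    """Variance of a portfolio with the given weights: w' S w."""
    return float(weights @ np.asarray(cov, dtype=float) @ weights)

def build_question(cov, seed=0):
    """Hypothetical item builder: one optimal answer plus three alternatives."""
    rng = np.random.default_rng(seed)
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    inv_var = 1.0 / np.diag(cov)
    candidates = [
        min_variance_weights(cov),                 # index 0: the optimal answer
        np.full(n, 1.0 / n),                       # equal weights
        inv_var / inv_var.sum(),                   # inverse-variance, ignores covariances
        np.eye(n)[int(np.argmin(np.diag(cov)))],   # all in the lowest-variance asset
    ]
    order = rng.permutation(len(candidates))       # shuffle the answer position
    options = [candidates[i] for i in order]
    answer_index = int(np.where(order == 0)[0][0])
    return options, answer_index

# Illustrative (made-up) covariance matrix for three assets.
cov = np.array([[0.040, 0.006, 0.000],
                [0.006, 0.090, 0.012],
                [0.000, 0.012, 0.160]])

options, answer_index = build_question(cov)
variances = [portfolio_variance(w, cov) for w in options]
for label, w, v in zip("ABCD", options, variances):
    print(label, np.round(w, 4), f"variance={v:.5f}")
print("correct option:", "ABCD"[answer_index])
```

Because the covariance matrix is positive definite, the minimum-variance portfolio is the unique minimizer among fully invested portfolios, so every alternative has a strictly higher variance and the item has exactly one correct option. Other objectives or constraints mentioned in the abstract would need their own solver and their own alternatives.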

Date: 2026-03
New Economics Papers: this item is included in nep-ain and nep-cmp
Downloads: (external link)
http://arxiv.org/pdf/2603.09301 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2603.09301
