ELM-Bench: A Multidimensional Methodological Framework for Large Language Model Evaluation in Electricity Markets
Hang Fan (),
Shijie Ji,
Peng Yuan,
Qingsong Zhao,
Shuaikang Wang,
Xiaowei Tan and
Yunjie Duan
Additional contact information
Hang Fan: School of Economics and Management, North China Electric Power University, Beijing 100000, China
Shijie Ji: Beijing Power Exchange Center Co., Ltd., Beijing 100000, China
Peng Yuan: State Grid LiaoNing Electric Power Supply Co., Ltd., Electric Power Research Institute, Shenyang 110000, China
Qingsong Zhao: State Grid LiaoNing Electric Power Supply Co., Ltd., Electric Power Research Institute, Shenyang 110000, China
Shuaikang Wang: School of Economics and Management, North China Electric Power University, Beijing 100000, China
Xiaowei Tan: School of Economics and Management, North China Electric Power University, Beijing 100000, China
Yunjie Duan: School of Economics and Management, North China Electric Power University, Beijing 100000, China
Energies, 2025, vol. 18, issue 15, 1-23
Abstract:
Large language models (LLMs) have significant potential for application in electricity markets, but existing methods for evaluating LLMs in this domain have shortcomings: they cover only a single task, draw on datasets with limited coverage, and lack depth. To address this, this article proposes ELM-Bench, a framework for evaluating LLMs on the Chinese electricity market that assesses models along 3 dimensions (understanding, generation, and safety) across 7 tasks (such as common-sense Q&A and terminology explanation) with 2841 samples. In addition, a specialized domain model, QwenGOLD, was fine-tuned from a general-purpose LLM. The evaluation results show that top-tier general models perform well on general tasks thanks to high-quality pre-training, while QwenGOLD performs better on domain-specific tasks such as prediction and decision-making, verifying the effectiveness of domain fine-tuning. The study also found that fine-tuning yields limited improvement in an LLM's basic abilities, yet its score on professional prediction tasks is second only to DeepSeek-V3, indicating that some general LLMs can handle domain data well without specialized training. These findings can inform model selection for different scenarios, balancing performance against training cost.
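The dimension-over-task structure described in the abstract can be illustrated with a minimal scoring sketch. This is a hypothetical reconstruction, not the ELM-Bench specification: the task names (beyond the two the abstract mentions), the task-to-dimension mapping, and the equal-weight averaging are all illustrative assumptions.

```python
# Hypothetical sketch of a multidimensional benchmark aggregation:
# 7 tasks mapped onto 3 dimensions (understanding, generation, safety).
# Task names and the mapping are illustrative placeholders, not the
# actual ELM-Bench design (only common-sense Q&A and terminology
# explanation are named in the abstract).
TASK_DIMENSION = {
    "common_sense_qa": "understanding",
    "terminology_explanation": "understanding",
    "market_prediction": "generation",
    "decision_making": "generation",
    "text_generation": "generation",
    "safety_qa": "safety",
    "risk_refusal": "safety",
}

def dimension_scores(task_scores: dict[str, float]) -> dict[str, float]:
    """Average per-task scores (e.g. accuracies in [0, 1]) within each
    dimension, weighting every task in a dimension equally."""
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for task, score in task_scores.items():
        dim = TASK_DIMENSION[task]
        sums[dim] = sums.get(dim, 0.0) + score
        counts[dim] = counts.get(dim, 0) + 1
    return {dim: sums[dim] / counts[dim] for dim in sums}
```

A per-model comparison of the kind the abstract reports (general model vs. fine-tuned QwenGOLD) would then reduce to comparing these per-dimension averages.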
Keywords: electricity market; large language models; prompt instructions; evaluation framework; model fine-tuning (search for similar items in EconPapers)
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49 (search for similar items in EconPapers)
Date: 2025
Downloads: (external link)
https://www.mdpi.com/1996-1073/18/15/3982/pdf (application/pdf)
https://www.mdpi.com/1996-1073/18/15/3982/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:18:y:2025:i:15:p:3982-:d:1710128
Energies is currently edited by Ms. Agatha Cao
More articles in Energies from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().