Model-Data Hybrid-Driven Real-Time Optimal Power Flow: A Physics-Informed Reinforcement Learning Approach

Ximing Zhang, Xiyuan Ma, Yun Yu, Duotong Yang, Zhida Lin, Changcheng Zhou, Huan Xu and Zhuohuan Li
Additional contact information
Ximing Zhang: China Southern Power Grid Ltd., Guangzhou 510663, China
Xiyuan Ma: Digital Grid Research Institute, China Southern Power Grid, Guangzhou 510663, China
Yun Yu: China Southern Power Grid Ltd., Guangzhou 510663, China
Duotong Yang: Digital Grid Research Institute, China Southern Power Grid, Guangzhou 510663, China
Zhida Lin: China Southern Power Grid Ltd., Guangzhou 510663, China
Changcheng Zhou: Digital Grid Research Institute, China Southern Power Grid, Guangzhou 510663, China
Huan Xu: China Southern Power Grid Ltd., Guangzhou 510663, China
Zhuohuan Li: Digital Grid Research Institute, China Southern Power Grid, Guangzhou 510663, China

Energies, 2025, vol. 18, issue 13, 1-20

Abstract: With the rapid development of artificial intelligence technology, deep reinforcement learning (DRL) has shown great potential for solving the complex real-time optimal power flow problems of modern power systems. Nevertheless, traditional DRL methodologies face two bottlenecks: (a) poor coordination between exploratory behavior policies and the exploitation of experience data in practical applications, and (b) user distrust stemming from the opacity of the model's decision mechanism. To address these issues, a model–data hybrid-driven physics-informed reinforcement learning (PIRL) algorithm is proposed in this paper. Specifically, the proposed method uses the proximal policy optimization (PPO) algorithm as the agent's foundational framework and constructs a PI-actor network by embedding prior model knowledge, derived from power flow sensitivity, into the agent's actor network via the PINN method. This achieves two optimization objectives: (a) enhanced environmental perception, which improves experience utilization efficiency through gradient awareness of the model knowledge during actor network updates, and (b) improved user trust through mathematically constrained action gradients derived from explicit model knowledge, ensuring that actor updates respect safety boundaries. Simulation and validation results show that the PIRL algorithm outperforms the baseline PPO algorithm in training stability, exploration efficiency, economy, and security.
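As a rough illustration of the idea summarized in the abstract, the snippet below sketches one way a power-flow-sensitivity penalty could be folded into a PPO actor loss. This is a minimal sketch under stated assumptions, not the paper's implementation: the sensitivity matrix S, the base voltage profile v_base, the limits v_min/v_max, and the weight lam_phys are illustrative names introduced here.

```python
# Minimal sketch (not the authors' code): PPO clipped surrogate plus a
# physics-informed penalty built from an assumed power-flow sensitivity
# matrix S (d|V|/dP). All symbols below are illustrative assumptions.
import torch


def pi_actor_loss(log_prob_new, log_prob_old, advantage,
                  action, S, v_base, v_min, v_max,
                  clip_eps=0.2, lam_phys=1.0):
    """Clipped PPO surrogate plus a voltage-violation penalty (to minimize)."""
    # Standard PPO clipped surrogate.
    ratio = torch.exp(log_prob_new - log_prob_old)
    surr = torch.min(ratio * advantage,
                     torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage)
    ppo_loss = -surr.mean()

    # First-order voltage prediction from model knowledge: v_pred ≈ v_base + S a.
    v_pred = v_base + action @ S.T
    # Penalize predicted limit violations; the gradient of this term with
    # respect to the action is what gives the actor "gradient awareness".
    # In practice the action would be the actor network's output (e.g. the
    # policy mean) so this gradient reaches the network weights.
    violation = torch.relu(v_pred - v_max) + torch.relu(v_min - v_pred)
    phys_loss = violation.pow(2).mean()

    return ppo_loss + lam_phys * phys_loss


if __name__ == "__main__":
    # Toy shapes: batch of 4 actions controlling 3 generators on a 5-bus system.
    torch.manual_seed(0)
    log_new = torch.randn(4, requires_grad=True)
    log_old = torch.randn(4)
    adv = torch.randn(4)
    action = torch.randn(4, 3, requires_grad=True)
    S = torch.randn(5, 3)             # assumed d|V|/dP sensitivity matrix
    v_base = torch.ones(5)            # flat-start voltage profile
    loss = pi_actor_loss(log_new, log_old, adv, action, S, v_base,
                         v_min=torch.full((5,), 0.95),
                         v_max=torch.full((5,), 1.05))
    loss.backward()                   # gradients flow through both terms
```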

Keywords: deep reinforcement learning (DRL); real-time optimal power flow (RT-OPF); physics-informed neural network (PINN); physics-informed reinforcement learning (PIRL)
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49
Date: 2025

Downloads: (external link)
https://www.mdpi.com/1996-1073/18/13/3483/pdf (application/pdf)
https://www.mdpi.com/1996-1073/18/13/3483/ (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:18:y:2025:i:13:p:3483-:d:1692729


Energies is currently edited by Ms. Agatha Cao

More articles in Energies from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

Handle: RePEc:gam:jeners:v:18:y:2025:i:13:p:3483-:d:1692729