Physics-Aware Reinforcement Learning for Flexibility Management in PV-Based Multi-Energy Microgrids Under Integrated Operational Constraints
Shimeng Dong,
Weifeng Yao,
Zenghui Li,
Haiji Zhao,
Yan Zhang and
Zhongfu Tan
Additional contact information
Shimeng Dong: State Grid Dispatching & Control Center (SGCC), Beijing 100031, China
Weifeng Yao: State Grid Dispatching & Control Center (SGCC), Beijing 100031, China
Zenghui Li: School of Economics and Management, North China Electric Power University, Beijing 102206, China
Haiji Zhao: State Grid Corporation of China, Northeast Branch, Beijing 100031, China
Yan Zhang: School of Economics and Management, North China Electric Power University, Beijing 102206, China
Zhongfu Tan: School of Economics and Management, North China Electric Power University, Beijing 102206, China
Energies, 2025, vol. 18, issue 20, 1-36
Abstract:
The growing penetration of photovoltaic (PV) generation in multi-energy microgrids has amplified the challenge of maintaining real-time operational efficiency, reliability, and safety under renewable variability and forecast uncertainty. Conventional rule-based or optimization-based strategies often suffer from limited adaptability, while purely data-driven deep reinforcement learning (DRL) approaches risk violating physical feasibility constraints, leading to unsafe or economically inefficient operation. To address this challenge, this paper develops a Physics-Informed Reinforcement Learning (PIRL) framework that embeds first-order physical models and a structured feasibility projection mechanism directly into the training process of a Soft Actor–Critic (SAC) algorithm. Unlike conventional DRL, which explores the state–action space without physical safeguards, PIRL restricts learning trajectories to a physically admissible manifold, thereby preventing battery over-discharge, thermal discomfort, and infeasible hydrogen operation. Furthermore, differentiable penalty functions capture equipment degradation, user comfort, and cross-domain coupling, ensuring that the learned policy remains interpretable, safe, and aligned with engineering practice. The proposed approach is validated on a modified IEEE 33-bus distribution system coupled with 14 thermal zones and hydrogen facilities, representing a realistic and complex multi-energy microgrid environment. Simulation results show that PIRL reduces constraint violations by 75–90% and lowers operating costs by 25–30% relative to rule-based and DRL baselines, while also achieving faster convergence and higher sample efficiency. Importantly, the trained policy generalizes effectively to out-of-distribution weather conditions without retraining, highlighting the value of incorporating physical inductive biases for resilient control.
Overall, this work establishes a transparent and reproducible reinforcement learning paradigm that bridges the gap between physical feasibility and data-driven adaptability, providing a scalable solution for safe, efficient, and cost-effective operation of renewable-rich multi-energy microgrids.
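To make the feasibility-projection idea in the abstract concrete: before a SAC action is applied to the environment, it can be clipped onto the set of physically admissible dispatches. The sketch below is illustrative only, not the authors' implementation (the record does not give their projection operator); all parameter names and values (capacity, efficiency, SOC bounds) are hypothetical, and the example covers just the battery over-discharge case the abstract mentions.

```python
# Illustrative sketch (assumed, not the paper's code): project a proposed
# battery charge(+)/discharge(-) power onto a physically admissible set,
# in the spirit of the feasibility-projection mechanism described above.

def project_battery_action(p_batt, soc, dt=1.0, capacity=100.0,
                           p_max=50.0, soc_min=0.1, soc_max=0.9, eta=0.95):
    """Clip a power setpoint (kW) so that the resulting state of charge
    stays within [soc_min, soc_max] and |power| does not exceed p_max."""
    # Enforce the converter/power rating limit first.
    p = max(-p_max, min(p_max, p_batt))
    # Maximum charging power before soc_max is reached this step
    # (charging stores eta * p * dt of energy).
    p_charge_max = (soc_max - soc) * capacity / (eta * dt)
    # Maximum discharging power before soc_min is reached
    # (discharging draws p * dt / eta from the cells).
    p_discharge_max = (soc - soc_min) * capacity * eta / dt
    p = max(-p_discharge_max, min(p_charge_max, p))
    # State of charge after applying the feasible power.
    soc_next = soc + (eta * p if p >= 0 else p / eta) * dt / capacity
    return p, soc_next
```

In a projected-SAC loop, the unclipped action would still be used for the policy-gradient update (or the projection made differentiable), while the clipped action is what the microgrid simulator executes; that is what keeps exploration on the admissible manifold.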
Keywords: physics-informed reinforcement learning; multi-energy microgrids; photovoltaic integration; flexibility management; energy storage coordination; hydrogen systems; safe and resilient control
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49
Date: 2025
Downloads: (external link)
https://www.mdpi.com/1996-1073/18/20/5465/pdf (application/pdf)
https://www.mdpi.com/1996-1073/18/20/5465/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:18:y:2025:i:20:p:5465-:d:1773233
Energies is currently edited by Ms. Cassie Shen