Energy Regulation-Aware Layered Control Architecture for Building Energy Systems Using Constraint-Aware Deep Reinforcement Learning and Virtual Energy Storage Modeling
Siwei Li,
Congxiang Tian and
Ahmed N. Abdalla
Additional contact information
Siwei Li: Yangtze University College of Arts and Sciences, Jingzhou 434025, China
Congxiang Tian: Yangtze University College of Arts and Sciences, Jingzhou 434025, China
Ahmed N. Abdalla: Faculty of Electronic Information Engineering, Huaiyin Institute of Technology, Huaian 223003, China
Energies, 2025, vol. 18, issue 17, 1-23
Abstract:
In modern intelligent buildings, the control of Building Energy Systems (BES) faces increasing complexity in balancing energy costs, thermal comfort, and operational flexibility. Traditional centralized or flat deep reinforcement learning (DRL) methods often fail to handle the multi-timescale dynamics, large state–action spaces, and strict constraint satisfaction required by real-world energy systems. To address these challenges, this paper proposes an energy policy-aware layered control architecture that combines Virtual Energy Storage System (VESS) modeling with a novel Dynamic Constraint-Aware Policy Optimization (DCPO) algorithm. The VESS is modeled on the thermal inertia of building envelope components, quantifying flexibility in terms of virtual power, capacity, and state of charge, thus enabling the BES to behave as if it had embedded, non-physical energy storage. Building on this, the BES control problem is structured as a hierarchical Markov Decision Process, in which the upper level handles strategic decisions (e.g., VESS dispatch, HVAC modes) while the lower level manages real-time control (e.g., temperature adjustments, load balancing). The proposed DCPO algorithm extends actor–critic learning with dynamic policy constraints, entropy regularization, and adaptive clipping to ensure feasible and efficient policy learning under both operational and comfort-related constraints. Simulation experiments demonstrate that the proposed approach outperforms established algorithms such as Deep Q-Networks (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3): it achieves a 32.6% reduction in operational costs and a more than 51% decrease in thermal comfort violations compared with DQN, while delivering millisecond-level policy generation suitable for real-time BES deployment.
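The abstract quantifies envelope thermal inertia as virtual power, capacity, and state of charge. The sketch below illustrates that idea numerically; it is a minimal toy example assuming a generic first-order RC envelope model, and the names and parameter values (R_env, C_env, the comfort band, the baseline power) are invented for illustration rather than taken from the paper's VESS formulation.

```python
# Illustrative only: "virtual energy storage" flexibility from building thermal
# inertia, using an assumed first-order RC envelope model (not the paper's model).

from dataclasses import dataclass


@dataclass
class VirtualEnergyStorage:
    R_env: float   # envelope thermal resistance [K/kW] (assumed value)
    C_env: float   # lumped thermal capacitance [kWh/K] (assumed value)
    T_min: float   # lower comfort bound [deg C]
    T_max: float   # upper comfort bound [deg C]

    @property
    def virtual_capacity_kwh(self) -> float:
        # Energy "stored" by sweeping the indoor temperature across the comfort
        # band, analogous to a battery's nameplate capacity.
        return self.C_env * (self.T_max - self.T_min)

    def state_of_charge(self, T_in: float) -> float:
        # Cooling-dominated convention: a colder room is a "fuller" store,
        # because the HVAC can coast longer before T_in reaches T_max.
        soc = (self.T_max - T_in) / (self.T_max - self.T_min)
        return min(1.0, max(0.0, soc))

    def virtual_power_kw(self, hvac_power_kw: float, baseline_power_kw: float) -> float:
        # Charging (+) when HVAC runs above its comfort-tracking baseline,
        # discharging (-) when it coasts on stored thermal inertia.
        return hvac_power_kw - baseline_power_kw

    def step(self, T_in: float, T_out: float, hvac_cooling_kw: float, dt_h: float) -> float:
        # One Euler step of C * dT/dt = (T_out - T_in) / R - Q_cool.
        dT = ((T_out - T_in) / self.R_env - hvac_cooling_kw) * dt_h / self.C_env
        return T_in + dT


if __name__ == "__main__":
    vess = VirtualEnergyStorage(R_env=2.0, C_env=5.0, T_min=22.0, T_max=26.0)
    T_in = 24.0
    # Pre-cool for two hours (charging), then coast for two hours (discharging).
    for hour, cooling_kw in enumerate([6.0, 6.0, 0.0, 0.0], start=1):
        T_in = vess.step(T_in, T_out=32.0, hvac_cooling_kw=cooling_kw, dt_h=1.0)
        print(f"h={hour}  T_in={T_in:.2f} C  SoC={vess.state_of_charge(T_in):.2f}  "
              f"P_virtual={vess.virtual_power_kw(cooling_kw, 4.0):+.1f} kW")
```

In this toy run, pre-cooling "charges" the virtual store (SoC rises, positive virtual power) and coasting "discharges" it, which is the kind of flexibility the abstract's upper control level would dispatch; the paper's actual envelope dynamics, constraint handling, and DCPO learning are not reproduced here.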
Keywords: building energy system; reinforcement learning; Markov decision process; virtual energy storage; policy regulation; deep reinforcement learning (search for similar items in EconPapers)
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49 (search for similar items in EconPapers)
Date: 2025
Downloads: (external link)
https://www.mdpi.com/1996-1073/18/17/4698/pdf (application/pdf)
https://www.mdpi.com/1996-1073/18/17/4698/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX, RIS (EndNote, ProCite, RefMan), HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:18:y:2025:i:17:p:4698-:d:1741932
Access Statistics for this article
Energies is currently edited by Ms. Cassie Shen
More articles in Energies from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.