Improving the learning process of deep reinforcement learning agents operating in collective heating environments
Stef Jacobs,
Sara Ghane,
Pieter Jan Houben,
Zakarya Kabbara,
Thomas Huybrechts,
Peter Hellinckx and
Ivan Verhaert
Applied Energy, 2025, vol. 384, issue C, No S0306261925001503
Abstract:
Deep reinforcement learning (DRL) can be used to optimise the performance of Collective Heating Systems (CHS) by reducing operational costs while ensuring thermal comfort. However, heating systems often respond slowly to control inputs due to thermal inertia, which delays the effects of actions such as adapting temperature set points. This delayed feedback complicates the learning process for DRL agents, as it becomes more difficult to associate specific control actions with their outcomes. To address this challenge, this study evaluates four hyperparameter schemes during training. The focus is on schemes that vary the learning rate (the rate at which weights in neural networks are adapted) and/or the discount factor (the importance the DRL agent attaches to future rewards). To this end, we introduce the GALER approach, which progressively increases the discount factor while reducing the learning rate throughout the training process. The effectiveness of the four learning schemes is evaluated using the actor-critic Proximal Policy Optimization (PPO) algorithm for three types of CHS with a multi-objective reward function balancing thermal comfort against energy use or operational costs. The results show that energy-based reward functions offer limited optimisation potential, while the GALER scheme yields the highest potential for price-based optimisation across all considered concepts, achieving a 3%–15% performance improvement over the other successful training schemes. DRL agents trained with GALER schemes strategically anticipate high-price periods by lowering the supply temperature in advance, and vice versa for low-price periods. This research highlights the advantage of varying both learning rates and discount factors when training DRL agents to operate in complex multi-objective environments with slow responsiveness.
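The core of the GALER idea described above is a joint hyperparameter schedule: the discount factor grows while the learning rate decays as training progresses. A minimal sketch of such a schedule is shown below; the linear shape, endpoint values, and the function name `galer_schedule` are illustrative assumptions, not details taken from the paper.

```python
def galer_schedule(progress,
                   lr_start=3e-4, lr_end=3e-5,
                   gamma_start=0.90, gamma_end=0.99):
    """Return (learning_rate, discount_factor) for a training progress
    value in [0, 1].

    Illustrative sketch of a GALER-style scheme: the learning rate is
    reduced over training (stabilising weight updates) while the
    discount factor is increased (gradually emphasising long-horizon
    rewards, which matters in slow, thermally inert systems).
    The linear interpolation and endpoint values are assumptions.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    lr = lr_start + (lr_end - lr_start) * progress          # decays
    gamma = gamma_start + (gamma_end - gamma_start) * progress  # grows
    return lr, gamma


if __name__ == "__main__":
    # At the start of training: high learning rate, short horizon.
    print(galer_schedule(0.0))
    # Halfway through training.
    print(galer_schedule(0.5))
    # At the end: low learning rate, long horizon.
    print(galer_schedule(1.0))
```

In a PPO training loop, `progress` would typically be `current_step / total_steps`, with the returned learning rate passed to the optimiser and the returned discount factor used when computing returns or advantages for that update.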
Keywords: Reinforcement learning; Thermal inertia; Control strategy; Discount factor; Learning rate schedule; Collective heating; PPO
Date: 2025
Downloads:
http://www.sciencedirect.com/science/article/pii/S0306261925001503
Full text for ScienceDirect subscribers only
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Persistent link: https://EconPapers.repec.org/RePEc:eee:appene:v:384:y:2025:i:c:s0306261925001503
Ordering information: This journal article can be ordered from
http://www.elsevier.com/wps/find/journaldescription.cws_home/405891/bibliographic
DOI: 10.1016/j.apenergy.2025.125420
Applied Energy is currently edited by J. Yan
Bibliographic data for series maintained by Catherine Liu ().