Deep Reinforcement Learning for Intraday Multireservoir Hydropower Management
Rodrigo Castro-Freibott,
Álvaro García-Sánchez,
Francisco Espiga-Fernández and
Guillermo González-Santander de la Cruz
Additional contact information
Rodrigo Castro-Freibott: baobab soluciones, José Abascal 55, 28003 Madrid, Spain
Álvaro García-Sánchez: Industrial Engineering, Business Administration and Statistics Department, Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, José Gutiérrez Abascal 2, 28006 Madrid, Spain
Francisco Espiga-Fernández: Industrial Engineering, Business Administration and Statistics Department, Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, José Gutiérrez Abascal 2, 28006 Madrid, Spain
Guillermo González-Santander de la Cruz: baobab soluciones, José Abascal 55, 28003 Madrid, Spain
Mathematics, 2025, vol. 13, issue 1, 1-18
Abstract:
This study investigates the application of Reinforcement Learning (RL) to optimize intraday operations of hydropower reservoirs. Unlike previous approaches that focus on long-term planning with coarse temporal resolutions and discretized state-action spaces, we propose an RL framework tailored to the Hydropower Reservoirs Intraday Economic Optimization problem. This framework manages continuous state-action spaces while accounting for fine-grained temporal dynamics, including dam-to-turbine delays, gate-movement constraints, and power group operations. Our methodology evaluates three distinct action-space formulations (continuous, discrete, and adjustments) implemented using modern RL algorithms (A2C, PPO, and SAC). We compare them against both a greedy baseline and Mixed-Integer Linear Programming (MILP) solutions. Experiments on real-world data from a two-reservoir system and a simulated six-reservoir system demonstrate that while MILP performs best on the smaller system, its performance degrades significantly when scaled to six reservoirs. In contrast, RL agents, particularly those using discrete action spaces and trained with PPO, maintain consistent performance across both configurations, achieving considerable improvements with less than one second of execution time. These results suggest that RL offers a scalable alternative to traditional optimization methods for hydropower operations, particularly in scenarios requiring real-time decision-making or involving larger systems.
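To make the action-space comparison concrete, the sketch below sets up a toy multireservoir environment in the Gymnasium API and trains the combination the abstract reports as most consistent (discrete actions with PPO) via Stable-Baselines3. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name ReservoirEnv, all constants (max_flow, max_vol, the flow-to-power factor), and the simplified dynamics are illustrative, and the paper's dam-to-turbine delays, gate-movement constraints, and power-group logic are omitted.

```python
# Illustrative sketch only: simplified intraday multireservoir environment.
# Names, constants, and dynamics are assumptions, not the paper's model.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ReservoirEnv(gym.Env):
    """Toy cascade of reservoirs operated at a fixed intraday time step."""

    def __init__(self, n_reservoirs=2, horizon=96, discrete=False, n_levels=5):
        self.n = n_reservoirs
        self.horizon = horizon        # e.g. 96 quarter-hour periods per day
        self.discrete = discrete
        self.n_levels = n_levels      # number of discretized flow setpoints
        self.max_flow = 50.0          # m^3/s per dam (assumed)
        self.max_vol = 1e6            # m^3 (assumed)
        # Observation: normalized volumes, current price, inflow, time index.
        obs_dim = self.n + 3
        self.observation_space = spaces.Box(-np.inf, np.inf, (obs_dim,), np.float32)
        if discrete:
            # Discrete formulation: pick one of n_levels setpoints per dam.
            self.action_space = spaces.MultiDiscrete([n_levels] * self.n)
        else:
            # Continuous formulation: normalized turbine flow per dam.
            self.action_space = spaces.Box(0.0, 1.0, (self.n,), np.float32)

    def _obs(self):
        return np.concatenate([
            self.vol / self.max_vol,
            [self.price, self.inflow, self.t / self.horizon],
        ]).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.vol = np.full(self.n, 0.5 * self.max_vol)
        self.price = float(self.np_random.uniform(20, 80))   # EUR/MWh (assumed)
        self.inflow = float(self.np_random.uniform(0, 10))   # m^3/s (assumed)
        return self._obs(), {}

    def step(self, action):
        if self.discrete:
            flow = np.asarray(action) / (self.n_levels - 1) * self.max_flow
        else:
            flow = np.asarray(action) * self.max_flow
        flow = np.minimum(flow, self.vol / 900.0)  # cannot release more than stored
        # Cascade: water released by dam i reaches dam i+1 (delays omitted).
        self.vol -= flow * 900.0                   # 15-min step, in seconds
        self.vol[1:] += flow[:-1] * 900.0
        self.vol[0] += self.inflow * 900.0
        self.vol = np.clip(self.vol, 0.0, self.max_vol)
        power_mw = 0.008 * flow                    # crude flow-to-power factor
        reward = self.price * power_mw.sum() * 0.25  # revenue over 15 minutes
        self.t += 1
        self.price = float(np.clip(self.price + self.np_random.normal(0, 2), 5, 150))
        return self._obs(), float(reward), self.t >= self.horizon, False, {}


if __name__ == "__main__":
    from stable_baselines3 import PPO

    env = ReservoirEnv(n_reservoirs=2, discrete=True)  # discrete + PPO pairing
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)
```

Setting discrete=False yields the continuous Box formulation compatible with SAC, which only supports continuous actions; the adjustments formulation from the paper is not sketched here.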
Keywords: daily optimization; hydropower generation; multireservoir; reinforcement learning; mixed integer linear programming
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/1/151/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/1/151/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:1:p:151-:d:1559633