Q-Learning for Online PID Controller Tuning in Continuous Dynamic Systems: An Interpretable Framework for Exploring Multi-Agent Systems
Davor Ibarra-Pérez,
Sergio García-Nieto and
Javier Sanchis Saez
Additional contact information
Davor Ibarra-Pérez: Instituto Universitario de Automática e Informática Industrial, Universitat Politècnica de València, 46022 Valencia, Spain
Sergio García-Nieto: Instituto Universitario de Automática e Informática Industrial, Universitat Politècnica de València, 46022 Valencia, Spain
Javier Sanchis Saez: Instituto Universitario de Automática e Informática Industrial, Universitat Politècnica de València, 46022 Valencia, Spain
Mathematics, 2025, vol. 13, issue 21, 1-29
Abstract:
This study proposes a discrete multi-agent Q-learning framework for the online tuning of PID controllers in continuous dynamic systems with limited observability. The approach treats the adjustment of each PID gain (k_p, k_i, k_d) as an independent learning process, in which each agent operates within a discrete state space corresponding to its own gain and selects actions from a tripartite space (decrease, maintain, or increase its gain). The agents act simultaneously at fixed decision intervals, which favors convergence by preserving quasi-stationary conditions in the perceived environment, while a shared cumulative global reward, composed of system parameters, time and control-action penalties, and stability incentives, guides coordinated exploration toward the control objectives. Implemented in Python, the framework was validated on two nonlinear control problems: a water-tank system and an inverted pendulum (cart-pole) system. The agents achieved initial convergence after approximately 300 and 500 episodes, respectively, with overall success rates of 49.6% and 46.2% over 5000 training episodes. The learning process exhibited sustained convergence toward effective PID configurations capable of stabilizing both systems without explicit dynamic models. These findings confirm the feasibility of the proposed low-complexity discrete reinforcement learning approach for online adaptive PID tuning, yielding interpretable and reproducible control policies and providing a basis for future hybrid schemes that unite classical control theory and reinforcement learning agents.
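The structure described in the abstract (one Q-learning agent per gain, a discretized state space over that gain, three actions, simultaneous decisions at fixed intervals, and a shared global reward) can be sketched directly. The following minimal Python sketch is an illustration under stated assumptions, not the authors' implementation: all identifiers (GainAgent, simulate_interval), hyperparameter values, and the reward interface are hypothetical.

```python
import numpy as np

# Illustrative sketch of the multi-agent Q-learning tuner described in the abstract.
# Each agent adjusts one PID gain (kp, ki or kd) over its own discretized gain axis
# and chooses among three actions: decrease, maintain, or increase its gain.

ACTIONS = (-1, 0, +1)  # decrease, maintain, increase

class GainAgent:
    def __init__(self, gain_values, alpha=0.1, gamma=0.99, epsilon=0.2):
        self.gain_values = gain_values           # discrete state space for this gain
        self.q = np.zeros((len(gain_values), len(ACTIONS)))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.state = len(gain_values) // 2       # start from a mid-range gain index

    def act(self):
        # Epsilon-greedy action selection over the three gain adjustments.
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(ACTIONS))
        return int(np.argmax(self.q[self.state]))

    def next_state(self, action):
        # Move along the discretized gain axis, clipped to its bounds.
        return int(np.clip(self.state + ACTIONS[action], 0, len(self.gain_values) - 1))

    def update(self, action, next_state, reward):
        # Standard tabular Q-learning update using the shared global reward.
        td_target = reward + self.gamma * np.max(self.q[next_state])
        self.q[self.state, action] += self.alpha * (td_target - self.q[self.state, action])
        self.state = next_state

    @property
    def gain(self):
        return self.gain_values[self.state]


def run_episode(agents, simulate_interval, decision_steps=50):
    """One training episode: agents act simultaneously at fixed decision intervals,
    and all of them receive the same cumulative global reward signal."""
    total_reward = 0.0
    for _ in range(decision_steps):
        actions = [agent.act() for agent in agents]
        next_states = [agent.next_state(a) for agent, a in zip(agents, actions)]
        gains = [agent.gain_values[s] for agent, s in zip(agents, next_states)]
        # simulate_interval is a placeholder: it runs the plant with the current
        # PID gains for one decision interval and returns a shared scalar reward
        # (tracking/time/control-effort penalties plus stability incentives).
        reward, done = simulate_interval(*gains)
        for agent, a, s in zip(agents, actions, next_states):
            agent.update(a, s, reward)
        total_reward += reward
        if done:
            break
    return total_reward
```

A training loop would then instantiate three GainAgent objects (one each for k_p, k_i, k_d), supply a plant-specific simulate_interval (e.g., a water-tank or cart-pole simulation), and repeat run_episode over many episodes, plausibly with a decaying epsilon; the paper reports results over 5000 training episodes.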
Keywords: Q-learning; multi-agent; PID; online; interpretable control
JEL-codes: C
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/21/3461/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/21/3461/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:21:p:3461-:d:1783176