Q-Learning for Online PID Controller Tuning in Continuous Dynamic Systems: An Interpretable Framework for Exploring Multi-Agent Systems
Davor Ibarra-Pérez,
Sergio García-Nieto and
Javier Sanchis Saez
Additional contact information
Davor Ibarra-Pérez: Instituto Universitario de Automática e Informática Industrial, Universitat Politècnica de València, 46022 Valencia, Spain
Sergio García-Nieto: Instituto Universitario de Automática e Informática Industrial, Universitat Politècnica de València, 46022 Valencia, Spain
Javier Sanchis Saez: Instituto Universitario de Automática e Informática Industrial, Universitat Politècnica de València, 46022 Valencia, Spain
Mathematics, 2025, vol. 13, issue 21, 1-29
Abstract:
This study proposes a discrete multi-agent Q-learning framework for the online tuning of PID controllers in continuous dynamic systems with limited observability. The approach treats the adjustment of each PID gain (k_p, k_i, k_d) as an independent learning process, in which each agent operates within a discrete state space corresponding to its own gain and selects actions from a tripartite space (decrease, maintain, or increase its gain). The agents act simultaneously at fixed decision intervals, which favors convergence by preserving quasi-stationary conditions in the perceived environment, while a shared cumulative global reward, composed of system parameters, time and control-action penalties, and stability incentives, guides coordinated exploration toward the control objectives. Implemented in Python, the framework was validated on two nonlinear control problems: a water-tank system and an inverted pendulum (cart-pole) system. The agents achieved initial convergence after approximately 300 and 500 episodes, respectively, with overall success rates of 49.6% and 46.2% over 5000 training episodes. The learning process exhibited sustained convergence toward effective PID configurations capable of stabilizing both systems without explicit dynamic models. These findings confirm the feasibility of the proposed low-complexity discrete reinforcement learning approach for online adaptive PID tuning, yielding interpretable and reproducible control policies and providing a basis for future hybrid schemes that unite classical control theory and reinforcement learning agents.
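The structure described in the abstract (one Q-learning agent per gain, a discretized state space over that gain, three actions, simultaneous decisions at fixed intervals, and a shared global reward) can be sketched directly. The following minimal Python sketch is an illustration under stated assumptions, not the authors' implementation: all identifiers (GainAgent, simulate_interval), hyperparameter values, and the reward interface are hypothetical.

```python
import numpy as np

# Illustrative sketch of the multi-agent Q-learning tuner described in the abstract.
# Each agent adjusts one PID gain (kp, ki or kd) over its own discretized gain axis
# and chooses among three actions: decrease, maintain, or increase its gain.

ACTIONS = (-1, 0, +1)  # decrease, maintain, increase

class GainAgent:
    def __init__(self, gain_values, alpha=0.1, gamma=0.99, epsilon=0.2):
        self.gain_values = gain_values           # discrete state space for this gain
        self.q = np.zeros((len(gain_values), len(ACTIONS)))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.state = len(gain_values) // 2       # start from a mid-range gain index

    def act(self):
        # Epsilon-greedy action selection over the three gain adjustments.
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(ACTIONS))
        return int(np.argmax(self.q[self.state]))

    def next_state(self, action):
        # Move along the discretized gain axis, clipped to its bounds.
        return int(np.clip(self.state + ACTIONS[action], 0, len(self.gain_values) - 1))

    def update(self, action, next_state, reward):
        # Standard tabular Q-learning update using the shared global reward.
        td_target = reward + self.gamma * np.max(self.q[next_state])
        self.q[self.state, action] += self.alpha * (td_target - self.q[self.state, action])
        self.state = next_state

    @property
    def gain(self):
        return self.gain_values[self.state]


def run_episode(agents, simulate_interval, decision_steps=50):
    """One training episode: agents act simultaneously at fixed decision intervals,
    and all of them receive the same cumulative global reward signal."""
    total_reward = 0.0
    for _ in range(decision_steps):
        actions = [agent.act() for agent in agents]
        next_states = [agent.next_state(a) for agent, a in zip(agents, actions)]
        gains = [agent.gain_values[s] for agent, s in zip(agents, next_states)]
        # simulate_interval is a placeholder: it runs the plant with the current
        # PID gains for one decision interval and returns a shared scalar reward
        # (tracking/time/control-effort penalties plus stability incentives).
        reward, done = simulate_interval(*gains)
        for agent, a, s in zip(agents, actions, next_states):
            agent.update(a, s, reward)
        total_reward += reward
        if done:
            break
    return total_reward
```

A training loop would then instantiate three GainAgent objects (one each for k_p, k_i, k_d), supply a plant-specific simulate_interval (e.g., a water-tank or cart-pole simulation), and repeat run_episode over many episodes, plausibly with a decaying epsilon; the paper reports results over 5000 training episodes.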
Keywords: Q-learning; multi-agent; PID; online; interpretable control
JEL-codes: C
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/21/3461/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/21/3461/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:21:p:3461-:d:1783176