Q-Sorting: An Algorithm for Reinforcement Learning Problems with Multiple Cumulative Constraints

Huang, Jianfeng; Lu, Guoqiang; Li, Yi; Wu, Jiajun

Additional contact information
Jianfeng Huang: College of Engineering, Shantou University, Shantou 515063, China
Guoqiang Lu: College of Engineering, Shantou University, Shantou 515063, China
Yi Li: College of Engineering, Shantou University, Shantou 515063, China
Jiajun Wu: College of Engineering, Shantou University, Shantou 515063, China

Mathematics, 2024, vol. 12, issue 13, 1-20

Abstract: This paper proposes Q-sorting, a method and algorithm for reinforcement learning (RL) problems with multiple cumulative constraints. The primary contribution is a mechanism that dynamically determines the focus of optimization among the cumulative constraints and the objective. The action to execute is chosen in two steps: first, actions that could violate the constraints are filtered out; second, the remaining actions are sorted in descending order of the Q values of the current focus. The algorithm was originally developed for the classic tabular value representation and the episodic setting of RL, but the idea can be extended to methods with function approximation and to the discounted setting. Numerical experiments are carried out on an adapted Gridworld and on a motor speed synchronization problem, each with one and with two cumulative constraints. Simulation results validate the effectiveness of Q-sorting in that the cumulative constraints are honored both during and after the learning process. The advantages of Q-sorting are further demonstrated through comparison with the method of lumped performances (LP), which accounts for constraints through weighting parameters. Q-sorting outperforms LP in both ease of use (no trial and error is needed to set the weighting parameters) and performance consistency (a standard deviation of the cumulative performance index of 6.1920 vs. 54.2635 rad/s over 10 repeated simulation runs). These results suggest that Q-sorting has strong potential for practical engineering use.
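
Illustrative sketch (not taken from the paper): the two-step selection described in the abstract could look roughly like the Python below for a single state of a tabular problem. The function name `rank_actions`, the argument layout, the "estimate must not exceed the limit" feasibility test, and the fallback used when no action passes the filter are all assumptions made for illustration. How the focus is chosen is not shown; the caller is assumed to pass in the Q values of whichever quantity is the current focus.

```python
from typing import List, Sequence


def rank_actions(
    q_focus: Sequence[float],
    q_constraints: Sequence[Sequence[float]],
    limits: Sequence[float],
) -> List[int]:
    """Rank the actions of one state: filter by constraints, then sort by focus Q.

    q_focus[a]          -- Q value of the current focus for action a (assumed input)
    q_constraints[k][a] -- estimated cumulative value of constraint k for action a
    limits[k]           -- hypothetical upper limit on constraint k
    """
    n_actions = len(q_focus)

    # Step 1: keep only actions whose estimates stay within every limit.
    feasible = [
        a
        for a in range(n_actions)
        if all(qc[a] <= lim for qc, lim in zip(q_constraints, limits))
    ]

    # Assumed fallback (not from the paper): if nothing passes, rank all actions.
    candidates = feasible if feasible else list(range(n_actions))

    # Step 2: sort the remaining actions by focus Q value, highest first.
    return sorted(candidates, key=lambda a: q_focus[a], reverse=True)


# Four actions, two cumulative constraints (all numbers are made up).
ranking = rank_actions(
    q_focus=[1.0, 3.0, 2.0, 5.0],
    q_constraints=[[0.5, 0.2, 0.9, 2.0], [1.0, 1.0, 3.0, 1.0]],
    limits=[1.0, 2.0],
)
print(ranking)
```

In this made-up example, action 3 has the highest focus Q value but is dropped by the filter because its first constraint estimate exceeds the limit; the first entry of the returned ranking would be the action to execute under a greedy policy.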

Keywords: reinforcement learning; cumulative constraint; constrained Markov decision process (CMDP)
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/13/2001/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/13/2001/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:13:p:2001-:d:1424494

Mathematics is currently edited by Ms. Emma He