A deep reinforcement learning control framework for a partially observable system: experimental validation on a rotary flexible link system
V. Joshi Kumar and Vinodh Kumar Elumalai
International Journal of Systems Science, 2025, vol. 56, issue 14, 3332-3356
Abstract:
This paper puts forward a novel deep reinforcement learning control framework to realise continuous action control for a partially observable system. A central problem in continuous action control is finding an optimal policy that enables the agent to achieve the control goals without violating the constraints. Although reinforcement learning (RL) is widely applied to this optimisation problem in continuous action spaces, a critical limitation of existing methods is that they rely only on one-step state transitions and fail to capitalise on the information available in the sequence of previous states. Consequently, learning an optimal policy for a continuous action space with current techniques may not be effective. Hence, this study addresses the optimisation problem by integrating a convolutional neural network into a deep reinforcement learning (DRL) framework and realising an optimal policy through an inverse n-step temporal difference learning method. Moreover, we formulate a novel convolutional deep deterministic policy gradient (CDDPG) algorithm and present its convergence analysis through the Bellman contraction operator. A key benefit of the proposed approach is that it improves the performance of the RL agent by utilising not only the information from a one-step transition but also the hidden information extracted from previous state sequences. The efficacy of the proposed scheme is experimentally validated on a rotary flexible link (RFL) system for tracking control and vibration suppression. The results highlight that CDDPG offers better tracking and vibration suppression than the conventional deep deterministic policy gradient (DDPG) and the state-of-the-art proximal policy optimisation (PPO) techniques.
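Note: the code below is not from the article. It is a minimal, hypothetical sketch (in Python, assuming a PyTorch implementation) of the two ingredients the abstract names: a convolutional encoder over a window of past observations, which lets the agent exploit previous state sequences under partial observability, and an n-step temporal difference target for the critic. All module, function, and parameter names are illustrative assumptions, not the authors' implementation.

# Hypothetical CDDPG-style building blocks (illustrative only, not the authors' code).
import torch
import torch.nn as nn

class ConvStateEncoder(nn.Module):
    """Encodes a (batch, k, obs_dim) window of the last k observations."""
    def __init__(self, obs_dim: int, k: int, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(obs_dim, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * k, feat_dim),
            nn.ReLU(),
        )

    def forward(self, obs_window: torch.Tensor) -> torch.Tensor:
        # Conv1d expects (batch, channels, length); treat obs_dim as channels.
        return self.net(obs_window.transpose(1, 2))

def n_step_td_target(rewards: torch.Tensor,
                     bootstrap_q: torch.Tensor,
                     dones: torch.Tensor,
                     gamma: float = 0.99) -> torch.Tensor:
    """n-step return: sum_{i=0}^{n-1} gamma^i r_{t+i} + gamma^n Q'(s_{t+n}, mu'(s_{t+n})).

    rewards:     (batch, n) rewards collected over the n-step rollout
    bootstrap_q: (batch,)   target-critic value at the n-th next state
    dones:       (batch,)   1.0 if the episode terminated within the rollout
    """
    batch, n = rewards.shape
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype)
    g = (rewards * discounts).sum(dim=1)
    return g + (gamma ** n) * (1.0 - dones) * bootstrap_q

if __name__ == "__main__":
    enc = ConvStateEncoder(obs_dim=4, k=8)
    window = torch.randn(2, 8, 4)           # batch of 2 observation windows
    print(enc(window).shape)                 # torch.Size([2, 64])
    r = torch.randn(2, 5)
    q = torch.randn(2)
    d = torch.zeros(2)
    print(n_step_td_target(r, q, d).shape)   # torch.Size([2])

In a DDPG-style training loop, the encoder output would feed both the actor and the critic, and the n-step target would replace the usual one-step bootstrap in the critic update; the inverse n-step variant and the convergence analysis via the Bellman contraction operator are detailed only in the paper itself.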
Date: 2025
Downloads: http://hdl.handle.net/10.1080/00207721.2025.2468870 (text/html); access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:tsysxx:v:56:y:2025:i:14:p:3332-3356
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/TSYS20
DOI: 10.1080/00207721.2025.2468870
International Journal of Systems Science is currently edited by Visakan Kadirkamanathan