Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks

Lala, Timotei

Timotei Lala
Additional contact information
Timotei Lala: Department of Automation and Applied Informatics, Politehnica University of Timisoara, 2, Bd. V. Parvan, 300223 Timisoara, Romania

Mathematics, 2025, vol. 13, issue 2, 1-28

Abstract: This paper analyzes the theoretical stability of batch offline action-dependent heuristic dynamic programming (BOADHDP) when deep neural network (NN) approximators are used for both the action value function and the controller, which are iteratively improved using experiences collected from the environment. Our findings extend previous research on the stability of online adaptive ADHDP learning with single-hidden-layer NNs by addressing the case of deep neural networks with an arbitrary number of hidden layers, updated offline using batched gradient descent updates. Specifically, our work shows that the learning process of the action value function and controller under BOADHDP is uniformly ultimately bounded (UUB), contingent on certain conditions related to the NN learning rates. The developed theory demonstrates an inverse relationship between the number of hidden layers and the learning rate magnitude. We present a practical implementation involving a twin rotor aerodynamical system to emphasize the difference in impact between single-hidden-layer and multiple-hidden-layer NN architectures in BOADHDP learning settings. The validation case study shows that BOADHDP with a multiple-hidden-layer NN architecture obtains 0.0034 on the control benchmark, while the single-hidden-layer NN architecture obtains 0.0049, outperforming the former by 1.58% under the same collected dataset and learning conditions. BOADHDP is also compared with online adaptive ADHDP and proves superior to it in terms of both controller performance and data efficiency.

Keywords: ADP; ADHDP; deep neural networks; batch learning; Lyapunov stability; uniformly ultimately bounded; gradient descent; Q-function; action value function
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/2/206/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/2/206/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:2:p:206-:d:1563666

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.