A Note on Generalized Second-Order Value Iteration in Markov Decision Processes

Villavarayan Antony Vijesh, Shreyas Sumithra Rudresha and Mohammed Shahid Abdulla
Additional contact information
Villavarayan Antony Vijesh: Indian Institute of Technology Indore
Shreyas Sumithra Rudresha: Indian Institute of Technology Indore
Mohammed Shahid Abdulla: Indian Institute of Management, Kozhikode

Journal of Optimization Theory and Applications, 2023, vol. 199, issue 3, No 7, 1022-1049

Abstract: Value iteration is a first-order algorithm for approximating the solution of the Bellman equation arising from a Markov Decision Process (MDP). Recent literature approximates the max operator in the Bellman equation by a smooth function and proposes a second-order iterative method to solve the resulting Bellman equation. In numerical simulation, this second-order method was observed to be computationally expensive for state and action spaces of reasonable size. It is also difficult to implement numerically, because it requires evaluating an exponential function at large arguments. In this manuscript, a few first-order iterative schemes are derived from the second-order method to overcome these practical problems. All the proposed iterative schemes are globally convergent. In many cases, they take less time than the second-order method to converge to the solution of the Bellman equation. These algorithms are efficient and easy to implement. A theoretical comparison between the algorithms is provided, and numerical simulation supports the theoretical results.
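To make the two practical points in the abstract concrete, here is a minimal sketch of value iteration with the max operator replaced by a smooth surrogate. It assumes the surrogate is the log-sum-exp function with a sharpness parameter `beta`; the abstract does not state which smooth function is used, so this choice, the toy MDP, and all function names are illustrative assumptions. The code is ordinary first-order value iteration on the smoothed equation. It is neither the second-order method nor any of the schemes proposed in the paper.

```python
import math

def smooth_max(values, beta):
    """Log-sum-exp surrogate for max(values) (assumed form, not taken from the paper).

    Subtracting the true maximum before exponentiating keeps every
    exponent <= 0, so math.exp cannot overflow for large values.
    """
    m = max(values)
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

def bellman_backup(V, P, R, gamma, reduce_fn):
    """One sweep: V'(s) = reduce_a [ R[s][a] + gamma * sum_s' P[s][a][s'] * V(s') ]."""
    new_V = []
    for s in range(len(V)):
        q = [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(R[s]))]
        new_V.append(reduce_fn(q))
    return new_V

def value_iteration(P, R, gamma, reduce_fn=max, tol=1e-10, max_iter=10_000):
    """Iterate the backup until the sup-norm change drops below tol."""
    V = [0.0] * len(R)
    for k in range(max_iter):
        new_V = bellman_backup(V, P, R, gamma, reduce_fn)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V, k + 1
        V = new_V
    return V, max_iter

# Hypothetical 2-state, 2-action MDP: P[s][a] is a distribution over next states.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
gamma = 0.9
beta = 50.0

V_exact, iters_exact = value_iteration(P, R, gamma)
V_smooth, iters_smooth = value_iteration(
    P, R, gamma, reduce_fn=lambda q: smooth_max(q, beta))

print("exact :", V_exact, "in", iters_exact, "sweeps")
print("smooth:", V_smooth, "in", iters_smooth, "sweeps")
```

Because log-sum-exp lies between the true maximum and the maximum plus `log(n)/beta` for `n` actions, the smoothed fixed point in this sketch overestimates the exact value function by at most `log(n) / (beta * (1 - gamma))`. A naive evaluation of `exp(beta * v)` without the shift by `m` would raise `OverflowError` for inputs such as `v = 1000.0` with `beta = 10.0`, which is the kind of numerical difficulty the abstract alludes to.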

Keywords: Markov decision processes; Q-learning; Reinforcement learning; Value iteration
Date: 2023

Downloads: (external link)
http://link.springer.com/10.1007/s10957-023-02309-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:joptap:v:199:y:2023:i:3:d:10.1007_s10957-023-02309-x

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10957/PS2

DOI: 10.1007/s10957-023-02309-x

Journal of Optimization Theory and Applications is currently edited by Franco Giannessi and David G. Hull

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-04-19
Handle: RePEc:spr:joptap:v:199:y:2023:i:3:d:10.1007_s10957-023-02309-x