A Note on Generalized Second Order Value Iteration in Markov Decision Processes

Anthony Vijesh V, Shreyas S R and Mohammed Shahid Abdulla
Additional contact information
Anthony Vijesh V: Indian Institute of Technology Indore
Shreyas S R: Indian Institute of Management Kozhikode
Mohammed Shahid Abdulla: Indian Institute of Management Kozhikode

No 556, Working papers from Indian Institute of Management Kozhikode

Abstract: Value iteration is a first-order algorithm for approximating the solution of the Bellman equation arising from a Markov Decision Process (MDP). Recent literature approximates the max operator in the Bellman equation by a smooth function and proposes a second-order iterative method to solve the resulting smoothed Bellman equation. In numerical simulations, this second-order method was observed to be computationally expensive for state and action spaces of reasonable size. It is also difficult to implement numerically because it requires evaluating an exponential function at large arguments. In this manuscript, a few first-order iterative schemes are derived from the second-order method to overcome these practical problems. All the proposed iterative schemes possess the global convergence property. In many cases, the proposed schemes take less time than the second-order method to converge to the solution of the Bellman equation. These algorithms are efficient and easy to implement. A theoretical comparison between the algorithms is provided, and numerical simulations support the theoretical results.
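
As an illustration of the ideas named in the abstract, the following is a minimal sketch of first-order value iteration in which the max operator can optionally be replaced by a smooth approximation. It is not one of the schemes proposed in the paper. The abstract does not name the smooth function, so log-sum-exp is assumed here, and the function names (`smooth_max`, `value_iteration`), the smoothing parameter `beta`, and the array layouts for `P` and `R` are all hypothetical choices made for this example. Subtracting the largest value before exponentiating is a standard way to avoid the overflow that large arguments to the exponential would otherwise cause.

```python
import math


def smooth_max(values, beta):
    """Log-sum-exp approximation of max(values) (assumed smoothing function).

    Shifting by the true maximum keeps every exponent <= 0, so exp() cannot
    overflow even when the values themselves are large.
    """
    m = max(values)
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta


def value_iteration(P, R, gamma, beta=None, tol=1e-10, max_iter=10_000):
    """First-order value iteration on a finite MDP.

    P[a][s][t] is the probability of moving from state s to state t under
    action a; R[s][a] is the immediate reward. With beta=None the usual hard
    max is used; otherwise the max is replaced by smooth_max(., beta).
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = []
        for s in range(n_states):
            # One-step lookahead value of each action in state s.
            q = [
                R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            V_new.append(max(q) if beta is None else smooth_max(q, beta))
        # Stop once successive iterates agree in the sup norm.
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


if __name__ == "__main__":
    # Toy two-state MDP: action 0 stays put, action 1 switches state.
    P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    R = [[0.0, 1.0], [2.0, 0.0]]
    print(value_iteration(P, R, gamma=0.9))           # hard max
    print(value_iteration(P, R, gamma=0.9, beta=50))  # smoothed max
```

Because log-sum-exp overestimates the maximum of n numbers by at most log(n)/beta, the smoothed fixed point in this sketch lies above the hard-max solution by no more than log(number of actions) / (beta (1 - gamma)); larger `beta` tightens the approximation at the cost of larger arguments to the exponential.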

Keywords: Markov decision processes; Q-learning; reinforcement learning; value iteration
Pages: 03 pages
Date: 2023-03

Downloads: (external link)
https://iimk.ac.in/uploads/publications/IIMKWPS565ITS202304.pdf (application/pdf)
Our link check indicates that this URL is bad, the error code is: 403 Forbidden

Persistent link: https://EconPapers.repec.org/RePEc:iik:wpaper:556

More papers in Working papers from Indian Institute of Management Kozhikode, IIMK Campus PO, Kunnamangalam, Kozhikode, Kerala, India - 673570. Contact information at EDIRC.
Bibliographic data for series maintained by Sudheesh Kumar.

 
Page updated 2025-04-16
Handle: RePEc:iik:wpaper:556